* [Qemu-devel] [PATCH v4 00/11] packed ring virtio-net backends support
@ 2019-02-14  4:26 wexu
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 01/11] virtio: rename structure for packed ring wexu
                   ` (11 more replies)
  0 siblings, 12 replies; 41+ messages in thread
From: wexu @ 2019-02-14  4:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: wexu, jasowang, mst, jfreiman, maxime.coquelin, tiwei.bie

From: Wei Xu <wexu@redhat.com>

https://github.com/Whishay/qemu.git 

Userspace and vhost-net backend tests have been done with an upstream
kernel in the guest.

v3->v4:
    - add version number to the subject of each patch.(mst)

v2->v3:
    v2/01 - drop it since the header has been synchronized from kernel.(mst & jason)
    v3/01 - rename 'avail_wrap_counter' to 'last_avail_wrap_counter',
            'event_wrap_counter' to 'avail_wrap_counter' to make it easier
            to understand.(Jason)
          - revise commit message.(Jason)
    v3/02 - split packed ring area size calculation into the next patch
            so as not to break bisect.(Jason)
    v3/03 - initialize packed ring region with correct size and attribute.
          - remove unnecessary 'else' checks. (Jason)
    v3/06 - add commit log.
          - replace 'event_wrap_counter' with 'avail_wrap_counter'.
          - merge common memory cache size check to virtqueue_get_avail_bytes().(Jason)
          - revise memory barrier comment.(Jason) 
          - check indirect descriptors by desc.len/sizeof(desc).(Jason)
          - flip wrap counter with '^=1'.(Jason)
    v3/07 - move desc.id/len initialization to the declaration.(Jason)
          - flip wrap counter '!' with '^=1'.(Jason)
          - add memory barrier comments in commit message.
    v3/08 - use offsetof() when writing cache.(Jason)
          - avoid a duplicated memory region write when turning off
            notification with event_idx supported.(Jason)
          - add commit log.(Jason)
          - add avail & last_avail wrap counter difference description in commit log.
    v3/09 - remove unnecessary used/avail idx/wrap-counter from subsection.
          - put new subsection to the end of vmstate_virtio.(Jason)
          - squash the two userspace and vhost-net migration patches in v2.(Jason)
    v3/10 - reword commit message.
          - this is a helper, not a bug fix, so I would still like to keep
            it as a separate patch.(a merge was proposed by Jason)
          - virtqueue_fill() is not really an API either, so I would prefer
            not to touch it; please correct me if I did not get it right.
            (a squash was proposed by Jason)
    v3/11 - squash feature bits for user space and vhost kernel/user backends.
          - enable packed ring feature bit provision on host by default.(Jason)

Wei Xu (11):
  virtio: rename structure for packed ring
  virtio: device/driver area size calculation helper for split ring
  virtio: initialize packed ring region
  virtio: initialize wrap counter for packed ring
  virtio: queue/descriptor check helpers for packed ring
  virtio: get avail bytes check for packed ring
  virtio: fill/flush/pop for packed ring
  virtio: event suppression support for packed ring
  virtio-net: update the head descriptor in a chain lastly
  virtio: migration support for packed ring
  virtio: CLI and provide packed ring feature bit by default

 hw/net/vhost_net.c         |   2 +
 hw/net/virtio-net.c        |  11 +-
 hw/virtio/virtio.c         | 798 +++++++++++++++++++++++++++++++++++++++++----
 include/hw/virtio/virtio.h |   4 +-
 4 files changed, 757 insertions(+), 58 deletions(-)

-- 
1.8.3.1


* [Qemu-devel] [PATCH v4 01/11] virtio: rename structure for packed ring
  2019-02-14  4:26 [Qemu-devel] [PATCH v4 00/11] packed ring virtio-net backends support wexu
@ 2019-02-14  4:26 ` wexu
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 02/11] virtio: device/driver area size calculation helper for split ring wexu
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 41+ messages in thread
From: wexu @ 2019-02-14  4:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: wexu, jasowang, mst, jfreiman, maxime.coquelin, tiwei.bie

From: Wei Xu <wexu@redhat.com>

Redefine the packed ring structures according to QEMU nomenclature.
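
For reference, this mirrors the packed descriptor layout in the virtio 1.1
spec (a sketch of the standard definition, field names per the spec):

    struct pvirtq_desc {
        le64 addr;   /* buffer guest-physical address */
        le32 len;    /* buffer length */
        le16 id;     /* buffer id */
        le16 flags;  /* AVAIL/USED/WRITE/INDIRECT/NEXT bits */
    };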

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/virtio/virtio.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index a1ff647..eafb4cc 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -39,6 +39,13 @@ typedef struct VRingDesc
     uint16_t next;
 } VRingDesc;
 
+typedef struct VRingPackedDesc {
+    uint64_t addr;
+    uint32_t len;
+    uint16_t id;
+    uint16_t flags;
+} VRingPackedDesc;
+
 typedef struct VRingAvail
 {
     uint16_t flags;
@@ -77,17 +84,25 @@ typedef struct VRing
     VRingMemoryRegionCaches *caches;
 } VRing;
 
+typedef struct VRingPackedDescEvent {
+    uint16_t off_wrap;
+    uint16_t flags;
+} VRingPackedDescEvent ;
+
 struct VirtQueue
 {
     VRing vring;
 
     /* Next head to pop */
     uint16_t last_avail_idx;
+    bool last_avail_wrap_counter;
 
     /* Last avail_idx read from VQ. */
     uint16_t shadow_avail_idx;
+    bool avail_wrap_counter;
 
     uint16_t used_idx;
+    bool used_wrap_counter;
 
     /* Last used index value we have signalled on */
     uint16_t signalled_used;
-- 
1.8.3.1


* [Qemu-devel] [PATCH v4 02/11] virtio: device/driver area size calculation helper for split ring
  2019-02-14  4:26 [Qemu-devel] [PATCH v4 00/11] packed ring virtio-net backends support wexu
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 01/11] virtio: rename structure for packed ring wexu
@ 2019-02-14  4:26 ` wexu
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 03/11] virtio: initialize packed ring region wexu
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 41+ messages in thread
From: wexu @ 2019-02-14  4:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: wexu, jasowang, mst, jfreiman, maxime.coquelin, tiwei.bie

From: Wei Xu <wexu@redhat.com>

There is a slight size difference between split and packed rings.

This is a refactor of the split ring code as well as a helper to extend
the device and driver area size calculation to the packed ring.
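
As a worked example (assuming the usual VRingAvail/VRingUsed layout of
4 bytes of flags/idx before the ring array and 8 bytes per used element),
for a 256-entry queue with VIRTIO_RING_F_EVENT_IDX negotiated the helpers
return:

    driver (avail) area: 4 + 2 * 256 + 2 =  518 bytes
    device (used) area:  4 + 8 * 256 + 2 = 2054 bytes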

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/virtio/virtio.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index eafb4cc..6769e54 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -155,10 +155,8 @@ static void virtio_init_region_cache(VirtIODevice *vdev, int n)
     VRingMemoryRegionCaches *old = vq->vring.caches;
     VRingMemoryRegionCaches *new = NULL;
     hwaddr addr, size;
-    int event_size;
     int64_t len;
 
-    event_size = virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
 
     addr = vq->vring.desc;
     if (!addr) {
@@ -173,7 +171,7 @@ static void virtio_init_region_cache(VirtIODevice *vdev, int n)
         goto err_desc;
     }
 
-    size = virtio_queue_get_used_size(vdev, n) + event_size;
+    size = virtio_queue_get_used_size(vdev, n);
     len = address_space_cache_init(&new->used, vdev->dma_as,
                                    vq->vring.used, size, true);
     if (len < size) {
@@ -181,7 +179,7 @@ static void virtio_init_region_cache(VirtIODevice *vdev, int n)
         goto err_used;
     }
 
-    size = virtio_queue_get_avail_size(vdev, n) + event_size;
+    size = virtio_queue_get_avail_size(vdev, n);
     len = address_space_cache_init(&new->avail, vdev->dma_as,
                                    vq->vring.avail, size, false);
     if (len < size) {
@@ -2335,14 +2333,20 @@ hwaddr virtio_queue_get_desc_size(VirtIODevice *vdev, int n)
 
 hwaddr virtio_queue_get_avail_size(VirtIODevice *vdev, int n)
 {
+    int s;
+
+    s = virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
     return offsetof(VRingAvail, ring) +
-        sizeof(uint16_t) * vdev->vq[n].vring.num;
+        sizeof(uint16_t) * vdev->vq[n].vring.num + s;
 }
 
 hwaddr virtio_queue_get_used_size(VirtIODevice *vdev, int n)
 {
+    int s;
+
+    s = virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
     return offsetof(VRingUsed, ring) +
-        sizeof(VRingUsedElem) * vdev->vq[n].vring.num;
+        sizeof(VRingUsedElem) * vdev->vq[n].vring.num + s;
 }
 
 uint16_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n)
-- 
1.8.3.1


* [Qemu-devel] [PATCH v4 03/11] virtio: initialize packed ring region
  2019-02-14  4:26 [Qemu-devel] [PATCH v4 00/11] packed ring virtio-net backends support wexu
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 01/11] virtio: rename structure for packed ring wexu
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 02/11] virtio: device/driver area size calculation helper for split ring wexu
@ 2019-02-14  4:26 ` wexu
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 04/11] virtio: initialize wrap counter for packed ring wexu
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 41+ messages in thread
From: wexu @ 2019-02-14  4:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: wexu, jasowang, mst, jfreiman, maxime.coquelin, tiwei.bie

From: Wei Xu <wexu@redhat.com>

Initialize the packed ring memory region with the correct size and attributes.
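
For scale (a sketch based on this patch's sizing): VRingPackedDesc has the
same 16-byte size as the split VRingDesc, so the descriptor area stays at
num * 16 bytes, while the driver and device areas each shrink to a single
4-byte VRingPackedDescEvent, e.g. for a 256-entry packed queue:

    descriptor area: 256 * 16 = 4096 bytes
    driver area:                   4 bytes
    device area:                   4 bytes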

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/virtio/virtio.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 6769e54..1a98e61 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -156,7 +156,7 @@ static void virtio_init_region_cache(VirtIODevice *vdev, int n)
     VRingMemoryRegionCaches *new = NULL;
     hwaddr addr, size;
     int64_t len;
-
+    bool attr;
 
     addr = vq->vring.desc;
     if (!addr) {
@@ -164,8 +164,10 @@ static void virtio_init_region_cache(VirtIODevice *vdev, int n)
     }
     new = g_new0(VRingMemoryRegionCaches, 1);
     size = virtio_queue_get_desc_size(vdev, n);
+    attr = virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED) ?
+                                   true : false;
     len = address_space_cache_init(&new->desc, vdev->dma_as,
-                                   addr, size, false);
+                                   addr, size, attr);
     if (len < size) {
         virtio_error(vdev, "Cannot map desc");
         goto err_desc;
@@ -2335,6 +2337,10 @@ hwaddr virtio_queue_get_avail_size(VirtIODevice *vdev, int n)
 {
     int s;
 
+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+        return sizeof(struct VRingPackedDescEvent);
+    }
+
     s = virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
     return offsetof(VRingAvail, ring) +
         sizeof(uint16_t) * vdev->vq[n].vring.num + s;
@@ -2344,6 +2350,10 @@ hwaddr virtio_queue_get_used_size(VirtIODevice *vdev, int n)
 {
     int s;
 
+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+        return sizeof(struct VRingPackedDescEvent);
+    }
+
     s = virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
     return offsetof(VRingUsed, ring) +
         sizeof(VRingUsedElem) * vdev->vq[n].vring.num + s;
-- 
1.8.3.1


* [Qemu-devel] [PATCH v4 04/11] virtio: initialize wrap counter for packed ring
  2019-02-14  4:26 [Qemu-devel] [PATCH v4 00/11] packed ring virtio-net backends support wexu
                   ` (2 preceding siblings ...)
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 03/11] virtio: initialize packed ring region wexu
@ 2019-02-14  4:26 ` wexu
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 05/11] virtio: queue/descriptor check helpers " wexu
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 41+ messages in thread
From: wexu @ 2019-02-14  4:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: wexu, jasowang, mst, jfreiman, maxime.coquelin, tiwei.bie

From: Wei Xu <wexu@redhat.com>

Set the wrap counters to 'true' by default, as required by the spec.

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/virtio/virtio.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 1a98e61..54dc098 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1238,6 +1238,9 @@ void virtio_reset(void *opaque)
         vdev->vq[i].last_avail_idx = 0;
         vdev->vq[i].shadow_avail_idx = 0;
         vdev->vq[i].used_idx = 0;
+        vdev->vq[i].last_avail_wrap_counter = true;
+        vdev->vq[i].avail_wrap_counter = true;
+        vdev->vq[i].used_wrap_counter = true;
         virtio_queue_set_vector(vdev, i, VIRTIO_NO_VECTOR);
         vdev->vq[i].signalled_used = 0;
         vdev->vq[i].signalled_used_valid = false;
-- 
1.8.3.1


* [Qemu-devel] [PATCH v4 05/11] virtio: queue/descriptor check helpers for packed ring
  2019-02-14  4:26 [Qemu-devel] [PATCH v4 00/11] packed ring virtio-net backends support wexu
                   ` (3 preceding siblings ...)
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 04/11] virtio: initialize wrap counter for packed ring wexu
@ 2019-02-14  4:26 ` wexu
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 06/11] virtio: get avail bytes check " wexu
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 41+ messages in thread
From: wexu @ 2019-02-14  4:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: wexu, jasowang, mst, jfreiman, maxime.coquelin, tiwei.bie

From: Wei Xu <wexu@redhat.com>

Add helpers to check whether a descriptor is available and whether the
queue is empty.
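
For reference, a descriptor is available to the device when its AVAIL flag
differs from its USED flag and the AVAIL flag matches the ring's wrap
counter (per the spec, AVAIL is bit 7 and USED is bit 15 of desc.flags);
roughly:

    wrap_counter  AVAIL  USED    available?
         1          1     0      yes (freshly written by the driver)
         1          1     1      no  (already marked used by the device)
         0          0     1      yes (driver has wrapped around)
         0          0     0      no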

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/virtio/virtio.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 54dc098..f2ff980 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -368,6 +368,25 @@ int virtio_queue_ready(VirtQueue *vq)
     return vq->vring.avail != 0;
 }
 
+static void vring_packed_desc_read_flags(VirtIODevice *vdev,
+                    VRingPackedDesc *desc, MemoryRegionCache *cache, int i)
+{
+    address_space_read_cached(cache,
+              i * sizeof(VRingPackedDesc) + offsetof(VRingPackedDesc, flags),
+              &desc->flags, sizeof(desc->flags));
+    virtio_tswap16s(vdev, &desc->flags);
+}
+
+static inline bool is_desc_avail(struct VRingPackedDesc *desc,
+                                bool wrap_counter)
+{
+    bool avail, used;
+
+    avail = !!(desc->flags & (1 << VRING_PACKED_DESC_F_AVAIL));
+    used = !!(desc->flags & (1 << VRING_PACKED_DESC_F_USED));
+    return (avail != used) && (avail == wrap_counter);
+}
+
 /* Fetch avail_idx from VQ memory only when we really need to know if
  * guest has added some buffers.
  * Called within rcu_read_lock().  */
@@ -388,7 +407,7 @@ static int virtio_queue_empty_rcu(VirtQueue *vq)
     return vring_avail_idx(vq) == vq->last_avail_idx;
 }
 
-int virtio_queue_empty(VirtQueue *vq)
+static int virtio_queue_split_empty(VirtQueue *vq)
 {
     bool empty;
 
@@ -410,6 +429,41 @@ int virtio_queue_empty(VirtQueue *vq)
     return empty;
 }
 
+static int virtio_queue_packed_empty_rcu(VirtQueue *vq)
+{
+    struct VRingPackedDesc desc;
+    VRingMemoryRegionCaches *cache;
+
+    if (unlikely(!vq->vring.desc)) {
+        return 1;
+    }
+
+    cache = vring_get_region_caches(vq);
+    vring_packed_desc_read_flags(vq->vdev, &desc, &cache->desc,
+                                vq->last_avail_idx);
+
+    return !is_desc_avail(&desc, vq->last_avail_wrap_counter);
+}
+
+static int virtio_queue_packed_empty(VirtQueue *vq)
+{
+    bool empty;
+
+    rcu_read_lock();
+    empty = virtio_queue_packed_empty_rcu(vq);
+    rcu_read_unlock();
+    return empty;
+}
+
+int virtio_queue_empty(VirtQueue *vq)
+{
+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+        return virtio_queue_packed_empty(vq);
+    } else {
+        return virtio_queue_split_empty(vq);
+    }
+}
+
 static void virtqueue_unmap_sg(VirtQueue *vq, const VirtQueueElement *elem,
                                unsigned int len)
 {
-- 
1.8.3.1


* [Qemu-devel] [PATCH v4 06/11] virtio: get avail bytes check for packed ring
  2019-02-14  4:26 [Qemu-devel] [PATCH v4 00/11] packed ring virtio-net backends support wexu
                   ` (4 preceding siblings ...)
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 05/11] virtio: queue/descriptor check helpers " wexu
@ 2019-02-14  4:26 ` wexu
  2019-02-18  7:27   ` Jason Wang
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 07/11] virtio: fill/flush/pop " wexu
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 41+ messages in thread
From: wexu @ 2019-02-14  4:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: wexu, jasowang, mst, jfreiman, maxime.coquelin, tiwei.bie

From: Wei Xu <wexu@redhat.com>

Add packed ring headcount check.

The common parts of the split/packed rings are kept.
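
For context, an illustrative caller-side check (loosely modeled on the
virtio-net Rx path, not part of this patch) asks whether enough
device-writable bytes are queued before popping:

    /* Are at least 'bufsize' device-writable bytes available? */
    if (!virtqueue_avail_bytes(vq, bufsize, 0)) {
        /* not enough buffers yet; re-arm notification and wait for a kick */
        virtio_queue_set_notification(vq, 1);
        return 0;
    }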

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/virtio/virtio.c | 197 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 179 insertions(+), 18 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index f2ff980..832287b 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -368,6 +368,17 @@ int virtio_queue_ready(VirtQueue *vq)
     return vq->vring.avail != 0;
 }
 
+static void vring_packed_desc_read(VirtIODevice *vdev, VRingPackedDesc *desc,
+                            MemoryRegionCache *cache, int i)
+{
+    address_space_read_cached(cache, i * sizeof(VRingPackedDesc),
+                              desc, sizeof(VRingPackedDesc));
+    virtio_tswap16s(vdev, &desc->flags);
+    virtio_tswap64s(vdev, &desc->addr);
+    virtio_tswap32s(vdev, &desc->len);
+    virtio_tswap16s(vdev, &desc->id);
+}
+
 static void vring_packed_desc_read_flags(VirtIODevice *vdev,
                     VRingPackedDesc *desc, MemoryRegionCache *cache, int i)
 {
@@ -667,9 +678,9 @@ static int virtqueue_read_next_desc(VirtIODevice *vdev, VRingDesc *desc,
     return VIRTQUEUE_READ_DESC_MORE;
 }
 
-void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
-                               unsigned int *out_bytes,
-                               unsigned max_in_bytes, unsigned max_out_bytes)
+static void virtqueue_split_get_avail_bytes(VirtQueue *vq,
+                            unsigned int *in_bytes, unsigned int *out_bytes,
+                            unsigned max_in_bytes, unsigned max_out_bytes)
 {
     VirtIODevice *vdev = vq->vdev;
     unsigned int max, idx;
@@ -679,27 +690,12 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
     int64_t len = 0;
     int rc;
 
-    if (unlikely(!vq->vring.desc)) {
-        if (in_bytes) {
-            *in_bytes = 0;
-        }
-        if (out_bytes) {
-            *out_bytes = 0;
-        }
-        return;
-    }
-
     rcu_read_lock();
     idx = vq->last_avail_idx;
     total_bufs = in_total = out_total = 0;
 
     max = vq->vring.num;
     caches = vring_get_region_caches(vq);
-    if (caches->desc.len < max * sizeof(VRingDesc)) {
-        virtio_error(vdev, "Cannot map descriptor ring");
-        goto err;
-    }
-
     while ((rc = virtqueue_num_heads(vq, idx)) > 0) {
         MemoryRegionCache *desc_cache = &caches->desc;
         unsigned int num_bufs;
@@ -792,6 +788,171 @@ err:
     goto done;
 }
 
+static void virtqueue_packed_get_avail_bytes(VirtQueue *vq,
+                            unsigned int *in_bytes, unsigned int *out_bytes,
+                            unsigned max_in_bytes, unsigned max_out_bytes)
+{
+    VirtIODevice *vdev = vq->vdev;
+    unsigned int max, idx;
+    unsigned int total_bufs, in_total, out_total;
+    MemoryRegionCache *desc_cache;
+    VRingMemoryRegionCaches *caches;
+    MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
+    int64_t len = 0;
+    VRingPackedDesc desc;
+    bool wrap_counter;
+
+    rcu_read_lock();
+    idx = vq->last_avail_idx;
+    wrap_counter = vq->last_avail_wrap_counter;
+    total_bufs = in_total = out_total = 0;
+
+    max = vq->vring.num;
+    caches = vring_get_region_caches(vq);
+    desc_cache = &caches->desc;
+    vring_packed_desc_read_flags(vdev, &desc, desc_cache, idx);
+    while (is_desc_avail(&desc, wrap_counter)) {
+        unsigned int num_bufs;
+        unsigned int i = 0;
+
+        num_bufs = total_bufs;
+
+        /* Make sure flags has been read before all the fields. */
+        smp_rmb();
+        vring_packed_desc_read(vdev, &desc, desc_cache, idx);
+
+        if (desc.flags & VRING_DESC_F_INDIRECT) {
+            if (desc.len % sizeof(VRingPackedDesc)) {
+                virtio_error(vdev, "Invalid size for indirect buffer table");
+                goto err;
+            }
+
+            /* If we've got too many, that implies a descriptor loop. */
+            if (num_bufs >= max) {
+                virtio_error(vdev, "Looped descriptor");
+                goto err;
+            }
+
+            /* loop over the indirect descriptor table */
+            len = address_space_cache_init(&indirect_desc_cache,
+                                           vdev->dma_as,
+                                           desc.addr, desc.len, false);
+            desc_cache = &indirect_desc_cache;
+            if (len < desc.len) {
+                virtio_error(vdev, "Cannot map indirect buffer");
+                goto err;
+            }
+
+            max = desc.len / sizeof(VRingPackedDesc);
+            num_bufs = i = 0;
+            vring_packed_desc_read(vdev, &desc, desc_cache, i);
+        }
+
+        do {
+            /* If we've got too many, that implies a descriptor loop. */
+            if (++num_bufs > max) {
+                virtio_error(vdev, "Looped descriptor");
+                goto err;
+            }
+
+            if (desc.flags & VRING_DESC_F_WRITE) {
+                in_total += desc.len;
+            } else {
+                out_total += desc.len;
+            }
+            if (in_total >= max_in_bytes && out_total >= max_out_bytes) {
+                goto done;
+            }
+
+            if (desc_cache == &indirect_desc_cache) {
+                if (++i >= max) {
+                    break;
+                }
+                vring_packed_desc_read(vdev, &desc, desc_cache, i);
+            } else {
+                if (++idx >= vq->vring.num) {
+                    idx -= vq->vring.num;
+                    wrap_counter ^= 1;
+                }
+                vring_packed_desc_read(vdev, &desc, desc_cache, idx);
+            }
+            /* Make sure flags has been read after all the other fields */
+            smp_rmb();
+        } while (desc.flags & VRING_DESC_F_NEXT);
+
+        if (desc_cache == &indirect_desc_cache) {
+            address_space_cache_destroy(&indirect_desc_cache);
+            total_bufs++;
+            /* We missed one step on for indirect desc */
+            idx++;
+            if (++idx >= vq->vring.num) {
+                idx -= vq->vring.num;
+                wrap_counter ^= 1;
+            }
+        } else {
+            total_bufs = num_bufs;
+        }
+
+        desc_cache = &caches->desc;
+        vring_packed_desc_read_flags(vdev, &desc, desc_cache, idx);
+    }
+
+    /* Record the index and wrap counter for a kick we want */
+    vq->shadow_avail_idx = idx;
+    vq->avail_wrap_counter = wrap_counter;
+done:
+    address_space_cache_destroy(&indirect_desc_cache);
+    if (in_bytes) {
+        *in_bytes = in_total;
+    }
+    if (out_bytes) {
+        *out_bytes = out_total;
+    }
+    rcu_read_unlock();
+    return;
+
+err:
+    in_total = out_total = 0;
+    goto done;
+}
+
+void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
+                               unsigned int *out_bytes,
+                               unsigned max_in_bytes, unsigned max_out_bytes)
+{
+    uint16_t desc_size;
+    VRingMemoryRegionCaches *caches;
+
+    if (unlikely(!vq->vring.desc)) {
+        goto err;
+    }
+
+    caches = vring_get_region_caches(vq);
+    desc_size = virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED) ?
+                                sizeof(VRingPackedDesc) : sizeof(VRingDesc);
+    if (caches->desc.len < vq->vring.num * desc_size) {
+        virtio_error(vq->vdev, "Cannot map descriptor ring");
+        goto err;
+    }
+
+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+        virtqueue_packed_get_avail_bytes(vq, in_bytes, out_bytes,
+                                         max_in_bytes, max_out_bytes);
+    } else {
+        virtqueue_split_get_avail_bytes(vq, in_bytes, out_bytes,
+                                        max_in_bytes, max_out_bytes);
+    }
+
+    return;
+err:
+    if (in_bytes) {
+        *in_bytes = 0;
+    }
+    if (out_bytes) {
+        *out_bytes = 0;
+    }
+}
+
 int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
                           unsigned int out_bytes)
 {
-- 
1.8.3.1


* [Qemu-devel] [PATCH v4 07/11] virtio: fill/flush/pop for packed ring
  2019-02-14  4:26 [Qemu-devel] [PATCH v4 00/11] packed ring virtio-net backends support wexu
                   ` (5 preceding siblings ...)
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 06/11] virtio: get avail bytes check " wexu
@ 2019-02-14  4:26 ` wexu
  2019-02-18  7:51   ` Jason Wang
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 08/11] virtio: event suppression support " wexu
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 41+ messages in thread
From: wexu @ 2019-02-14  4:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: wexu, jasowang, mst, jfreiman, maxime.coquelin, tiwei.bie

From: Wei Xu <wexu@redhat.com>

last_used_idx/wrap_counter should be equal to last_avail_idx/wrap_counter
after a successful flush.

Batching as done in vhost-net & dpdk testpmd is not equivalently supported
in the userspace backend, but chained descriptors for Rx are similarly
presented as a lightweight batch, so a write barrier is needed only for
the first (head) descriptor.
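
For orientation, an illustrative device-side sequence using these helpers
(not from this patch; 'written_len' is a placeholder for the number of
bytes produced into the in_sg buffers):

    VirtQueueElement *elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
    if (!elem) {
        return;                     /* ring empty or device broken */
    }
    /* ... consume elem->out_sg and/or fill elem->in_sg ... */
    virtqueue_push(vq, elem, written_len);  /* fill + flush under RCU;
                                               id/len written first,
                                               head flags last */
    g_free(elem);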

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/virtio/virtio.c | 291 +++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 274 insertions(+), 17 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 832287b..7e276b4 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -379,6 +379,25 @@ static void vring_packed_desc_read(VirtIODevice *vdev, VRingPackedDesc *desc,
     virtio_tswap16s(vdev, &desc->id);
 }
 
+static void vring_packed_desc_write_data(VirtIODevice *vdev,
+                    VRingPackedDesc *desc, MemoryRegionCache *cache, int i)
+{
+    virtio_tswap32s(vdev, &desc->len);
+    virtio_tswap16s(vdev, &desc->id);
+    address_space_write_cached(cache,
+              i * sizeof(VRingPackedDesc) + offsetof(VRingPackedDesc, id),
+              &desc->id, sizeof(desc->id));
+    address_space_cache_invalidate(cache,
+              i * sizeof(VRingPackedDesc) + offsetof(VRingPackedDesc, id),
+              sizeof(desc->id));
+    address_space_write_cached(cache,
+              i * sizeof(VRingPackedDesc) + offsetof(VRingPackedDesc, len),
+              &desc->len, sizeof(desc->len));
+    address_space_cache_invalidate(cache,
+              i * sizeof(VRingPackedDesc) + offsetof(VRingPackedDesc, len),
+              sizeof(desc->len));
+}
+
 static void vring_packed_desc_read_flags(VirtIODevice *vdev,
                     VRingPackedDesc *desc, MemoryRegionCache *cache, int i)
 {
@@ -388,6 +407,18 @@ static void vring_packed_desc_read_flags(VirtIODevice *vdev,
     virtio_tswap16s(vdev, &desc->flags);
 }
 
+static void vring_packed_desc_write_flags(VirtIODevice *vdev,
+                    VRingPackedDesc *desc, MemoryRegionCache *cache, int i)
+{
+    virtio_tswap16s(vdev, &desc->flags);
+    address_space_write_cached(cache,
+              i * sizeof(VRingPackedDesc) + offsetof(VRingPackedDesc, flags),
+              &desc->flags, sizeof(desc->flags));
+    address_space_cache_invalidate(cache,
+              i * sizeof(VRingPackedDesc) + offsetof(VRingPackedDesc, flags),
+              sizeof(desc->flags));
+}
+
 static inline bool is_desc_avail(struct VRingPackedDesc *desc,
                                 bool wrap_counter)
 {
@@ -554,19 +585,11 @@ bool virtqueue_rewind(VirtQueue *vq, unsigned int num)
 }
 
 /* Called within rcu_read_lock().  */
-void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
+static void virtqueue_split_fill(VirtQueue *vq, const VirtQueueElement *elem,
                     unsigned int len, unsigned int idx)
 {
     VRingUsedElem uelem;
 
-    trace_virtqueue_fill(vq, elem, len, idx);
-
-    virtqueue_unmap_sg(vq, elem, len);
-
-    if (unlikely(vq->vdev->broken)) {
-        return;
-    }
-
     if (unlikely(!vq->vring.used)) {
         return;
     }
@@ -578,16 +601,71 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
     vring_used_write(vq, &uelem, idx);
 }
 
-/* Called within rcu_read_lock().  */
-void virtqueue_flush(VirtQueue *vq, unsigned int count)
+static void virtqueue_packed_fill(VirtQueue *vq, const VirtQueueElement *elem,
+                        unsigned int len, unsigned int idx)
 {
-    uint16_t old, new;
+    uint16_t head;
+    VRingMemoryRegionCaches *caches;
+    VRingPackedDesc desc = {
+        .flags = 0,
+        .id = elem->index,
+        .len = len,
+    };
+    bool wrap_counter = vq->used_wrap_counter;
+
+    if (unlikely(!vq->vring.desc)) {
+        return;
+    }
+
+    head = vq->used_idx + idx;
+    if (head >= vq->vring.num) {
+        head -= vq->vring.num;
+        wrap_counter ^= 1;
+    }
+    if (wrap_counter) {
+        desc.flags |= (1 << VRING_PACKED_DESC_F_AVAIL);
+        desc.flags |= (1 << VRING_PACKED_DESC_F_USED);
+    } else {
+        desc.flags &= ~(1 << VRING_PACKED_DESC_F_AVAIL);
+        desc.flags &= ~(1 << VRING_PACKED_DESC_F_USED);
+    }
+
+    caches = vring_get_region_caches(vq);
+    vring_packed_desc_write_data(vq->vdev, &desc, &caches->desc, head);
+    if (idx == 0) {
+        /*
+         * Make sure descriptor id and len is written before
+         * flags for the first used buffer.
+         */
+        smp_wmb();
+    }
+
+    vring_packed_desc_write_flags(vq->vdev, &desc, &caches->desc, head);
+}
+
+void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
+                    unsigned int len, unsigned int idx)
+{
+    trace_virtqueue_fill(vq, elem, len, idx);
+
+    virtqueue_unmap_sg(vq, elem, len);
 
     if (unlikely(vq->vdev->broken)) {
-        vq->inuse -= count;
         return;
     }
 
+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+        virtqueue_packed_fill(vq, elem, len, idx);
+    } else {
+        virtqueue_split_fill(vq, elem, len, idx);
+    }
+}
+
+/* Called within rcu_read_lock().  */
+static void virtqueue_split_flush(VirtQueue *vq, unsigned int count)
+{
+    uint16_t old, new;
+
     if (unlikely(!vq->vring.used)) {
         return;
     }
@@ -603,6 +681,31 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
         vq->signalled_used_valid = false;
 }
 
+static void virtqueue_packed_flush(VirtQueue *vq, unsigned int count)
+{
+    if (unlikely(!vq->vring.desc)) {
+        return;
+    }
+
+    vq->inuse -= count;
+    vq->used_idx = vq->last_avail_idx;
+    vq->used_wrap_counter = vq->last_avail_wrap_counter;
+}
+
+void virtqueue_flush(VirtQueue *vq, unsigned int count)
+{
+    if (unlikely(vq->vdev->broken)) {
+        vq->inuse -= count;
+        return;
+    }
+
+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+        virtqueue_packed_flush(vq, count);
+    } else {
+        virtqueue_split_flush(vq, count);
+    }
+}
+
 void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
                     unsigned int len)
 {
@@ -1077,7 +1180,7 @@ static void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_nu
     return elem;
 }
 
-void *virtqueue_pop(VirtQueue *vq, size_t sz)
+static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
 {
     unsigned int i, head, max;
     VRingMemoryRegionCaches *caches;
@@ -1092,9 +1195,6 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     VRingDesc desc;
     int rc;
 
-    if (unlikely(vdev->broken)) {
-        return NULL;
-    }
     rcu_read_lock();
     if (virtio_queue_empty_rcu(vq)) {
         goto done;
@@ -1212,6 +1312,163 @@ err_undo_map:
     goto done;
 }
 
+static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
+{
+    unsigned int i, head, max;
+    VRingMemoryRegionCaches *caches;
+    MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
+    MemoryRegionCache *cache;
+    int64_t len;
+    VirtIODevice *vdev = vq->vdev;
+    VirtQueueElement *elem = NULL;
+    unsigned out_num, in_num, elem_entries;
+    hwaddr addr[VIRTQUEUE_MAX_SIZE];
+    struct iovec iov[VIRTQUEUE_MAX_SIZE];
+    VRingPackedDesc desc;
+    uint16_t id;
+
+    rcu_read_lock();
+    if (virtio_queue_packed_empty_rcu(vq)) {
+        goto done;
+    }
+
+    /* When we start there are none of either input nor output. */
+    out_num = in_num = elem_entries = 0;
+
+    max = vq->vring.num;
+
+    if (vq->inuse >= vq->vring.num) {
+        virtio_error(vdev, "Virtqueue size exceeded");
+        goto done;
+    }
+
+    head = vq->last_avail_idx;
+    i = head;
+
+    caches = vring_get_region_caches(vq);
+    cache = &caches->desc;
+
+    /* make sure flags is read before all the other fields */
+    smp_rmb();
+    vring_packed_desc_read(vdev, &desc, cache, i);
+
+    id = desc.id;
+    if (desc.flags & VRING_DESC_F_INDIRECT) {
+
+        if (desc.len % sizeof(VRingPackedDesc)) {
+            virtio_error(vdev, "Invalid size for indirect buffer table");
+            goto done;
+        }
+
+        /* loop over the indirect descriptor table */
+        len = address_space_cache_init(&indirect_desc_cache, vdev->dma_as,
+                                       desc.addr, desc.len, false);
+        cache = &indirect_desc_cache;
+        if (len < desc.len) {
+            virtio_error(vdev, "Cannot map indirect buffer");
+            goto done;
+        }
+
+        max = desc.len / sizeof(VRingPackedDesc);
+        i = 0;
+        vring_packed_desc_read(vdev, &desc, cache, i);
+        /* Make sure we see all the fields*/
+        smp_rmb();
+    }
+
+    /* Collect all the descriptors */
+    while (1) {
+        bool map_ok;
+
+        if (desc.flags & VRING_DESC_F_WRITE) {
+            map_ok = virtqueue_map_desc(vdev, &in_num, addr + out_num,
+                                        iov + out_num,
+                                        VIRTQUEUE_MAX_SIZE - out_num, true,
+                                        desc.addr, desc.len);
+        } else {
+            if (in_num) {
+                virtio_error(vdev, "Incorrect order for descriptors");
+                goto err_undo_map;
+            }
+            map_ok = virtqueue_map_desc(vdev, &out_num, addr, iov,
+                                        VIRTQUEUE_MAX_SIZE, false,
+                                        desc.addr, desc.len);
+        }
+        if (!map_ok) {
+            goto err_undo_map;
+        }
+
+        /* If we've got too many, that implies a descriptor loop. */
+        if (++elem_entries > max) {
+            virtio_error(vdev, "Looped descriptor");
+            goto err_undo_map;
+        }
+
+        if (++i >= vq->vring.num) {
+            i -= vq->vring.num;
+        }
+
+        if (cache == &indirect_desc_cache) {
+            if (i == max) {
+                break;
+            }
+            vring_packed_desc_read(vq->vdev, &desc, cache, i);
+        } else if (desc.flags & VRING_DESC_F_NEXT) {
+            vring_packed_desc_read(vq->vdev, &desc, cache, i);
+        } else {
+            break;
+        }
+    }
+
+    /* Now copy what we have collected and mapped */
+    elem = virtqueue_alloc_element(sz, out_num, in_num);
+    elem->index = id;
+    for (i = 0; i < out_num; i++) {
+        elem->out_addr[i] = addr[i];
+        elem->out_sg[i] = iov[i];
+    }
+    for (i = 0; i < in_num; i++) {
+        elem->in_addr[i] = addr[head + out_num + i];
+        elem->in_sg[i] = iov[out_num + i];
+    }
+
+    vq->last_avail_idx += (cache == &indirect_desc_cache) ?
+                          1 : elem_entries;
+    if (vq->last_avail_idx >= vq->vring.num) {
+        vq->last_avail_idx -= vq->vring.num;
+        vq->last_avail_wrap_counter ^= 1;
+    }
+    vq->inuse++;
+
+    vq->shadow_avail_idx = vq->last_avail_idx;
+    vq->avail_wrap_counter = vq->last_avail_wrap_counter;
+
+    trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
+done:
+    address_space_cache_destroy(&indirect_desc_cache);
+    rcu_read_unlock();
+
+    return elem;
+
+err_undo_map:
+    virtqueue_undo_map_desc(out_num, in_num, iov);
+    g_free(elem);
+    goto done;
+}
+
+void *virtqueue_pop(VirtQueue *vq, size_t sz)
+{
+    if (unlikely(vq->vdev->broken)) {
+        return NULL;
+    }
+
+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+        return virtqueue_packed_pop(vq, sz);
+    } else {
+        return virtqueue_split_pop(vq, sz);
+    }
+}
+
 /* virtqueue_drop_all:
  * @vq: The #VirtQueue
  * Drops all queued buffers and indicates them to the guest
-- 
1.8.3.1


* [Qemu-devel] [PATCH v4 08/11] virtio: event suppression support for packed ring
  2019-02-14  4:26 [Qemu-devel] [PATCH v4 00/11] packed ring virtio-net backends support wexu
                   ` (6 preceding siblings ...)
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 07/11] virtio: fill/flush/pop " wexu
@ 2019-02-14  4:26 ` wexu
  2019-02-19  7:19   ` Jason Wang
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 09/11] virtio-net: update the head descriptor in a chain lastly wexu
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 41+ messages in thread
From: wexu @ 2019-02-14  4:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: wexu, jasowang, mst, jfreiman, maxime.coquelin, tiwei.bie

From: Wei Xu <wexu@redhat.com>

Difference between 'avail_wrap_counter' and 'last_avail_wrap_counter':
For Tx (guest transmitting), they are the same after each pop of a desc.

For Rx (guest receiving), they are also the same when there are enough
descriptors to carry the payload of a packet (e.g. usually 16 descs are
needed for a 64k packet in a typical iperf TCP connection with TSO
enabled). However, when the ring is running out of descriptors while a
few free ones remain, e.g. 6 descriptors are available, which is not
enough to carry an entire packet that needs 16 descriptors, then
'avail_wrap_counter' should be set to the first descriptor pending
handling by the guest driver in order to get a notification, while
'last_avail_wrap_counter' should stay unchanged at the head of the
available descriptors, like below:

Mark meaning:
    | | -- available
    |*| -- used

A Snapshot of the queue:
                                      last_avail_idx = 253
                                      last_avail_wrap_counter = 1
                                             |
    +---------------------------------------------+
 0  | | | |*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*| | | | 255
    +---------------------------------------------+
           |
  shadow_avail_idx = 3
  avail_wrap_counter = 0
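
The driver/device event areas encode the event index and its wrap counter
in a single 16-bit 'off_wrap' field (index in bits 0..14, wrap counter in
bit 15), matching the shifts used below; a minimal illustration of the
encoding (helper names are made up for the sketch):

    #include <stdbool.h>
    #include <stdint.h>

    static inline uint16_t off_wrap_pack(uint16_t off, bool wrap)
    {
        return off | ((uint16_t)wrap << 15);
    }

    static inline void off_wrap_unpack(uint16_t off_wrap,
                                       uint16_t *off, bool *wrap)
    {
        *off = off_wrap & 0x7fff;
        *wrap = off_wrap >> 15;
    }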

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/virtio/virtio.c | 137 +++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 128 insertions(+), 9 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 7e276b4..8cfc7b6 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -234,6 +234,34 @@ static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
     virtio_tswap16s(vdev, &desc->next);
 }
 
+static void vring_packed_event_read(VirtIODevice *vdev,
+                            MemoryRegionCache *cache, VRingPackedDescEvent *e)
+{
+    address_space_read_cached(cache, 0, e, sizeof(*e));
+    virtio_tswap16s(vdev, &e->off_wrap);
+    virtio_tswap16s(vdev, &e->flags);
+}
+
+static void vring_packed_off_wrap_write(VirtIODevice *vdev,
+                            MemoryRegionCache *cache, uint16_t off_wrap)
+{
+    virtio_tswap16s(vdev, &off_wrap);
+    address_space_write_cached(cache, offsetof(VRingPackedDescEvent, off_wrap),
+                            &off_wrap, sizeof(off_wrap));
+    address_space_cache_invalidate(cache,
+                offsetof(VRingPackedDescEvent, off_wrap), sizeof(off_wrap));
+}
+
+static void vring_packed_flags_write(VirtIODevice *vdev,
+                            MemoryRegionCache *cache, uint16_t flags)
+{
+    virtio_tswap16s(vdev, &flags);
+    address_space_write_cached(cache, offsetof(VRingPackedDescEvent, flags),
+                            &flags, sizeof(flags));
+    address_space_cache_invalidate(cache,
+                        offsetof(VRingPackedDescEvent, flags), sizeof(flags));
+}
+
 static VRingMemoryRegionCaches *vring_get_region_caches(struct VirtQueue *vq)
 {
     VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
@@ -340,14 +368,8 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
     address_space_cache_invalidate(&caches->used, pa, sizeof(val));
 }
 
-void virtio_queue_set_notification(VirtQueue *vq, int enable)
+static void virtio_queue_set_notification_split(VirtQueue *vq, int enable)
 {
-    vq->notification = enable;
-
-    if (!vq->vring.desc) {
-        return;
-    }
-
     rcu_read_lock();
     if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
         vring_set_avail_event(vq, vring_avail_idx(vq));
@@ -363,6 +385,57 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable)
     rcu_read_unlock();
 }
 
+static void virtio_queue_set_notification_packed(VirtQueue *vq, int enable)
+{
+    VRingPackedDescEvent e;
+    VRingMemoryRegionCaches *caches;
+
+    rcu_read_lock();
+    caches  = vring_get_region_caches(vq);
+    vring_packed_event_read(vq->vdev, &caches->used, &e);
+
+    if (!enable) {
+        if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
+            /* no need to write device area since this is outdated. */
+            goto out;
+        }
+
+        e.flags = VRING_PACKED_EVENT_FLAG_DISABLE;
+        goto update;
+    }
+
+    e.flags = VRING_PACKED_EVENT_FLAG_ENABLE;
+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
+        uint16_t off_wrap = vq->shadow_avail_idx | vq->avail_wrap_counter << 15;
+
+        vring_packed_off_wrap_write(vq->vdev, &caches->used, off_wrap);
+        /* Make sure off_wrap is wrote before flags */
+        smp_wmb();
+
+        e.flags = VRING_PACKED_EVENT_FLAG_DESC;
+    }
+
+update:
+    vring_packed_flags_write(vq->vdev, &caches->used, e.flags);
+out:
+    rcu_read_unlock();
+}
+
+void virtio_queue_set_notification(VirtQueue *vq, int enable)
+{
+    vq->notification = enable;
+
+    if (!vq->vring.desc) {
+        return;
+    }
+
+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+        virtio_queue_set_notification_packed(vq, enable);
+    } else {
+        virtio_queue_set_notification_split(vq, enable);
+    }
+}
+
 int virtio_queue_ready(VirtQueue *vq)
 {
     return vq->vring.avail != 0;
@@ -2117,8 +2190,7 @@ static void virtio_set_isr(VirtIODevice *vdev, int value)
     }
 }
 
-/* Called within rcu_read_lock().  */
-static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
+static bool virtio_split_should_notify(VirtIODevice *vdev, VirtQueue *vq)
 {
     uint16_t old, new;
     bool v;
@@ -2141,6 +2213,53 @@ static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
     return !v || vring_need_event(vring_get_used_event(vq), new, old);
 }
 
+static bool vring_packed_need_event(VirtQueue *vq, bool wrap,
+                            uint16_t off_wrap, uint16_t new, uint16_t old)
+{
+    int off = off_wrap & ~(1 << 15);
+
+    if (wrap != off_wrap >> 15) {
+        off -= vq->vring.num;
+    }
+
+    return vring_need_event(off, new, old);
+}
+
+static bool virtio_packed_should_notify(VirtIODevice *vdev, VirtQueue *vq)
+{
+    VRingPackedDescEvent e;
+    uint16_t old, new;
+    bool v;
+    VRingMemoryRegionCaches *caches;
+
+    caches  = vring_get_region_caches(vq);
+    vring_packed_event_read(vdev, &caches->avail, &e);
+
+    old = vq->signalled_used;
+    new = vq->signalled_used = vq->used_idx;
+    v = vq->signalled_used_valid;
+    vq->signalled_used_valid = true;
+
+    if (e.flags == VRING_PACKED_EVENT_FLAG_DISABLE) {
+        return false;
+    } else if (e.flags == VRING_PACKED_EVENT_FLAG_ENABLE) {
+        return true;
+    }
+
+    return !v || vring_packed_need_event(vq,
+        vq->used_wrap_counter, e.off_wrap, new, old);
+}
+
+/* Called within rcu_read_lock().  */
+static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
+{
+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+        return virtio_packed_should_notify(vdev, vq);
+    } else {
+        return virtio_split_should_notify(vdev, vq);
+    }
+}
+
 void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
 {
     bool should_notify;
-- 
1.8.3.1


* [Qemu-devel] [PATCH v4 09/11] virtio-net: update the head descriptor in a chain lastly
  2019-02-14  4:26 [Qemu-devel] [PATCH v4 00/11] packed ring virtio-net backends support wexu
                   ` (7 preceding siblings ...)
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 08/11] virtio: event suppression support " wexu
@ 2019-02-14  4:26 ` wexu
  2019-02-19  7:23   ` Jason Wang
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 10/11] virtio: migration support for packed ring wexu
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 41+ messages in thread
From: wexu @ 2019-02-14  4:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: wexu, jasowang, mst, jfreiman, maxime.coquelin, tiwei.bie

From: Wei Xu <wexu@redhat.com>

This is a helper for the packed ring.

To support the packed ring, the head descriptor in a chain should be
updated last, since there is no 'avail_idx' as in the split ring to
explicitly tell the driver side that the whole chain is ready; the head
descriptor becomes visible to the driver as soon as it is written.

This patch therefore fills the head descriptor after all the other ones.
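
Illustratively, for a packet spread over n chained Rx buffers, the fill
order becomes (sketch only, 'n' stands for the number of buffers used):

    /* buffers 1 .. n-1 are filled as they are consumed ... */
    virtqueue_fill(q->rx_vq, elem, len, i);        /* i = 1 .. n-1 */
    /* ... and the head (index 0) is written only after num_buffers
     * has been patched into the mergeable rx header: */
    virtqueue_fill(q->rx_vq, &head, head_len, 0);
    virtqueue_flush(q->rx_vq, n);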

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/net/virtio-net.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 3f319ef..330abea 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1251,6 +1251,8 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
     struct virtio_net_hdr_mrg_rxbuf mhdr;
     unsigned mhdr_cnt = 0;
     size_t offset, i, guest_offset;
+    VirtQueueElement head;
+    int head_len = 0;
 
     if (!virtio_net_can_receive(nc)) {
         return -1;
@@ -1328,7 +1330,13 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
         }
 
         /* signal other side */
-        virtqueue_fill(q->rx_vq, elem, total, i++);
+        if (i == 0) {
+            head_len = total;
+            head = *elem;
+        } else {
+            virtqueue_fill(q->rx_vq, elem, len, i);
+        }
+        i++;
         g_free(elem);
     }
 
@@ -1339,6 +1347,7 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
                      &mhdr.num_buffers, sizeof mhdr.num_buffers);
     }
 
+    virtqueue_fill(q->rx_vq, &head, head_len, 0);
     virtqueue_flush(q->rx_vq, i);
     virtio_notify(vdev, q->rx_vq);
 
-- 
1.8.3.1


* [Qemu-devel] [PATCH v4 10/11] virtio: migration support for packed ring
  2019-02-14  4:26 [Qemu-devel] [PATCH v4 00/11] packed ring virtio-net backends support wexu
                   ` (8 preceding siblings ...)
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 09/11] virtio-net: update the head descriptor in a chain lastly wexu
@ 2019-02-14  4:26 ` wexu
  2019-02-19  7:30   ` Jason Wang
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 11/11] virtio: CLI and provide packed ring feature bit by default wexu
  2019-02-19  7:35 ` [Qemu-devel] [PATCH v4 00/11] packed ring virtio-net backends support Jason Wang
  11 siblings, 1 reply; 41+ messages in thread
From: wexu @ 2019-02-14  4:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: wexu, jasowang, mst, jfreiman, maxime.coquelin, tiwei.bie

From: Wei Xu <wexu@redhat.com>

Both userspace and vhost-net/user backends are supported with this patch.

A new subsection is introduced for the packed ring; only 'last_avail_idx'
and 'last_avail_wrap_counter' are saved/loaded, on the presumption that
all the other relevant data (inuse, used/avail index and wrap counter)
are the same.
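
As a worked example of the index packing used by the accessors below:
a packed queue with last_avail_idx = 5 and the wrap counter set is
reported as 5 | (1 << 15) = 0x8005 in the 16-bit index exchanged with
vhost, and restored as idx & 0x7fff = 5 and wrap = !!(idx & 0x8000) = 1.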

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/virtio/virtio.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 66 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 8cfc7b6..7c5de07 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2349,6 +2349,13 @@ static bool virtio_virtqueue_needed(void *opaque)
     return virtio_host_has_feature(vdev, VIRTIO_F_VERSION_1);
 }
 
+static bool virtio_packed_virtqueue_needed(void *opaque)
+{
+    VirtIODevice *vdev = opaque;
+
+    return virtio_host_has_feature(vdev, VIRTIO_F_RING_PACKED);
+}
+
 static bool virtio_ringsize_needed(void *opaque)
 {
     VirtIODevice *vdev = opaque;
@@ -2390,6 +2397,17 @@ static const VMStateDescription vmstate_virtqueue = {
     }
 };
 
+static const VMStateDescription vmstate_packed_virtqueue = {
+    .name = "packed_virtqueue_state",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT16(last_avail_idx, struct VirtQueue),
+        VMSTATE_BOOL(last_avail_wrap_counter, struct VirtQueue),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static const VMStateDescription vmstate_virtio_virtqueues = {
     .name = "virtio/virtqueues",
     .version_id = 1,
@@ -2402,6 +2420,18 @@ static const VMStateDescription vmstate_virtio_virtqueues = {
     }
 };
 
+static const VMStateDescription vmstate_virtio_packed_virtqueues = {
+    .name = "virtio/packed_virtqueues",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = &virtio_packed_virtqueue_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_STRUCT_VARRAY_POINTER_KNOWN(vq, struct VirtIODevice,
+                      VIRTIO_QUEUE_MAX, 0, vmstate_packed_virtqueue, VirtQueue),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static const VMStateDescription vmstate_ringsize = {
     .name = "ringsize_state",
     .version_id = 1,
@@ -2522,6 +2552,7 @@ static const VMStateDescription vmstate_virtio = {
         &vmstate_virtio_ringsize,
         &vmstate_virtio_broken,
         &vmstate_virtio_extra_state,
+        &vmstate_virtio_packed_virtqueues,
         NULL
     }
 };
@@ -2794,6 +2825,17 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
                 virtio_queue_update_rings(vdev, i);
             }
 
+            if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+                vdev->vq[i].shadow_avail_idx = vdev->vq[i].last_avail_idx;
+                vdev->vq[i].avail_wrap_counter =
+                                        vdev->vq[i].last_avail_wrap_counter;
+
+                vdev->vq[i].used_idx = vdev->vq[i].last_avail_idx;
+                vdev->vq[i].used_wrap_counter =
+                                        vdev->vq[i].last_avail_wrap_counter;
+                continue;
+            }
+
             nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
             /* Check it isn't doing strange things with descriptor numbers. */
             if (nheads > vdev->vq[i].vring.num) {
@@ -2955,17 +2997,34 @@ hwaddr virtio_queue_get_used_size(VirtIODevice *vdev, int n)
 
 uint16_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n)
 {
-    return vdev->vq[n].last_avail_idx;
+    uint16_t idx;
+
+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+        idx = vdev->vq[n].last_avail_idx;
+        idx |= ((int)vdev->vq[n].avail_wrap_counter) << 15;
+    } else {
+        idx = (int)vdev->vq[n].last_avail_idx;
+    }
+    return idx;
 }
 
 void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx)
 {
-    vdev->vq[n].last_avail_idx = idx;
-    vdev->vq[n].shadow_avail_idx = idx;
+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+        vdev->vq[n].last_avail_idx = idx & 0x7fff;
+        vdev->vq[n].avail_wrap_counter = !!(idx & 0x8000);
+    } else {
+        vdev->vq[n].last_avail_idx = idx;
+        vdev->vq[n].shadow_avail_idx = idx;
+    }
 }
 
 void virtio_queue_restore_last_avail_idx(VirtIODevice *vdev, int n)
 {
+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+        return;
+    }
+
     rcu_read_lock();
     if (vdev->vq[n].vring.desc) {
         vdev->vq[n].last_avail_idx = vring_used_idx(&vdev->vq[n]);
@@ -2976,6 +3035,10 @@ void virtio_queue_restore_last_avail_idx(VirtIODevice *vdev, int n)
 
 void virtio_queue_update_used_idx(VirtIODevice *vdev, int n)
 {
+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+        return;
+    }
+
     rcu_read_lock();
     if (vdev->vq[n].vring.desc) {
         vdev->vq[n].used_idx = vring_used_idx(&vdev->vq[n]);
-- 
1.8.3.1


* [Qemu-devel] [PATCH v4 11/11] virtio: CLI and provide packed ring feature bit by default
  2019-02-14  4:26 [Qemu-devel] [PATCH v4 00/11] packed ring virtio-net backends support wexu
                   ` (9 preceding siblings ...)
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 10/11] virtio: migration support for packed ring wexu
@ 2019-02-14  4:26 ` wexu
  2019-02-19  7:32   ` Jason Wang
  2019-02-19  7:35 ` [Qemu-devel] [PATCH v4 00/11] packed ring virtio-net backends support Jason Wang
  11 siblings, 1 reply; 41+ messages in thread
From: wexu @ 2019-02-14  4:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: wexu, jasowang, mst, jfreiman, maxime.coquelin, tiwei.bie

From: Wei Xu <wexu@redhat.com>

Add userspace and vhost kernel/user support.

Add a CLI property "ring_packed=true/false" to enable/disable packed ring
provisioning.
Usage:
    -device virtio-net-pci,netdev=xx,mac=xx:xx:xx:xx:xx:xx,ring_packed=false

By default the feature bit is provided.

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/net/vhost_net.c         | 2 ++
 include/hw/virtio/virtio.h | 4 +++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index e037db6..f593086 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -53,6 +53,7 @@ static const int kernel_feature_bits[] = {
     VIRTIO_F_VERSION_1,
     VIRTIO_NET_F_MTU,
     VIRTIO_F_IOMMU_PLATFORM,
+    VIRTIO_F_RING_PACKED,
     VHOST_INVALID_FEATURE_BIT
 };
 
@@ -78,6 +79,7 @@ static const int user_feature_bits[] = {
     VIRTIO_NET_F_MRG_RXBUF,
     VIRTIO_NET_F_MTU,
     VIRTIO_F_IOMMU_PLATFORM,
+    VIRTIO_F_RING_PACKED,
 
     /* This bit implies RARP isn't sent by QEMU out of band */
     VIRTIO_NET_F_GUEST_ANNOUNCE,
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 9c1fa07..2eb27d2 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -264,7 +264,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
     DEFINE_PROP_BIT64("any_layout", _state, _field, \
                       VIRTIO_F_ANY_LAYOUT, true), \
     DEFINE_PROP_BIT64("iommu_platform", _state, _field, \
-                      VIRTIO_F_IOMMU_PLATFORM, false)
+                      VIRTIO_F_IOMMU_PLATFORM, false), \
+    DEFINE_PROP_BIT64("ring_packed", _state, _field, \
+                      VIRTIO_F_RING_PACKED, true)
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
 hwaddr virtio_queue_get_avail_addr(VirtIODevice *vdev, int n);
-- 
1.8.3.1


* Re: [Qemu-devel] [PATCH v4 06/11] virtio: get avail bytes check for packed ring
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 06/11] virtio: get avail bytes check " wexu
@ 2019-02-18  7:27   ` Jason Wang
  2019-02-18 17:07     ` Wei Xu
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2019-02-18  7:27 UTC (permalink / raw)
  To: wexu, qemu-devel; +Cc: mst, jfreiman, maxime.coquelin, tiwei.bie


On 2019/2/14 12:26 PM, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
>
> Add packed ring headcount check.
>
> Common part of split/packed ring are kept.
>
> Signed-off-by: Wei Xu <wexu@redhat.com>
> ---
>   hw/virtio/virtio.c | 197 ++++++++++++++++++++++++++++++++++++++++++++++++-----
>   1 file changed, 179 insertions(+), 18 deletions(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index f2ff980..832287b 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -368,6 +368,17 @@ int virtio_queue_ready(VirtQueue *vq)
>       return vq->vring.avail != 0;
>   }
>   
> +static void vring_packed_desc_read(VirtIODevice *vdev, VRingPackedDesc *desc,
> +                            MemoryRegionCache *cache, int i)
> +{
> +    address_space_read_cached(cache, i * sizeof(VRingPackedDesc),
> +                              desc, sizeof(VRingPackedDesc));
> +    virtio_tswap16s(vdev, &desc->flags);
> +    virtio_tswap64s(vdev, &desc->addr);
> +    virtio_tswap32s(vdev, &desc->len);
> +    virtio_tswap16s(vdev, &desc->id);
> +}
> +
>   static void vring_packed_desc_read_flags(VirtIODevice *vdev,
>                       VRingPackedDesc *desc, MemoryRegionCache *cache, int i)
>   {
> @@ -667,9 +678,9 @@ static int virtqueue_read_next_desc(VirtIODevice *vdev, VRingDesc *desc,
>       return VIRTQUEUE_READ_DESC_MORE;
>   }
>   
> -void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
> -                               unsigned int *out_bytes,
> -                               unsigned max_in_bytes, unsigned max_out_bytes)
> +static void virtqueue_split_get_avail_bytes(VirtQueue *vq,
> +                            unsigned int *in_bytes, unsigned int *out_bytes,
> +                            unsigned max_in_bytes, unsigned max_out_bytes)
>   {
>       VirtIODevice *vdev = vq->vdev;
>       unsigned int max, idx;
> @@ -679,27 +690,12 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
>       int64_t len = 0;
>       int rc;
>   
> -    if (unlikely(!vq->vring.desc)) {
> -        if (in_bytes) {
> -            *in_bytes = 0;
> -        }
> -        if (out_bytes) {
> -            *out_bytes = 0;
> -        }
> -        return;
> -    }
> -
>       rcu_read_lock();
>       idx = vq->last_avail_idx;
>       total_bufs = in_total = out_total = 0;
>   
>       max = vq->vring.num;
>       caches = vring_get_region_caches(vq);
> -    if (caches->desc.len < max * sizeof(VRingDesc)) {
> -        virtio_error(vdev, "Cannot map descriptor ring");
> -        goto err;
> -    }
> -
>       while ((rc = virtqueue_num_heads(vq, idx)) > 0) {
>           MemoryRegionCache *desc_cache = &caches->desc;
>           unsigned int num_bufs;
> @@ -792,6 +788,171 @@ err:
>       goto done;
>   }
>   
> +static void virtqueue_packed_get_avail_bytes(VirtQueue *vq,
> +                            unsigned int *in_bytes, unsigned int *out_bytes,
> +                            unsigned max_in_bytes, unsigned max_out_bytes)
> +{
> +    VirtIODevice *vdev = vq->vdev;
> +    unsigned int max, idx;
> +    unsigned int total_bufs, in_total, out_total;
> +    MemoryRegionCache *desc_cache;
> +    VRingMemoryRegionCaches *caches;
> +    MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
> +    int64_t len = 0;
> +    VRingPackedDesc desc;
> +    bool wrap_counter;
> +
> +    rcu_read_lock();
> +    idx = vq->last_avail_idx;
> +    wrap_counter = vq->last_avail_wrap_counter;
> +    total_bufs = in_total = out_total = 0;
> +
> +    max = vq->vring.num;
> +    caches = vring_get_region_caches(vq);
> +    desc_cache = &caches->desc;
> +    vring_packed_desc_read_flags(vdev, &desc, desc_cache, idx);
> +    while (is_desc_avail(&desc, wrap_counter)) {
> +        unsigned int num_bufs;
> +        unsigned int i = 0;
> +
> +        num_bufs = total_bufs;
> +
> +        /* Make sure flags has been read before all the fields. */
> +        smp_rmb();
> +        vring_packed_desc_read(vdev, &desc, desc_cache, idx);


It's better to have a single function to deal with reading the flags and the
descriptor and to check its availability for the packed ring.


> +
> +        if (desc.flags & VRING_DESC_F_INDIRECT) {
> +            if (desc.len % sizeof(VRingPackedDesc)) {
> +                virtio_error(vdev, "Invalid size for indirect buffer table");
> +                goto err;
> +            }
> +
> +            /* If we've got too many, that implies a descriptor loop. */
> +            if (num_bufs >= max) {
> +                virtio_error(vdev, "Looped descriptor");
> +                goto err;
> +            }
> +
> +            /* loop over the indirect descriptor table */
> +            len = address_space_cache_init(&indirect_desc_cache,
> +                                           vdev->dma_as,
> +                                           desc.addr, desc.len, false);
> +            desc_cache = &indirect_desc_cache;
> +            if (len < desc.len) {
> +                virtio_error(vdev, "Cannot map indirect buffer");
> +                goto err;
> +            }
> +
> +            max = desc.len / sizeof(VRingPackedDesc);
> +            num_bufs = i = 0;
> +            vring_packed_desc_read(vdev, &desc, desc_cache, i);
> +        }
> +
> +        do {
> +            /* If we've got too many, that implies a descriptor loop. */
> +            if (++num_bufs > max) {
> +                virtio_error(vdev, "Looped descriptor");
> +                goto err;
> +            }
> +
> +            if (desc.flags & VRING_DESC_F_WRITE) {
> +                in_total += desc.len;
> +            } else {
> +                out_total += desc.len;
> +            }
> +            if (in_total >= max_in_bytes && out_total >= max_out_bytes) {
> +                goto done;
> +            }
> +
> +            if (desc_cache == &indirect_desc_cache) {
> +                if (++i >= max) {
> +                    break;
> +                }
> +                vring_packed_desc_read(vdev, &desc, desc_cache, i);
> +            } else {
> +                if (++idx >= vq->vring.num) {
> +                    idx -= vq->vring.num;
> +                    wrap_counter ^= 1;
> +                }
> +                vring_packed_desc_read(vdev, &desc, desc_cache, idx);
> +            }
> +            /* Make sure flags has been read after all the other fields */
> +            smp_rmb();


I don't think we need this according to the spec:

"

The driver always makes the first descriptor in the list
available after the rest of the list has been written out into
the ring. This guarantees that the device will never observe a
partial scatter/gather list in the ring.

"

So when the first one is available, the rest of the chain should already be
available; otherwise the device may see a partial chain.
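As a rough sketch (reusing the helper names from this series, simplified and
ignoring the wrap counter update), the device side only needs the one barrier
after it sees the head descriptor:

    vring_packed_desc_read_flags(vdev, &desc, cache, idx);
    if (!is_desc_avail(&desc, wrap_counter)) {
        return;                     /* nothing new in the ring */
    }
    /* head flags before the rest of the head descriptor */
    smp_rmb();
    vring_packed_desc_read(vdev, &desc, cache, idx);
    while (desc.flags & VRING_DESC_F_NEXT) {
        idx = (idx + 1) % vq->vring.num;
        /* the driver wrote the rest of the chain before making the
         * head available, so no extra barrier per descriptor */
        vring_packed_desc_read(vdev, &desc, cache, idx);
    }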

Thanks


> +        } while (desc.flags & VRING_DESC_F_NEXT);
> +
> +        if (desc_cache == &indirect_desc_cache) {
> +            address_space_cache_destroy(&indirect_desc_cache);
> +            total_bufs++;
> +            /* We missed one step on for indirect desc */
> +            idx++;
> +            if (++idx >= vq->vring.num) {
> +                idx -= vq->vring.num;
> +                wrap_counter ^= 1;
> +            }
> +        } else {
> +            total_bufs = num_bufs;
> +        }
> +
> +        desc_cache = &caches->desc;
> +        vring_packed_desc_read_flags(vdev, &desc, desc_cache, idx);
> +    }
> +
> +    /* Record the index and wrap counter for a kick we want */
> +    vq->shadow_avail_idx = idx;
> +    vq->avail_wrap_counter = wrap_counter;
> +done:
> +    address_space_cache_destroy(&indirect_desc_cache);
> +    if (in_bytes) {
> +        *in_bytes = in_total;
> +    }
> +    if (out_bytes) {
> +        *out_bytes = out_total;
> +    }
> +    rcu_read_unlock();
> +    return;
> +
> +err:
> +    in_total = out_total = 0;
> +    goto done;
> +}
> +
> +void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
> +                               unsigned int *out_bytes,
> +                               unsigned max_in_bytes, unsigned max_out_bytes)
> +{
> +    uint16_t desc_size;
> +    VRingMemoryRegionCaches *caches;
> +
> +    if (unlikely(!vq->vring.desc)) {
> +        goto err;
> +    }
> +
> +    caches = vring_get_region_caches(vq);
> +    desc_size = virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED) ?
> +                                sizeof(VRingPackedDesc) : sizeof(VRingDesc);
> +    if (caches->desc.len < vq->vring.num * desc_size) {
> +        virtio_error(vq->vdev, "Cannot map descriptor ring");
> +        goto err;
> +    }
> +
> +    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> +        virtqueue_packed_get_avail_bytes(vq, in_bytes, out_bytes,
> +                                         max_in_bytes, max_out_bytes);
> +    } else {
> +        virtqueue_split_get_avail_bytes(vq, in_bytes, out_bytes,
> +                                        max_in_bytes, max_out_bytes);
> +    }
> +
> +    return;
> +err:
> +    if (in_bytes) {
> +        *in_bytes = 0;
> +    }
> +    if (out_bytes) {
> +        *out_bytes = 0;
> +    }
> +}
> +
>   int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
>                             unsigned int out_bytes)
>   {

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 07/11] virtio: fill/flush/pop for packed ring
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 07/11] virtio: fill/flush/pop " wexu
@ 2019-02-18  7:51   ` Jason Wang
  2019-02-18 14:46     ` Wei Xu
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2019-02-18  7:51 UTC (permalink / raw)
  To: wexu, qemu-devel; +Cc: mst, jfreiman, maxime.coquelin, tiwei.bie


On 2019/2/14 12:26 PM, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
>
> last_used_idx/wrap_counter should be equal to last_avail_idx/wrap_counter
> after a successful flush.
>
> Batching in vhost-net & dpdk testpmd is not equivalently supported in
> userspace backend, but a chained descriptors for Rx is similarly presented
> as a lightweight batch, so a write barrier is nailed only for the
> first(head) descriptor.
>
> Signed-off-by: Wei Xu <wexu@redhat.com>
> ---
>   hw/virtio/virtio.c | 291 +++++++++++++++++++++++++++++++++++++++++++++++++----
>   1 file changed, 274 insertions(+), 17 deletions(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 832287b..7e276b4 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -379,6 +379,25 @@ static void vring_packed_desc_read(VirtIODevice *vdev, VRingPackedDesc *desc,
>       virtio_tswap16s(vdev, &desc->id);
>   }
>   
> +static void vring_packed_desc_write_data(VirtIODevice *vdev,
> +                    VRingPackedDesc *desc, MemoryRegionCache *cache, int i)
> +{
> +    virtio_tswap32s(vdev, &desc->len);
> +    virtio_tswap16s(vdev, &desc->id);
> +    address_space_write_cached(cache,
> +              i * sizeof(VRingPackedDesc) + offsetof(VRingPackedDesc, id),
> +              &desc->id, sizeof(desc->id));
> +    address_space_cache_invalidate(cache,
> +              i * sizeof(VRingPackedDesc) + offsetof(VRingPackedDesc, id),
> +              sizeof(desc->id));
> +    address_space_write_cached(cache,
> +              i * sizeof(VRingPackedDesc) + offsetof(VRingPackedDesc, len),
> +              &desc->len, sizeof(desc->len));
> +    address_space_cache_invalidate(cache,
> +              i * sizeof(VRingPackedDesc) + offsetof(VRingPackedDesc, len),
> +              sizeof(desc->len));
> +}
> +
>   static void vring_packed_desc_read_flags(VirtIODevice *vdev,
>                       VRingPackedDesc *desc, MemoryRegionCache *cache, int i)
>   {
> @@ -388,6 +407,18 @@ static void vring_packed_desc_read_flags(VirtIODevice *vdev,
>       virtio_tswap16s(vdev, &desc->flags);
>   }
>   
> +static void vring_packed_desc_write_flags(VirtIODevice *vdev,
> +                    VRingPackedDesc *desc, MemoryRegionCache *cache, int i)
> +{
> +    virtio_tswap16s(vdev, &desc->flags);
> +    address_space_write_cached(cache,
> +              i * sizeof(VRingPackedDesc) + offsetof(VRingPackedDesc, flags),
> +              &desc->flags, sizeof(desc->flags));
> +    address_space_cache_invalidate(cache,
> +              i * sizeof(VRingPackedDesc) + offsetof(VRingPackedDesc, flags),
> +              sizeof(desc->flags));
> +}
> +
>   static inline bool is_desc_avail(struct VRingPackedDesc *desc,
>                                   bool wrap_counter)
>   {
> @@ -554,19 +585,11 @@ bool virtqueue_rewind(VirtQueue *vq, unsigned int num)
>   }
>   
>   /* Called within rcu_read_lock().  */
> -void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
> +static void virtqueue_split_fill(VirtQueue *vq, const VirtQueueElement *elem,
>                       unsigned int len, unsigned int idx)
>   {
>       VRingUsedElem uelem;
>   
> -    trace_virtqueue_fill(vq, elem, len, idx);
> -
> -    virtqueue_unmap_sg(vq, elem, len);
> -
> -    if (unlikely(vq->vdev->broken)) {
> -        return;
> -    }
> -
>       if (unlikely(!vq->vring.used)) {
>           return;
>       }
> @@ -578,16 +601,71 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
>       vring_used_write(vq, &uelem, idx);
>   }
>   
> -/* Called within rcu_read_lock().  */
> -void virtqueue_flush(VirtQueue *vq, unsigned int count)
> +static void virtqueue_packed_fill(VirtQueue *vq, const VirtQueueElement *elem,
> +                        unsigned int len, unsigned int idx)
>   {
> -    uint16_t old, new;
> +    uint16_t head;
> +    VRingMemoryRegionCaches *caches;
> +    VRingPackedDesc desc = {
> +        .flags = 0,
> +        .id = elem->index,
> +        .len = len,
> +    };
> +    bool wrap_counter = vq->used_wrap_counter;
> +
> +    if (unlikely(!vq->vring.desc)) {
> +        return;
> +    }
> +
> +    head = vq->used_idx + idx;
> +    if (head >= vq->vring.num) {
> +        head -= vq->vring.num;
> +        wrap_counter ^= 1;
> +    }
> +    if (wrap_counter) {
> +        desc.flags |= (1 << VRING_PACKED_DESC_F_AVAIL);
> +        desc.flags |= (1 << VRING_PACKED_DESC_F_USED);
> +    } else {
> +        desc.flags &= ~(1 << VRING_PACKED_DESC_F_AVAIL);
> +        desc.flags &= ~(1 << VRING_PACKED_DESC_F_USED);
> +    }
> +
> +    caches = vring_get_region_caches(vq);
> +    vring_packed_desc_write_data(vq->vdev, &desc, &caches->desc, head);
> +    if (idx == 0) {
> +        /*
> +         * Make sure descriptor id and len is written before
> +         * flags for the first used buffer.
> +         */
> +        smp_wmb();
> +    }


I suspect you need to do this unconditionally, since this function
doesn't do any batched writing to the used ring. What it does is write
the used descriptor at idx, so there's no reason to suppress the wmb for
idx != 0. See its usage in virtio-net.c.
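I.e. something like this (a sketch of the suggestion, not the exact patch):

    vring_packed_desc_write_data(vq->vdev, &desc, &caches->desc, head);
    /* id/len must be visible before flags mark this entry as used,
     * no matter which idx is being filled */
    smp_wmb();
    vring_packed_desc_write_flags(vq->vdev, &desc, &caches->desc, head);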


> +
> +    vring_packed_desc_write_flags(vq->vdev, &desc, &caches->desc, head);
> +}
> +
> +void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
> +                    unsigned int len, unsigned int idx)
> +{
> +    trace_virtqueue_fill(vq, elem, len, idx);
> +
> +    virtqueue_unmap_sg(vq, elem, len);
>   
>       if (unlikely(vq->vdev->broken)) {
> -        vq->inuse -= count;
>           return;
>       }
>   
> +    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> +        virtqueue_packed_fill(vq, elem, len, idx);
> +    } else {
> +        virtqueue_split_fill(vq, elem, len, idx);
> +    }
> +}
> +
> +/* Called within rcu_read_lock().  */
> +static void virtqueue_split_flush(VirtQueue *vq, unsigned int count)
> +{
> +    uint16_t old, new;
> +
>       if (unlikely(!vq->vring.used)) {
>           return;
>       }
> @@ -603,6 +681,31 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
>           vq->signalled_used_valid = false;
>   }
>   
> +static void virtqueue_packed_flush(VirtQueue *vq, unsigned int count)
> +{
> +    if (unlikely(!vq->vring.desc)) {
> +        return;
> +    }
> +
> +    vq->inuse -= count;
> +    vq->used_idx = vq->last_avail_idx;
> +    vq->used_wrap_counter = vq->last_avail_wrap_counter;
> +}
> +
> +void virtqueue_flush(VirtQueue *vq, unsigned int count)
> +{
> +    if (unlikely(vq->vdev->broken)) {
> +        vq->inuse -= count;
> +        return;
> +    }
> +
> +    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> +        virtqueue_packed_flush(vq, count);
> +    } else {
> +        virtqueue_split_flush(vq, count);
> +    }
> +}
> +
>   void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
>                       unsigned int len)
>   {
> @@ -1077,7 +1180,7 @@ static void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_nu
>       return elem;
>   }
>   
> -void *virtqueue_pop(VirtQueue *vq, size_t sz)
> +static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
>   {
>       unsigned int i, head, max;
>       VRingMemoryRegionCaches *caches;
> @@ -1092,9 +1195,6 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
>       VRingDesc desc;
>       int rc;
>   
> -    if (unlikely(vdev->broken)) {
> -        return NULL;
> -    }
>       rcu_read_lock();
>       if (virtio_queue_empty_rcu(vq)) {
>           goto done;
> @@ -1212,6 +1312,163 @@ err_undo_map:
>       goto done;
>   }
>   
> +static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
> +{
> +    unsigned int i, head, max;
> +    VRingMemoryRegionCaches *caches;
> +    MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
> +    MemoryRegionCache *cache;
> +    int64_t len;
> +    VirtIODevice *vdev = vq->vdev;
> +    VirtQueueElement *elem = NULL;
> +    unsigned out_num, in_num, elem_entries;
> +    hwaddr addr[VIRTQUEUE_MAX_SIZE];
> +    struct iovec iov[VIRTQUEUE_MAX_SIZE];
> +    VRingPackedDesc desc;
> +    uint16_t id;
> +
> +    rcu_read_lock();
> +    if (virtio_queue_packed_empty_rcu(vq)) {
> +        goto done;
> +    }
> +
> +    /* When we start there are none of either input nor output. */
> +    out_num = in_num = elem_entries = 0;
> +
> +    max = vq->vring.num;
> +
> +    if (vq->inuse >= vq->vring.num) {
> +        virtio_error(vdev, "Virtqueue size exceeded");
> +        goto done;
> +    }
> +
> +    head = vq->last_avail_idx;
> +    i = head;
> +
> +    caches = vring_get_region_caches(vq);
> +    cache = &caches->desc;
> +
> +    /* make sure flags is read before all the other fields */
> +    smp_rmb();
> +    vring_packed_desc_read(vdev, &desc, cache, i);
> +
> +    id = desc.id;
> +    if (desc.flags & VRING_DESC_F_INDIRECT) {
> +
> +        if (desc.len % sizeof(VRingPackedDesc)) {
> +            virtio_error(vdev, "Invalid size for indirect buffer table");
> +            goto done;
> +        }
> +
> +        /* loop over the indirect descriptor table */
> +        len = address_space_cache_init(&indirect_desc_cache, vdev->dma_as,
> +                                       desc.addr, desc.len, false);
> +        cache = &indirect_desc_cache;
> +        if (len < desc.len) {
> +            virtio_error(vdev, "Cannot map indirect buffer");
> +            goto done;
> +        }
> +
> +        max = desc.len / sizeof(VRingPackedDesc);
> +        i = 0;
> +        vring_packed_desc_read(vdev, &desc, cache, i);
> +        /* Make sure we see all the fields*/
> +        smp_rmb();


This looks unnecessary; all the indirect descriptors should be available now.


> +    }
> +
> +    /* Collect all the descriptors */
> +    while (1) {
> +        bool map_ok;
> +
> +        if (desc.flags & VRING_DESC_F_WRITE) {
> +            map_ok = virtqueue_map_desc(vdev, &in_num, addr + out_num,
> +                                        iov + out_num,
> +                                        VIRTQUEUE_MAX_SIZE - out_num, true,
> +                                        desc.addr, desc.len);
> +        } else {
> +            if (in_num) {
> +                virtio_error(vdev, "Incorrect order for descriptors");
> +                goto err_undo_map;
> +            }
> +            map_ok = virtqueue_map_desc(vdev, &out_num, addr, iov,
> +                                        VIRTQUEUE_MAX_SIZE, false,
> +                                        desc.addr, desc.len);
> +        }
> +        if (!map_ok) {
> +            goto err_undo_map;
> +        }
> +
> +        /* If we've got too many, that implies a descriptor loop. */
> +        if (++elem_entries > max) {
> +            virtio_error(vdev, "Looped descriptor");
> +            goto err_undo_map;
> +        }
> +
> +        if (++i >= vq->vring.num) {
> +            i -= vq->vring.num;


Do we allow chaining more descriptors than the vq size in the indirect case?
According to the spec:

"

The device limits the number of descriptors in a list through a
transport-specific and/or device-specific value. If not limited,
the maximum number of descriptors in a list is the virt queue
size.
"

This looks possible, so the above is probably wrong if the max number of
chained descriptors is negotiated in a device-specific way.


> +        }
> +
> +        if (cache == &indirect_desc_cache) {
> +            if (i == max) {
> +                break;
> +            }
> +            vring_packed_desc_read(vq->vdev, &desc, cache, i);
> +        } else if (desc.flags & VRING_DESC_F_NEXT) {
> +            vring_packed_desc_read(vq->vdev, &desc, cache, i);
> +        } else {
> +            break;
> +        }
> +    }
> +
> +    /* Now copy what we have collected and mapped */
> +    elem = virtqueue_alloc_element(sz, out_num, in_num);
> +    elem->index = id;
> +    for (i = 0; i < out_num; i++) {
> +        elem->out_addr[i] = addr[i];
> +        elem->out_sg[i] = iov[i];
> +    }
> +    for (i = 0; i < in_num; i++) {
> +        elem->in_addr[i] = addr[head + out_num + i];
> +        elem->in_sg[i] = iov[out_num + i];
> +    }
> +
> +    vq->last_avail_idx += (cache == &indirect_desc_cache) ?
> +                          1 : elem_entries;
> +    if (vq->last_avail_idx >= vq->vring.num) {
> +        vq->last_avail_idx -= vq->vring.num;
> +        vq->last_avail_wrap_counter ^= 1;
> +    }
> +    vq->inuse++;
> +
> +    vq->shadow_avail_idx = vq->last_avail_idx;
> +    vq->avail_wrap_counter = vq->last_avail_wrap_counter;


It's better to rename this to "shadow_avail_wrap_counter".

Thanks


> +
> +    trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
> +done:
> +    address_space_cache_destroy(&indirect_desc_cache);
> +    rcu_read_unlock();
> +
> +    return elem;
> +
> +err_undo_map:
> +    virtqueue_undo_map_desc(out_num, in_num, iov);
> +    g_free(elem);
> +    goto done;
> +}
> +
> +void *virtqueue_pop(VirtQueue *vq, size_t sz)
> +{
> +    if (unlikely(vq->vdev->broken)) {
> +        return NULL;
> +    }
> +
> +    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> +        return virtqueue_packed_pop(vq, sz);
> +    } else {
> +        return virtqueue_split_pop(vq, sz);
> +    }
> +}
> +
>   /* virtqueue_drop_all:
>    * @vq: The #VirtQueue
>    * Drops all queued buffers and indicates them to the guest

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 07/11] virtio: fill/flush/pop for packed ring
  2019-02-18  7:51   ` Jason Wang
@ 2019-02-18 14:46     ` Wei Xu
  2019-02-19  6:49       ` Jason Wang
  0 siblings, 1 reply; 41+ messages in thread
From: Wei Xu @ 2019-02-18 14:46 UTC (permalink / raw)
  To: Jason Wang; +Cc: qemu-devel, mst, jfreiman, maxime.coquelin, tiwei.bie

On Mon, Feb 18, 2019 at 03:51:05PM +0800, Jason Wang wrote:
> 
> On 2019/2/14 12:26 PM, wexu@redhat.com wrote:
> >From: Wei Xu <wexu@redhat.com>
> >
> >last_used_idx/wrap_counter should be equal to last_avail_idx/wrap_counter
> >after a successful flush.
> >
> >Batching in vhost-net & dpdk testpmd is not equivalently supported in
> >userspace backend, but a chained descriptors for Rx is similarly presented
> >as a lightweight batch, so a write barrier is nailed only for the
> >first(head) descriptor.
> >
> >Signed-off-by: Wei Xu <wexu@redhat.com>
> >---
> >  hw/virtio/virtio.c | 291 +++++++++++++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 274 insertions(+), 17 deletions(-)
> >
> >diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >index 832287b..7e276b4 100644
> >--- a/hw/virtio/virtio.c
> >+++ b/hw/virtio/virtio.c
> >@@ -379,6 +379,25 @@ static void vring_packed_desc_read(VirtIODevice *vdev, VRingPackedDesc *desc,
> >      virtio_tswap16s(vdev, &desc->id);
> >  }
> >+static void vring_packed_desc_write_data(VirtIODevice *vdev,
> >+                    VRingPackedDesc *desc, MemoryRegionCache *cache, int i)
> >+{
> >+    virtio_tswap32s(vdev, &desc->len);
> >+    virtio_tswap16s(vdev, &desc->id);
> >+    address_space_write_cached(cache,
> >+              i * sizeof(VRingPackedDesc) + offsetof(VRingPackedDesc, id),
> >+              &desc->id, sizeof(desc->id));
> >+    address_space_cache_invalidate(cache,
> >+              i * sizeof(VRingPackedDesc) + offsetof(VRingPackedDesc, id),
> >+              sizeof(desc->id));
> >+    address_space_write_cached(cache,
> >+              i * sizeof(VRingPackedDesc) + offsetof(VRingPackedDesc, len),
> >+              &desc->len, sizeof(desc->len));
> >+    address_space_cache_invalidate(cache,
> >+              i * sizeof(VRingPackedDesc) + offsetof(VRingPackedDesc, len),
> >+              sizeof(desc->len));
> >+}
> >+
> >  static void vring_packed_desc_read_flags(VirtIODevice *vdev,
> >                      VRingPackedDesc *desc, MemoryRegionCache *cache, int i)
> >  {
> >@@ -388,6 +407,18 @@ static void vring_packed_desc_read_flags(VirtIODevice *vdev,
> >      virtio_tswap16s(vdev, &desc->flags);
> >  }
> >+static void vring_packed_desc_write_flags(VirtIODevice *vdev,
> >+                    VRingPackedDesc *desc, MemoryRegionCache *cache, int i)
> >+{
> >+    virtio_tswap16s(vdev, &desc->flags);
> >+    address_space_write_cached(cache,
> >+              i * sizeof(VRingPackedDesc) + offsetof(VRingPackedDesc, flags),
> >+              &desc->flags, sizeof(desc->flags));
> >+    address_space_cache_invalidate(cache,
> >+              i * sizeof(VRingPackedDesc) + offsetof(VRingPackedDesc, flags),
> >+              sizeof(desc->flags));
> >+}
> >+
> >  static inline bool is_desc_avail(struct VRingPackedDesc *desc,
> >                                  bool wrap_counter)
> >  {
> >@@ -554,19 +585,11 @@ bool virtqueue_rewind(VirtQueue *vq, unsigned int num)
> >  }
> >  /* Called within rcu_read_lock().  */
> >-void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
> >+static void virtqueue_split_fill(VirtQueue *vq, const VirtQueueElement *elem,
> >                      unsigned int len, unsigned int idx)
> >  {
> >      VRingUsedElem uelem;
> >-    trace_virtqueue_fill(vq, elem, len, idx);
> >-
> >-    virtqueue_unmap_sg(vq, elem, len);
> >-
> >-    if (unlikely(vq->vdev->broken)) {
> >-        return;
> >-    }
> >-
> >      if (unlikely(!vq->vring.used)) {
> >          return;
> >      }
> >@@ -578,16 +601,71 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
> >      vring_used_write(vq, &uelem, idx);
> >  }
> >-/* Called within rcu_read_lock().  */
> >-void virtqueue_flush(VirtQueue *vq, unsigned int count)
> >+static void virtqueue_packed_fill(VirtQueue *vq, const VirtQueueElement *elem,
> >+                        unsigned int len, unsigned int idx)
> >  {
> >-    uint16_t old, new;
> >+    uint16_t head;
> >+    VRingMemoryRegionCaches *caches;
> >+    VRingPackedDesc desc = {
> >+        .flags = 0,
> >+        .id = elem->index,
> >+        .len = len,
> >+    };
> >+    bool wrap_counter = vq->used_wrap_counter;
> >+
> >+    if (unlikely(!vq->vring.desc)) {
> >+        return;
> >+    }
> >+
> >+    head = vq->used_idx + idx;
> >+    if (head >= vq->vring.num) {
> >+        head -= vq->vring.num;
> >+        wrap_counter ^= 1;
> >+    }
> >+    if (wrap_counter) {
> >+        desc.flags |= (1 << VRING_PACKED_DESC_F_AVAIL);
> >+        desc.flags |= (1 << VRING_PACKED_DESC_F_USED);
> >+    } else {
> >+        desc.flags &= ~(1 << VRING_PACKED_DESC_F_AVAIL);
> >+        desc.flags &= ~(1 << VRING_PACKED_DESC_F_USED);
> >+    }
> >+
> >+    caches = vring_get_region_caches(vq);
> >+    vring_packed_desc_write_data(vq->vdev, &desc, &caches->desc, head);
> >+    if (idx == 0) {
> >+        /*
> >+         * Make sure descriptor id and len is written before
> >+         * flags for the first used buffer.
> >+         */
> >+        smp_wmb();
> >+    }
> 
> 
> I suspect you need to do this unconditionally since this function doesn't do
> any batched writing to used ring. What it did is to write used descriptors
> at idx. So there's no reason to supress wmb for the idx != 0. See its usage
> in virtio-net.c

I see, this is unnecessary, will remove it.

> 
> 
> >+
> >+    vring_packed_desc_write_flags(vq->vdev, &desc, &caches->desc, head);
> >+}
> >+
> >+void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
> >+                    unsigned int len, unsigned int idx)
> >+{
> >+    trace_virtqueue_fill(vq, elem, len, idx);
> >+
> >+    virtqueue_unmap_sg(vq, elem, len);
> >      if (unlikely(vq->vdev->broken)) {
> >-        vq->inuse -= count;
> >          return;
> >      }
> >+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> >+        virtqueue_packed_fill(vq, elem, len, idx);
> >+    } else {
> >+        virtqueue_split_fill(vq, elem, len, idx);
> >+    }
> >+}
> >+
> >+/* Called within rcu_read_lock().  */
> >+static void virtqueue_split_flush(VirtQueue *vq, unsigned int count)
> >+{
> >+    uint16_t old, new;
> >+
> >      if (unlikely(!vq->vring.used)) {
> >          return;
> >      }
> >@@ -603,6 +681,31 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
> >          vq->signalled_used_valid = false;
> >  }
> >+static void virtqueue_packed_flush(VirtQueue *vq, unsigned int count)
> >+{
> >+    if (unlikely(!vq->vring.desc)) {
> >+        return;
> >+    }
> >+
> >+    vq->inuse -= count;
> >+    vq->used_idx = vq->last_avail_idx;
> >+    vq->used_wrap_counter = vq->last_avail_wrap_counter;
> >+}
> >+
> >+void virtqueue_flush(VirtQueue *vq, unsigned int count)
> >+{
> >+    if (unlikely(vq->vdev->broken)) {
> >+        vq->inuse -= count;
> >+        return;
> >+    }
> >+
> >+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> >+        virtqueue_packed_flush(vq, count);
> >+    } else {
> >+        virtqueue_split_flush(vq, count);
> >+    }
> >+}
> >+
> >  void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
> >                      unsigned int len)
> >  {
> >@@ -1077,7 +1180,7 @@ static void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_nu
> >      return elem;
> >  }
> >-void *virtqueue_pop(VirtQueue *vq, size_t sz)
> >+static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
> >  {
> >      unsigned int i, head, max;
> >      VRingMemoryRegionCaches *caches;
> >@@ -1092,9 +1195,6 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
> >      VRingDesc desc;
> >      int rc;
> >-    if (unlikely(vdev->broken)) {
> >-        return NULL;
> >-    }
> >      rcu_read_lock();
> >      if (virtio_queue_empty_rcu(vq)) {
> >          goto done;
> >@@ -1212,6 +1312,163 @@ err_undo_map:
> >      goto done;
> >  }
> >+static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
> >+{
> >+    unsigned int i, head, max;
> >+    VRingMemoryRegionCaches *caches;
> >+    MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
> >+    MemoryRegionCache *cache;
> >+    int64_t len;
> >+    VirtIODevice *vdev = vq->vdev;
> >+    VirtQueueElement *elem = NULL;
> >+    unsigned out_num, in_num, elem_entries;
> >+    hwaddr addr[VIRTQUEUE_MAX_SIZE];
> >+    struct iovec iov[VIRTQUEUE_MAX_SIZE];
> >+    VRingPackedDesc desc;
> >+    uint16_t id;
> >+
> >+    rcu_read_lock();
> >+    if (virtio_queue_packed_empty_rcu(vq)) {
> >+        goto done;
> >+    }
> >+
> >+    /* When we start there are none of either input nor output. */
> >+    out_num = in_num = elem_entries = 0;
> >+
> >+    max = vq->vring.num;
> >+
> >+    if (vq->inuse >= vq->vring.num) {
> >+        virtio_error(vdev, "Virtqueue size exceeded");
> >+        goto done;
> >+    }
> >+
> >+    head = vq->last_avail_idx;
> >+    i = head;
> >+
> >+    caches = vring_get_region_caches(vq);
> >+    cache = &caches->desc;
> >+
> >+    /* make sure flags is read before all the other fields */
> >+    smp_rmb();
> >+    vring_packed_desc_read(vdev, &desc, cache, i);
> >+
> >+    id = desc.id;
> >+    if (desc.flags & VRING_DESC_F_INDIRECT) {
> >+
> >+        if (desc.len % sizeof(VRingPackedDesc)) {
> >+            virtio_error(vdev, "Invalid size for indirect buffer table");
> >+            goto done;
> >+        }
> >+
> >+        /* loop over the indirect descriptor table */
> >+        len = address_space_cache_init(&indirect_desc_cache, vdev->dma_as,
> >+                                       desc.addr, desc.len, false);
> >+        cache = &indirect_desc_cache;
> >+        if (len < desc.len) {
> >+            virtio_error(vdev, "Cannot map indirect buffer");
> >+            goto done;
> >+        }
> >+
> >+        max = desc.len / sizeof(VRingPackedDesc);
> >+        i = 0;
> >+        vring_packed_desc_read(vdev, &desc, cache, i);
> >+        /* Make sure we see all the fields*/
> >+        smp_rmb();
> 
> 
> This looks unnecessary, all indirect descriptor should be available now.

Right, same as the last one.

> 
> 
> >+    }
> >+
> >+    /* Collect all the descriptors */
> >+    while (1) {
> >+        bool map_ok;
> >+
> >+        if (desc.flags & VRING_DESC_F_WRITE) {
> >+            map_ok = virtqueue_map_desc(vdev, &in_num, addr + out_num,
> >+                                        iov + out_num,
> >+                                        VIRTQUEUE_MAX_SIZE - out_num, true,
> >+                                        desc.addr, desc.len);
> >+        } else {
> >+            if (in_num) {
> >+                virtio_error(vdev, "Incorrect order for descriptors");
> >+                goto err_undo_map;
> >+            }
> >+            map_ok = virtqueue_map_desc(vdev, &out_num, addr, iov,
> >+                                        VIRTQUEUE_MAX_SIZE, false,
> >+                                        desc.addr, desc.len);
> >+        }
> >+        if (!map_ok) {
> >+            goto err_undo_map;
> >+        }
> >+
> >+        /* If we've got too many, that implies a descriptor loop. */
> >+        if (++elem_entries > max) {
> >+            virtio_error(vdev, "Looped descriptor");
> >+            goto err_undo_map;
> >+        }
> >+
> >+        if (++i >= vq->vring.num) {
> >+            i -= vq->vring.num;
> 
> 
> Do we allow chain more descriptors than vq size in the case of indirect?
> According to the spec:
> 
> "
> 
> The device limits the number of descriptors in a list through a
> transport-specific and/or device-specific value. If not limited,
> the maximum number of descriptors in a list is the virt queue
> size.
> "
> 
> This looks possible, so the above is probably wrong if the max number of
> chained descriptors is negotiated through a device specific way.
> 

OK, I will remove this check, though it is still necessary to have some limit
for indirect descriptors anyway; otherwise it is possible to hit an overflow,
since presumably u16 is used for most numbers/sizes in the spec.
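For example, something along these lines (just a sketch; the exact limit is
still to be decided):

    /* cap the indirect table so the u16 counters below cannot wrap */
    if (desc.len / sizeof(VRingPackedDesc) > VIRTQUEUE_MAX_SIZE) {
        virtio_error(vdev, "Indirect buffer table too large");
        goto done;
    }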

> 
> >+        }
> >+
> >+        if (cache == &indirect_desc_cache) {
> >+            if (i == max) {
> >+                break;
> >+            }
> >+            vring_packed_desc_read(vq->vdev, &desc, cache, i);
> >+        } else if (desc.flags & VRING_DESC_F_NEXT) {
> >+            vring_packed_desc_read(vq->vdev, &desc, cache, i);
> >+        } else {
> >+            break;
> >+        }
> >+    }
> >+
> >+    /* Now copy what we have collected and mapped */
> >+    elem = virtqueue_alloc_element(sz, out_num, in_num);
> >+    elem->index = id;
> >+    for (i = 0; i < out_num; i++) {
> >+        elem->out_addr[i] = addr[i];
> >+        elem->out_sg[i] = iov[i];
> >+    }
> >+    for (i = 0; i < in_num; i++) {
> >+        elem->in_addr[i] = addr[head + out_num + i];
> >+        elem->in_sg[i] = iov[out_num + i];
> >+    }
> >+
> >+    vq->last_avail_idx += (cache == &indirect_desc_cache) ?
> >+                          1 : elem_entries;
> >+    if (vq->last_avail_idx >= vq->vring.num) {
> >+        vq->last_avail_idx -= vq->vring.num;
> >+        vq->last_avail_wrap_counter ^= 1;
> >+    }
> >+    vq->inuse++;
> >+
> >+    vq->shadow_avail_idx = vq->last_avail_idx;
> >+    vq->avail_wrap_counter = vq->last_avail_wrap_counter;
> 
> 
> It's better to name this to "shadow_avail_wrap_counter".

OK, thanks for reviewing.

Wei
> 
> Thanks
> 
> 
> >+
> >+    trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
> >+done:
> >+    address_space_cache_destroy(&indirect_desc_cache);
> >+    rcu_read_unlock();
> >+
> >+    return elem;
> >+
> >+err_undo_map:
> >+    virtqueue_undo_map_desc(out_num, in_num, iov);
> >+    g_free(elem);
> >+    goto done;
> >+}
> >+
> >+void *virtqueue_pop(VirtQueue *vq, size_t sz)
> >+{
> >+    if (unlikely(vq->vdev->broken)) {
> >+        return NULL;
> >+    }
> >+
> >+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> >+        return virtqueue_packed_pop(vq, sz);
> >+    } else {
> >+        return virtqueue_split_pop(vq, sz);
> >+    }
> >+}
> >+
> >  /* virtqueue_drop_all:
> >   * @vq: The #VirtQueue
> >   * Drops all queued buffers and indicates them to the guest

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 06/11] virtio: get avail bytes check for packed ring
  2019-02-18  7:27   ` Jason Wang
@ 2019-02-18 17:07     ` Wei Xu
  2019-02-19  6:24       ` Jason Wang
  0 siblings, 1 reply; 41+ messages in thread
From: Wei Xu @ 2019-02-18 17:07 UTC (permalink / raw)
  To: Jason Wang; +Cc: qemu-devel, maxime.coquelin, tiwei.bie, jfreiman, mst

On Mon, Feb 18, 2019 at 03:27:21PM +0800, Jason Wang wrote:
> 
> On 2019/2/14 12:26 PM, wexu@redhat.com wrote:
> >From: Wei Xu <wexu@redhat.com>
> >
> >Add packed ring headcount check.
> >
> >Common part of split/packed ring are kept.
> >
> >Signed-off-by: Wei Xu <wexu@redhat.com>
> >---
> >  hw/virtio/virtio.c | 197 ++++++++++++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 179 insertions(+), 18 deletions(-)
> >
> >diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >index f2ff980..832287b 100644
> >--- a/hw/virtio/virtio.c
> >+++ b/hw/virtio/virtio.c
> >@@ -368,6 +368,17 @@ int virtio_queue_ready(VirtQueue *vq)
> >      return vq->vring.avail != 0;
> >  }
> >+static void vring_packed_desc_read(VirtIODevice *vdev, VRingPackedDesc *desc,
> >+                            MemoryRegionCache *cache, int i)
> >+{
> >+    address_space_read_cached(cache, i * sizeof(VRingPackedDesc),
> >+                              desc, sizeof(VRingPackedDesc));
> >+    virtio_tswap16s(vdev, &desc->flags);
> >+    virtio_tswap64s(vdev, &desc->addr);
> >+    virtio_tswap32s(vdev, &desc->len);
> >+    virtio_tswap16s(vdev, &desc->id);
> >+}
> >+
> >  static void vring_packed_desc_read_flags(VirtIODevice *vdev,
> >                      VRingPackedDesc *desc, MemoryRegionCache *cache, int i)
> >  {
> >@@ -667,9 +678,9 @@ static int virtqueue_read_next_desc(VirtIODevice *vdev, VRingDesc *desc,
> >      return VIRTQUEUE_READ_DESC_MORE;
> >  }
> >-void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
> >-                               unsigned int *out_bytes,
> >-                               unsigned max_in_bytes, unsigned max_out_bytes)
> >+static void virtqueue_split_get_avail_bytes(VirtQueue *vq,
> >+                            unsigned int *in_bytes, unsigned int *out_bytes,
> >+                            unsigned max_in_bytes, unsigned max_out_bytes)
> >  {
> >      VirtIODevice *vdev = vq->vdev;
> >      unsigned int max, idx;
> >@@ -679,27 +690,12 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
> >      int64_t len = 0;
> >      int rc;
> >-    if (unlikely(!vq->vring.desc)) {
> >-        if (in_bytes) {
> >-            *in_bytes = 0;
> >-        }
> >-        if (out_bytes) {
> >-            *out_bytes = 0;
> >-        }
> >-        return;
> >-    }
> >-
> >      rcu_read_lock();
> >      idx = vq->last_avail_idx;
> >      total_bufs = in_total = out_total = 0;
> >      max = vq->vring.num;
> >      caches = vring_get_region_caches(vq);
> >-    if (caches->desc.len < max * sizeof(VRingDesc)) {
> >-        virtio_error(vdev, "Cannot map descriptor ring");
> >-        goto err;
> >-    }
> >-
> >      while ((rc = virtqueue_num_heads(vq, idx)) > 0) {
> >          MemoryRegionCache *desc_cache = &caches->desc;
> >          unsigned int num_bufs;
> >@@ -792,6 +788,171 @@ err:
> >      goto done;
> >  }
> >+static void virtqueue_packed_get_avail_bytes(VirtQueue *vq,
> >+                            unsigned int *in_bytes, unsigned int *out_bytes,
> >+                            unsigned max_in_bytes, unsigned max_out_bytes)
> >+{
> >+    VirtIODevice *vdev = vq->vdev;
> >+    unsigned int max, idx;
> >+    unsigned int total_bufs, in_total, out_total;
> >+    MemoryRegionCache *desc_cache;
> >+    VRingMemoryRegionCaches *caches;
> >+    MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
> >+    int64_t len = 0;
> >+    VRingPackedDesc desc;
> >+    bool wrap_counter;
> >+
> >+    rcu_read_lock();
> >+    idx = vq->last_avail_idx;
> >+    wrap_counter = vq->last_avail_wrap_counter;
> >+    total_bufs = in_total = out_total = 0;
> >+
> >+    max = vq->vring.num;
> >+    caches = vring_get_region_caches(vq);
> >+    desc_cache = &caches->desc;
> >+    vring_packed_desc_read_flags(vdev, &desc, desc_cache, idx);
> >+    while (is_desc_avail(&desc, wrap_counter)) {
> >+        unsigned int num_bufs;
> >+        unsigned int i = 0;
> >+
> >+        num_bufs = total_bufs;
> >+
> >+        /* Make sure flags has been read before all the fields. */
> >+        smp_rmb();
> >+        vring_packed_desc_read(vdev, &desc, desc_cache, idx);
> 
> 
> It's better to have single function to deal with reading flags and
> descriptors and check its availability like packed ring.

There is something different between the split and packed rings here.
For the split ring, 'avail_idx' and the descriptors are used separately, so
their interfaces are straightforward. For the packed ring, the flags and the
data fields live in the same descriptor and are accessed independently, so it
is good to use the helpers the way they are meant to work. :)

Another neat way is to pack the two operations into a new helper, but that
would mean passing the memory cache parameter around.

So personally I prefer to keep it unchanged; do you still want it sorted out?
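For reference, the packed descriptor layout being discussed looks roughly like
this (field order as in the virtio 1.1 spec; restated here only for context):

    struct VRingPackedDesc {
        uint64_t addr;   /* buffer address */
        uint32_t len;    /* buffer length */
        uint16_t id;     /* buffer id */
        uint16_t flags;  /* AVAIL/USED/WRITE/NEXT/INDIRECT bits */
    };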

> 
> 
> >+
> >+        if (desc.flags & VRING_DESC_F_INDIRECT) {
> >+            if (desc.len % sizeof(VRingPackedDesc)) {
> >+                virtio_error(vdev, "Invalid size for indirect buffer table");
> >+                goto err;
> >+            }
> >+
> >+            /* If we've got too many, that implies a descriptor loop. */
> >+            if (num_bufs >= max) {
> >+                virtio_error(vdev, "Looped descriptor");
> >+                goto err;
> >+            }
> >+
> >+            /* loop over the indirect descriptor table */
> >+            len = address_space_cache_init(&indirect_desc_cache,
> >+                                           vdev->dma_as,
> >+                                           desc.addr, desc.len, false);
> >+            desc_cache = &indirect_desc_cache;
> >+            if (len < desc.len) {
> >+                virtio_error(vdev, "Cannot map indirect buffer");
> >+                goto err;
> >+            }
> >+
> >+            max = desc.len / sizeof(VRingPackedDesc);
> >+            num_bufs = i = 0;
> >+            vring_packed_desc_read(vdev, &desc, desc_cache, i);
> >+        }
> >+
> >+        do {
> >+            /* If we've got too many, that implies a descriptor loop. */
> >+            if (++num_bufs > max) {
> >+                virtio_error(vdev, "Looped descriptor");
> >+                goto err;
> >+            }
> >+
> >+            if (desc.flags & VRING_DESC_F_WRITE) {
> >+                in_total += desc.len;
> >+            } else {
> >+                out_total += desc.len;
> >+            }
> >+            if (in_total >= max_in_bytes && out_total >= max_out_bytes) {
> >+                goto done;
> >+            }
> >+
> >+            if (desc_cache == &indirect_desc_cache) {
> >+                if (++i >= max) {
> >+                    break;
> >+                }
> >+                vring_packed_desc_read(vdev, &desc, desc_cache, i);
> >+            } else {
> >+                if (++idx >= vq->vring.num) {
> >+                    idx -= vq->vring.num;
> >+                    wrap_counter ^= 1;
> >+                }
> >+                vring_packed_desc_read(vdev, &desc, desc_cache, idx);
> >+            }
> >+            /* Make sure flags has been read after all the other fields */
> >+            smp_rmb();
> 
> 
> I don't think we need this according to the spec:
> 
> "
> 
> The driver always makes the first descriptor in the list
> available after the rest of the list has been written out into
> the ring. This guarantees that the device will never observe a
> partial scatter/gather list in the ring.
> 
> "
> 
> So when the first is available, the rest of the chain should be available,
> otherwise device may see partial chain.

As always, I will remove it, thanks a lot.

Wei

> 
> Thanks
> 
> 
> >+        } while (desc.flags & VRING_DESC_F_NEXT);
> >+
> >+        if (desc_cache == &indirect_desc_cache) {
> >+            address_space_cache_destroy(&indirect_desc_cache);
> >+            total_bufs++;
> >+            /* We missed one step on for indirect desc */
> >+            idx++;
> >+            if (++idx >= vq->vring.num) {
> >+                idx -= vq->vring.num;
> >+                wrap_counter ^= 1;
> >+            }
> >+        } else {
> >+            total_bufs = num_bufs;
> >+        }
> >+
> >+        desc_cache = &caches->desc;
> >+        vring_packed_desc_read_flags(vdev, &desc, desc_cache, idx);
> >+    }
> >+
> >+    /* Record the index and wrap counter for a kick we want */
> >+    vq->shadow_avail_idx = idx;
> >+    vq->avail_wrap_counter = wrap_counter;
> >+done:
> >+    address_space_cache_destroy(&indirect_desc_cache);
> >+    if (in_bytes) {
> >+        *in_bytes = in_total;
> >+    }
> >+    if (out_bytes) {
> >+        *out_bytes = out_total;
> >+    }
> >+    rcu_read_unlock();
> >+    return;
> >+
> >+err:
> >+    in_total = out_total = 0;
> >+    goto done;
> >+}
> >+
> >+void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
> >+                               unsigned int *out_bytes,
> >+                               unsigned max_in_bytes, unsigned max_out_bytes)
> >+{
> >+    uint16_t desc_size;
> >+    VRingMemoryRegionCaches *caches;
> >+
> >+    if (unlikely(!vq->vring.desc)) {
> >+        goto err;
> >+    }
> >+
> >+    caches = vring_get_region_caches(vq);
> >+    desc_size = virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED) ?
> >+                                sizeof(VRingPackedDesc) : sizeof(VRingDesc);
> >+    if (caches->desc.len < vq->vring.num * desc_size) {
> >+        virtio_error(vq->vdev, "Cannot map descriptor ring");
> >+        goto err;
> >+    }
> >+
> >+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> >+        virtqueue_packed_get_avail_bytes(vq, in_bytes, out_bytes,
> >+                                         max_in_bytes, max_out_bytes);
> >+    } else {
> >+        virtqueue_split_get_avail_bytes(vq, in_bytes, out_bytes,
> >+                                        max_in_bytes, max_out_bytes);
> >+    }
> >+
> >+    return;
> >+err:
> >+    if (in_bytes) {
> >+        *in_bytes = 0;
> >+    }
> >+    if (out_bytes) {
> >+        *out_bytes = 0;
> >+    }
> >+}
> >+
> >  int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
> >                            unsigned int out_bytes)
> >  {
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 06/11] virtio: get avail bytes check for packed ring
  2019-02-18 17:07     ` Wei Xu
@ 2019-02-19  6:24       ` Jason Wang
  2019-02-19  8:24         ` Wei Xu
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2019-02-19  6:24 UTC (permalink / raw)
  To: Wei Xu; +Cc: qemu-devel, maxime.coquelin, tiwei.bie, jfreiman, mst


On 2019/2/19 1:07 AM, Wei Xu wrote:
> On Mon, Feb 18, 2019 at 03:27:21PM +0800, Jason Wang wrote:
>> On 2019/2/14 12:26 PM, wexu@redhat.com wrote:
>>> From: Wei Xu <wexu@redhat.com>
>>>
>>> Add packed ring headcount check.
>>>
>>> Common part of split/packed ring are kept.
>>>
>>> Signed-off-by: Wei Xu <wexu@redhat.com>
>>> ---
>>>   hw/virtio/virtio.c | 197 ++++++++++++++++++++++++++++++++++++++++++++++++-----
>>>   1 file changed, 179 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>>> index f2ff980..832287b 100644
>>> --- a/hw/virtio/virtio.c
>>> +++ b/hw/virtio/virtio.c
>>> @@ -368,6 +368,17 @@ int virtio_queue_ready(VirtQueue *vq)
>>>       return vq->vring.avail != 0;
>>>   }
>>> +static void vring_packed_desc_read(VirtIODevice *vdev, VRingPackedDesc *desc,
>>> +                            MemoryRegionCache *cache, int i)
>>> +{
>>> +    address_space_read_cached(cache, i * sizeof(VRingPackedDesc),
>>> +                              desc, sizeof(VRingPackedDesc));
>>> +    virtio_tswap16s(vdev, &desc->flags);
>>> +    virtio_tswap64s(vdev, &desc->addr);
>>> +    virtio_tswap32s(vdev, &desc->len);
>>> +    virtio_tswap16s(vdev, &desc->id);
>>> +}
>>> +
>>>   static void vring_packed_desc_read_flags(VirtIODevice *vdev,
>>>                       VRingPackedDesc *desc, MemoryRegionCache *cache, int i)
>>>   {
>>> @@ -667,9 +678,9 @@ static int virtqueue_read_next_desc(VirtIODevice *vdev, VRingDesc *desc,
>>>       return VIRTQUEUE_READ_DESC_MORE;
>>>   }
>>> -void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
>>> -                               unsigned int *out_bytes,
>>> -                               unsigned max_in_bytes, unsigned max_out_bytes)
>>> +static void virtqueue_split_get_avail_bytes(VirtQueue *vq,
>>> +                            unsigned int *in_bytes, unsigned int *out_bytes,
>>> +                            unsigned max_in_bytes, unsigned max_out_bytes)
>>>   {
>>>       VirtIODevice *vdev = vq->vdev;
>>>       unsigned int max, idx;
>>> @@ -679,27 +690,12 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
>>>       int64_t len = 0;
>>>       int rc;
>>> -    if (unlikely(!vq->vring.desc)) {
>>> -        if (in_bytes) {
>>> -            *in_bytes = 0;
>>> -        }
>>> -        if (out_bytes) {
>>> -            *out_bytes = 0;
>>> -        }
>>> -        return;
>>> -    }
>>> -
>>>       rcu_read_lock();
>>>       idx = vq->last_avail_idx;
>>>       total_bufs = in_total = out_total = 0;
>>>       max = vq->vring.num;
>>>       caches = vring_get_region_caches(vq);
>>> -    if (caches->desc.len < max * sizeof(VRingDesc)) {
>>> -        virtio_error(vdev, "Cannot map descriptor ring");
>>> -        goto err;
>>> -    }
>>> -
>>>       while ((rc = virtqueue_num_heads(vq, idx)) > 0) {
>>>           MemoryRegionCache *desc_cache = &caches->desc;
>>>           unsigned int num_bufs;
>>> @@ -792,6 +788,171 @@ err:
>>>       goto done;
>>>   }
>>> +static void virtqueue_packed_get_avail_bytes(VirtQueue *vq,
>>> +                            unsigned int *in_bytes, unsigned int *out_bytes,
>>> +                            unsigned max_in_bytes, unsigned max_out_bytes)
>>> +{
>>> +    VirtIODevice *vdev = vq->vdev;
>>> +    unsigned int max, idx;
>>> +    unsigned int total_bufs, in_total, out_total;
>>> +    MemoryRegionCache *desc_cache;
>>> +    VRingMemoryRegionCaches *caches;
>>> +    MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
>>> +    int64_t len = 0;
>>> +    VRingPackedDesc desc;
>>> +    bool wrap_counter;
>>> +
>>> +    rcu_read_lock();
>>> +    idx = vq->last_avail_idx;
>>> +    wrap_counter = vq->last_avail_wrap_counter;
>>> +    total_bufs = in_total = out_total = 0;
>>> +
>>> +    max = vq->vring.num;
>>> +    caches = vring_get_region_caches(vq);
>>> +    desc_cache = &caches->desc;
>>> +    vring_packed_desc_read_flags(vdev, &desc, desc_cache, idx);
>>> +    while (is_desc_avail(&desc, wrap_counter)) {
>>> +        unsigned int num_bufs;
>>> +        unsigned int i = 0;
>>> +
>>> +        num_bufs = total_bufs;
>>> +
>>> +        /* Make sure flags has been read before all the fields. */
>>> +        smp_rmb();
>>> +        vring_packed_desc_read(vdev, &desc, desc_cache, idx);
>>
>> It's better to have single function to deal with reading flags and
>> descriptors and check its availability like packed ring.
> There is something different between split and packed ring here.
> For split ring, 'avail_idx' and descriptor are separately used so the
> interfaces of them are straightforward, while the flag and data fields
> of the descriptors for packed ring are mixed and independent accesses to
> them have been brought in, it is good to use them as what they are supposed
> to work. :)
>
> Another neat way is to pack the two operations to a new one, but it would
> introduce memory cache parameter passing.
>
> So personally I prefer to keep it unchanged, still want to sort it out?


It's as simple as another helper that calls read_flags() and desc_read()?

Btw, it's better to have consistent naming for the function, e.g.
vring_packed_flags_read().
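Something like the following, perhaps (only a sketch; the name is made up):

    /* Read the whole descriptor only when its flags say it is available;
     * returns false when there is nothing new at idx. */
    static bool vring_packed_desc_read_avail(VirtIODevice *vdev,
                    VRingPackedDesc *desc, MemoryRegionCache *cache,
                    int idx, bool wrap_counter)
    {
        vring_packed_desc_read_flags(vdev, desc, cache, idx);
        if (!is_desc_avail(desc, wrap_counter)) {
            return false;
        }
        /* Make sure flags is read before all the other fields. */
        smp_rmb();
        vring_packed_desc_read(vdev, desc, cache, idx);
        return true;
    }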

Thanks


>
>>
>>> +
>>> +        if (desc.flags & VRING_DESC_F_INDIRECT) {
>>> +            if (desc.len % sizeof(VRingPackedDesc)) {
>>> +                virtio_error(vdev, "Invalid size for indirect buffer table");
>>> +                goto err;
>>> +            }
>>> +
>>> +            /* If we've got too many, that implies a descriptor loop. */
>>> +            if (num_bufs >= max) {
>>> +                virtio_error(vdev, "Looped descriptor");
>>> +                goto err;
>>> +            }
>>> +
>>> +            /* loop over the indirect descriptor table */
>>> +            len = address_space_cache_init(&indirect_desc_cache,
>>> +                                           vdev->dma_as,
>>> +                                           desc.addr, desc.len, false);
>>> +            desc_cache = &indirect_desc_cache;
>>> +            if (len < desc.len) {
>>> +                virtio_error(vdev, "Cannot map indirect buffer");
>>> +                goto err;
>>> +            }
>>> +
>>> +            max = desc.len / sizeof(VRingPackedDesc);
>>> +            num_bufs = i = 0;
>>> +            vring_packed_desc_read(vdev, &desc, desc_cache, i);
>>> +        }
>>> +
>>> +        do {
>>> +            /* If we've got too many, that implies a descriptor loop. */
>>> +            if (++num_bufs > max) {
>>> +                virtio_error(vdev, "Looped descriptor");
>>> +                goto err;
>>> +            }
>>> +
>>> +            if (desc.flags & VRING_DESC_F_WRITE) {
>>> +                in_total += desc.len;
>>> +            } else {
>>> +                out_total += desc.len;
>>> +            }
>>> +            if (in_total >= max_in_bytes && out_total >= max_out_bytes) {
>>> +                goto done;
>>> +            }
>>> +
>>> +            if (desc_cache == &indirect_desc_cache) {
>>> +                if (++i >= max) {
>>> +                    break;
>>> +                }
>>> +                vring_packed_desc_read(vdev, &desc, desc_cache, i);
>>> +            } else {
>>> +                if (++idx >= vq->vring.num) {
>>> +                    idx -= vq->vring.num;
>>> +                    wrap_counter ^= 1;
>>> +                }
>>> +                vring_packed_desc_read(vdev, &desc, desc_cache, idx);
>>> +            }
>>> +            /* Make sure flags has been read after all the other fields */
>>> +            smp_rmb();
>>
>> I don't think we need this according to the spec:
>>
>> "
>>
>> The driver always makes the first descriptor in the list
>> available after the rest of the list has been written out into
>> the ring. This guarantees that the device will never observe a
>> partial scatter/gather list in the ring.
>>
>> "
>>
>> So when the first is available, the rest of the chain should be available,
>> otherwise device may see partial chain.
> As always, I will remove it, thanks a lot.
>
> Wei
>
>> Thanks
>>
>>
>>> +        } while (desc.flags & VRING_DESC_F_NEXT);
>>> +
>>> +        if (desc_cache == &indirect_desc_cache) {
>>> +            address_space_cache_destroy(&indirect_desc_cache);
>>> +            total_bufs++;
>>> +            /* We missed one step on for indirect desc */
>>> +            idx++;
>>> +            if (++idx >= vq->vring.num) {
>>> +                idx -= vq->vring.num;
>>> +                wrap_counter ^= 1;
>>> +            }
>>> +        } else {
>>> +            total_bufs = num_bufs;
>>> +        }
>>> +
>>> +        desc_cache = &caches->desc;
>>> +        vring_packed_desc_read_flags(vdev, &desc, desc_cache, idx);
>>> +    }
>>> +
>>> +    /* Record the index and wrap counter for a kick we want */
>>> +    vq->shadow_avail_idx = idx;
>>> +    vq->avail_wrap_counter = wrap_counter;
>>> +done:
>>> +    address_space_cache_destroy(&indirect_desc_cache);
>>> +    if (in_bytes) {
>>> +        *in_bytes = in_total;
>>> +    }
>>> +    if (out_bytes) {
>>> +        *out_bytes = out_total;
>>> +    }
>>> +    rcu_read_unlock();
>>> +    return;
>>> +
>>> +err:
>>> +    in_total = out_total = 0;
>>> +    goto done;
>>> +}
>>> +
>>> +void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
>>> +                               unsigned int *out_bytes,
>>> +                               unsigned max_in_bytes, unsigned max_out_bytes)
>>> +{
>>> +    uint16_t desc_size;
>>> +    VRingMemoryRegionCaches *caches;
>>> +
>>> +    if (unlikely(!vq->vring.desc)) {
>>> +        goto err;
>>> +    }
>>> +
>>> +    caches = vring_get_region_caches(vq);
>>> +    desc_size = virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED) ?
>>> +                                sizeof(VRingPackedDesc) : sizeof(VRingDesc);
>>> +    if (caches->desc.len < vq->vring.num * desc_size) {
>>> +        virtio_error(vq->vdev, "Cannot map descriptor ring");
>>> +        goto err;
>>> +    }
>>> +
>>> +    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
>>> +        virtqueue_packed_get_avail_bytes(vq, in_bytes, out_bytes,
>>> +                                         max_in_bytes, max_out_bytes);
>>> +    } else {
>>> +        virtqueue_split_get_avail_bytes(vq, in_bytes, out_bytes,
>>> +                                        max_in_bytes, max_out_bytes);
>>> +    }
>>> +
>>> +    return;
>>> +err:
>>> +    if (in_bytes) {
>>> +        *in_bytes = 0;
>>> +    }
>>> +    if (out_bytes) {
>>> +        *out_bytes = 0;
>>> +    }
>>> +}
>>> +
>>>   int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
>>>                             unsigned int out_bytes)
>>>   {

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 07/11] virtio: fill/flush/pop for packed ring
  2019-02-18 14:46     ` Wei Xu
@ 2019-02-19  6:49       ` Jason Wang
  2019-02-19  8:21         ` Wei Xu
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2019-02-19  6:49 UTC (permalink / raw)
  To: Wei Xu; +Cc: qemu-devel, mst, jfreiman, maxime.coquelin, tiwei.bie


On 2019/2/18 下午10:46, Wei Xu wrote:
>> Do we allow chain more descriptors than vq size in the case of indirect?
>> According to the spec:
>>
>> "
>>
>> The device limits the number of descriptors in a list through a
>> transport-specific and/or device-specific value. If not limited,
>> the maximum number of descriptors in a list is the virt queue
>> size.
>> "
>>
>> This looks possible, so the above is probably wrong if the max number of
>> chained descriptors is negotiated through a device specific way.
>>
> OK, I will remove this check, while it is necessary to have a limitation
> for indirect descriptor anyway, otherwise it is possible to hit an overflow
> since presumably u16 is used for most number/size in the spec.
>

Please try to test block and scsi devices with your changes as well.

Thanks

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 08/11] virtio: event suppression support for packed ring
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 08/11] virtio: event suppression support " wexu
@ 2019-02-19  7:19   ` Jason Wang
  2019-02-19 10:40     ` Wei Xu
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2019-02-19  7:19 UTC (permalink / raw)
  To: wexu, qemu-devel; +Cc: mst, jfreiman, maxime.coquelin, tiwei.bie


On 2019/2/14 下午12:26, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
>
> Difference between 'avail_wrap_counter' and 'last_avail_wrap_counter':
> For Tx(guest transmitting), they are the same after each pop of a desc.
>
> For Rx(guest receiving), they are also the same when there are enough
> descriptors to carry the payload for a packet(e.g. usually 16 descs are
> needed for a 64k packet in typical iperf tcp connection with tso enabled),
> however, when the ring is running out of descriptors while there are
> still a few free ones, e.g. only 6 descriptors are available, which is not
> enough to carry an entire packet that needs 16 descriptors, then the
> 'avail_wrap_counter' should be set to that of the first descriptor pending
> to be handled by the guest driver in order to get a notification, while the
> 'last_avail_wrap_counter' should stay unchanged, pointing at the head of the
> available descriptors, like below:
>
> Mark meaning:
>      | | -- available
>      |*| -- used
>
> A Snapshot of the queue:
>                                        last_avail_idx = 253
>                                        last_avail_wrap_counter = 1
>                                               |
>      +---------------------------------------------+
>   0  | | | |*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*| | | | 255
>      +---------------------------------------------+
>             |
>    shadow_avail_idx = 3
>    avail_wrap_counter = 0


Well this might not be a good place to describe the difference between 
shadow_avail_idx and last_avail_idx. And the comments above their 
definitions look good enough?

     /* Next head to pop */
     uint16_t last_avail_idx;

     /* Last avail_idx read from VQ. */
     uint16_t shadow_avail_idx;

Instead, how or why event suppression is needed is not mentioned ...



>
> Signed-off-by: Wei Xu <wexu@redhat.com>
> ---
>   hw/virtio/virtio.c | 137 +++++++++++++++++++++++++++++++++++++++++++++++++----
>   1 file changed, 128 insertions(+), 9 deletions(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 7e276b4..8cfc7b6 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -234,6 +234,34 @@ static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
>       virtio_tswap16s(vdev, &desc->next);
>   }
>   
> +static void vring_packed_event_read(VirtIODevice *vdev,
> +                            MemoryRegionCache *cache, VRingPackedDescEvent *e)
> +{
> +    address_space_read_cached(cache, 0, e, sizeof(*e));
> +    virtio_tswap16s(vdev, &e->off_wrap);
> +    virtio_tswap16s(vdev, &e->flags);
> +}
> +
> +static void vring_packed_off_wrap_write(VirtIODevice *vdev,
> +                            MemoryRegionCache *cache, uint16_t off_wrap)
> +{
> +    virtio_tswap16s(vdev, &off_wrap);
> +    address_space_write_cached(cache, offsetof(VRingPackedDescEvent, off_wrap),
> +                            &off_wrap, sizeof(off_wrap));
> +    address_space_cache_invalidate(cache,
> +                offsetof(VRingPackedDescEvent, off_wrap), sizeof(off_wrap));
> +}
> +
> +static void vring_packed_flags_write(VirtIODevice *vdev,
> +                            MemoryRegionCache *cache, uint16_t flags)
> +{
> +    virtio_tswap16s(vdev, &flags);
> +    address_space_write_cached(cache, offsetof(VRingPackedDescEvent, flags),
> +                            &flags, sizeof(flags));
> +    address_space_cache_invalidate(cache,
> +                        offsetof(VRingPackedDescEvent, flags), sizeof(flags));
> +}
> +
>   static VRingMemoryRegionCaches *vring_get_region_caches(struct VirtQueue *vq)
>   {
>       VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
> @@ -340,14 +368,8 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
>       address_space_cache_invalidate(&caches->used, pa, sizeof(val));
>   }
>   
> -void virtio_queue_set_notification(VirtQueue *vq, int enable)
> +static void virtio_queue_set_notification_split(VirtQueue *vq, int enable)
>   {
> -    vq->notification = enable;
> -
> -    if (!vq->vring.desc) {
> -        return;
> -    }
> -
>       rcu_read_lock();
>       if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
>           vring_set_avail_event(vq, vring_avail_idx(vq));
> @@ -363,6 +385,57 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable)
>       rcu_read_unlock();
>   }
>   
> +static void virtio_queue_set_notification_packed(VirtQueue *vq, int enable)
> +{
> +    VRingPackedDescEvent e;
> +    VRingMemoryRegionCaches *caches;
> +
> +    rcu_read_lock();
> +    caches  = vring_get_region_caches(vq);
> +    vring_packed_event_read(vq->vdev, &caches->used, &e);
> +
> +    if (!enable) {
> +        if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> +            /* no need to write device area since this is outdated. */


We should advertise off and wrap in this case as well, otherwise we may 
get notifications earlier than expected.


> +            goto out;
> +        }
> +
> +        e.flags = VRING_PACKED_EVENT_FLAG_DISABLE;
> +        goto update;
> +    }
> +
> +    e.flags = VRING_PACKED_EVENT_FLAG_ENABLE;


Here and the above goto could be eliminated by:

if (event idx) {

...

} else if (enable) {

...

} else {

...

}


Thanks


> +    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> +        uint16_t off_wrap = vq->shadow_avail_idx | vq->avail_wrap_counter << 15;
> +
> +        vring_packed_off_wrap_write(vq->vdev, &caches->used, off_wrap);
> +        /* Make sure off_wrap is written before flags */
> +        smp_wmb();
> +
> +        e.flags = VRING_PACKED_EVENT_FLAG_DESC;
> +    }
> +
> +update:
> +    vring_packed_flags_write(vq->vdev, &caches->used, e.flags);
> +out:
> +    rcu_read_unlock();
> +}
> +
> +void virtio_queue_set_notification(VirtQueue *vq, int enable)
> +{
> +    vq->notification = enable;
> +
> +    if (!vq->vring.desc) {
> +        return;
> +    }
> +
> +    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> +        virtio_queue_set_notification_packed(vq, enable);
> +    } else {
> +        virtio_queue_set_notification_split(vq, enable);
> +    }
> +}
> +
>   int virtio_queue_ready(VirtQueue *vq)
>   {
>       return vq->vring.avail != 0;
> @@ -2117,8 +2190,7 @@ static void virtio_set_isr(VirtIODevice *vdev, int value)
>       }
>   }
>   
> -/* Called within rcu_read_lock().  */
> -static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
> +static bool virtio_split_should_notify(VirtIODevice *vdev, VirtQueue *vq)
>   {
>       uint16_t old, new;
>       bool v;
> @@ -2141,6 +2213,53 @@ static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
>       return !v || vring_need_event(vring_get_used_event(vq), new, old);
>   }
>   
> +static bool vring_packed_need_event(VirtQueue *vq, bool wrap,
> +                            uint16_t off_wrap, uint16_t new, uint16_t old)
> +{
> +    int off = off_wrap & ~(1 << 15);
> +
> +    if (wrap != off_wrap >> 15) {
> +        off -= vq->vring.num;
> +    }
> +
> +    return vring_need_event(off, new, old);
> +}
> +
> +static bool virtio_packed_should_notify(VirtIODevice *vdev, VirtQueue *vq)
> +{
> +    VRingPackedDescEvent e;
> +    uint16_t old, new;
> +    bool v;
> +    VRingMemoryRegionCaches *caches;
> +
> +    caches  = vring_get_region_caches(vq);
> +    vring_packed_event_read(vdev, &caches->avail, &e);
> +
> +    old = vq->signalled_used;
> +    new = vq->signalled_used = vq->used_idx;
> +    v = vq->signalled_used_valid;
> +    vq->signalled_used_valid = true;
> +
> +    if (e.flags == VRING_PACKED_EVENT_FLAG_DISABLE) {
> +        return false;
> +    } else if (e.flags == VRING_PACKED_EVENT_FLAG_ENABLE) {
> +        return true;
> +    }
> +
> +    return !v || vring_packed_need_event(vq,
> +        vq->used_wrap_counter, e.off_wrap, new, old);
> +}
> +
> +/* Called within rcu_read_lock().  */
> +static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
> +{
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> +        return virtio_packed_should_notify(vdev, vq);
> +    } else {
> +        return virtio_split_should_notify(vdev, vq);
> +    }
> +}
> +
>   void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
>   {
>       bool should_notify;

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 09/11] virtio-net: update the head descriptor in a chain lastly
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 09/11] virtio-net: update the head descriptor in a chain lastly wexu
@ 2019-02-19  7:23   ` Jason Wang
  2019-02-19 10:51     ` Wei Xu
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2019-02-19  7:23 UTC (permalink / raw)
  To: wexu, qemu-devel; +Cc: mst, jfreiman, maxime.coquelin, tiwei.bie


On 2019/2/14 下午12:26, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
>
> This is a helper for packed ring.
>
> To support packed ring, the head descriptor in a chain should be updated
> last, since there is no 'avail_idx' (as in split ring) to explicitly tell
> the driver side that the whole payload is ready once the chain is done,
> so the head becomes visible to the driver immediately.
>
> This patch fills the head descriptor after all the other ones are done.
>
> Signed-off-by: Wei Xu <wexu@redhat.com>


It's really odd to work around an API issue in the implementation of a device. 
Please introduce batched used-updating helpers instead.
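
For example, the ordering could live in the generic virtqueue code rather than
in virtio-net, so the device keeps calling virtqueue_fill()/virtqueue_flush()
unchanged. A rough sketch (virtqueue_packed_fill_desc() and vq->used_elems are
hypothetical helpers/fields, not something this series already has):

    static void virtqueue_packed_flush(VirtQueue *vq, unsigned int count)
    {
        unsigned int i;

        if (count == 0) {
            return;
        }

        /* Write back every used chain except the head first... */
        for (i = 1; i < count; i++) {
            virtqueue_packed_fill_desc(vq, &vq->used_elems[i], i, false);
        }
        /* ...then the head, which exposes the whole batch at once. */
        virtqueue_packed_fill_desc(vq, &vq->used_elems[0], 0, true);

        vq->inuse -= count;
        /* used_idx/used_wrap_counter advancing is omitted in this sketch. */
    }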

Thanks


> ---
>   hw/net/virtio-net.c | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 3f319ef..330abea 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1251,6 +1251,8 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
>       struct virtio_net_hdr_mrg_rxbuf mhdr;
>       unsigned mhdr_cnt = 0;
>       size_t offset, i, guest_offset;
> +    VirtQueueElement head;
> +    int head_len = 0;
>   
>       if (!virtio_net_can_receive(nc)) {
>           return -1;
> @@ -1328,7 +1330,13 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
>           }
>   
>           /* signal other side */
> -        virtqueue_fill(q->rx_vq, elem, total, i++);
> +        if (i == 0) {
> +            head_len = total;
> +            head = *elem;
> +        } else {
> +            virtqueue_fill(q->rx_vq, elem, len, i);
> +        }
> +        i++;
>           g_free(elem);
>       }
>   
> @@ -1339,6 +1347,7 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
>                        &mhdr.num_buffers, sizeof mhdr.num_buffers);
>       }
>   
> +    virtqueue_fill(q->rx_vq, &head, head_len, 0);
>       virtqueue_flush(q->rx_vq, i);
>       virtio_notify(vdev, q->rx_vq);
>   

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 10/11] virtio: migration support for packed ring
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 10/11] virtio: migration support for packed ring wexu
@ 2019-02-19  7:30   ` Jason Wang
  2019-02-19 11:00     ` Wei Xu
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2019-02-19  7:30 UTC (permalink / raw)
  To: wexu, qemu-devel; +Cc: tiwei.bie, mst, jfreiman, maxime.coquelin


On 2019/2/14 下午12:26, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
>
> Both userspace and vhost-net/user are supported with this patch.
>
> A new subsection is introduced for packed ring; only 'last_avail_idx'
> and 'last_avail_wrap_counter' are saved/loaded, on the assumption that
> the other relevant data (inuse, used/avail index and wrap count) are
> the same.


This is probably only true for net device, see comment in virtio_load():

             /*
              * Some devices migrate VirtQueueElements that have been popped
              * from the avail ring but not yet returned to the used ring.
              * Since max ring size < UINT16_MAX it's safe to use modulo
              * UINT16_MAX + 1 subtraction.
              */
             vdev->vq[i].inuse = (uint16_t)(vdev->vq[i].last_avail_idx -
                                 vdev->vq[i].used_idx);


So you need to migrate used_idx and used_wrap_counter too, since there is 
no used ring index we can read back for packed ring.
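
That is, the subsection would need something like (a sketch; the extra fields
follow the names already used in this series):

    static const VMStateDescription vmstate_packed_virtqueue = {
        .name = "packed_virtqueue_state",
        .version_id = 1,
        .minimum_version_id = 1,
        .fields = (VMStateField[]) {
            VMSTATE_UINT16(last_avail_idx, struct VirtQueue),
            VMSTATE_BOOL(last_avail_wrap_counter, struct VirtQueue),
            VMSTATE_UINT16(used_idx, struct VirtQueue),
            VMSTATE_BOOL(used_wrap_counter, struct VirtQueue),
            VMSTATE_UINT32(inuse, struct VirtQueue),
            VMSTATE_END_OF_LIST()
        }
    };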


>
> Signed-off-by: Wei Xu <wexu@redhat.com>
> ---
>   hw/virtio/virtio.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++---
>   1 file changed, 66 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 8cfc7b6..7c5de07 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2349,6 +2349,13 @@ static bool virtio_virtqueue_needed(void *opaque)
>       return virtio_host_has_feature(vdev, VIRTIO_F_VERSION_1);
>   }
>   
> +static bool virtio_packed_virtqueue_needed(void *opaque)
> +{
> +    VirtIODevice *vdev = opaque;
> +
> +    return virtio_host_has_feature(vdev, VIRTIO_F_RING_PACKED);
> +}
> +
>   static bool virtio_ringsize_needed(void *opaque)
>   {
>       VirtIODevice *vdev = opaque;
> @@ -2390,6 +2397,17 @@ static const VMStateDescription vmstate_virtqueue = {
>       }
>   };
>   
> +static const VMStateDescription vmstate_packed_virtqueue = {
> +    .name = "packed_virtqueue_state",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT16(last_avail_idx, struct VirtQueue),
> +        VMSTATE_BOOL(last_avail_wrap_counter, struct VirtQueue),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>   static const VMStateDescription vmstate_virtio_virtqueues = {
>       .name = "virtio/virtqueues",
>       .version_id = 1,
> @@ -2402,6 +2420,18 @@ static const VMStateDescription vmstate_virtio_virtqueues = {
>       }
>   };
>   
> +static const VMStateDescription vmstate_virtio_packed_virtqueues = {
> +    .name = "virtio/packed_virtqueues",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = &virtio_packed_virtqueue_needed,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_STRUCT_VARRAY_POINTER_KNOWN(vq, struct VirtIODevice,
> +                      VIRTIO_QUEUE_MAX, 0, vmstate_packed_virtqueue, VirtQueue),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>   static const VMStateDescription vmstate_ringsize = {
>       .name = "ringsize_state",
>       .version_id = 1,
> @@ -2522,6 +2552,7 @@ static const VMStateDescription vmstate_virtio = {
>           &vmstate_virtio_ringsize,
>           &vmstate_virtio_broken,
>           &vmstate_virtio_extra_state,
> +        &vmstate_virtio_packed_virtqueues,
>           NULL
>       }
>   };
> @@ -2794,6 +2825,17 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
>                   virtio_queue_update_rings(vdev, i);
>               }
>   
> +            if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> +                vdev->vq[i].shadow_avail_idx = vdev->vq[i].last_avail_idx;
> +                vdev->vq[i].avail_wrap_counter =
> +                                        vdev->vq[i].last_avail_wrap_counter;
> +
> +                vdev->vq[i].used_idx = vdev->vq[i].last_avail_idx;
> +                vdev->vq[i].used_wrap_counter =
> +                                        vdev->vq[i].last_avail_wrap_counter;
> +                continue;
> +            }
> +
>               nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
>               /* Check it isn't doing strange things with descriptor numbers. */
>               if (nheads > vdev->vq[i].vring.num) {
> @@ -2955,17 +2997,34 @@ hwaddr virtio_queue_get_used_size(VirtIODevice *vdev, int n)
>   
>   uint16_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n)
>   {
> -    return vdev->vq[n].last_avail_idx;
> +    uint16_t idx;
> +
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> +        idx = vdev->vq[n].last_avail_idx;
> +        idx |= ((int)vdev->vq[n].avail_wrap_counter) << 15;
> +    } else {
> +        idx = (int)vdev->vq[n].last_avail_idx;
> +    }
> +    return idx;
>   }
>   
>   void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx)
>   {
> -    vdev->vq[n].last_avail_idx = idx;
> -    vdev->vq[n].shadow_avail_idx = idx;
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> +        vdev->vq[n].last_avail_idx = idx & 0x7fff;
> +        vdev->vq[n].avail_wrap_counter = !!(idx & 0x8000);


Let's define some macros for those magic numbers.
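
For instance (macro names below are only illustrative):

    #define VRING_PACKED_AVAIL_IDX_MASK      0x7fff
    #define VRING_PACKED_AVAIL_WRAP_COUNTER  0x8000

        vdev->vq[n].last_avail_idx = idx & VRING_PACKED_AVAIL_IDX_MASK;
        vdev->vq[n].avail_wrap_counter =
                                !!(idx & VRING_PACKED_AVAIL_WRAP_COUNTER);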


> +    } else {
> +        vdev->vq[n].last_avail_idx = idx;
> +        vdev->vq[n].shadow_avail_idx = idx;
> +    }
>   }
>   
>   void virtio_queue_restore_last_avail_idx(VirtIODevice *vdev, int n)
>   {
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> +        return;
> +    }


Why doesn't packed ring care about this?


> +
>       rcu_read_lock();
>       if (vdev->vq[n].vring.desc) {
>           vdev->vq[n].last_avail_idx = vring_used_idx(&vdev->vq[n]);
> @@ -2976,6 +3035,10 @@ void virtio_queue_restore_last_avail_idx(VirtIODevice *vdev, int n)
>   
>   void virtio_queue_update_used_idx(VirtIODevice *vdev, int n)
>   {
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> +        return;
> +    }


And this?

Thanks


> +
>       rcu_read_lock();
>       if (vdev->vq[n].vring.desc) {
>           vdev->vq[n].used_idx = vring_used_idx(&vdev->vq[n]);

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 11/11] virtio: CLI and provide packed ring feature bit by default
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 11/11] virtio: CLI and provide packed ring feature bit by default wexu
@ 2019-02-19  7:32   ` Jason Wang
  2019-02-19 11:23     ` Wei Xu
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2019-02-19  7:32 UTC (permalink / raw)
  To: wexu, qemu-devel; +Cc: tiwei.bie, mst, jfreiman, maxime.coquelin


On 2019/2/14 下午12:26, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
>
> Add userspace and vhost kernel/user support.
>
> Add CLI "ring_packed=true/false" to enable/disable packed ring provision.
> Usage:
>      -device virtio-net-pci,netdev=xx,mac=xx:xx:xx:xx:xx:xx,ring_packed=false
>
> By default it is provided.


Please add compat properties for old machine types.
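
That is, keep the feature off for older machine types by adding an entry to
the newest hw_compat array, roughly (a sketch; the exact array and the
device/property names depend on where the property finally lands, here
assuming the 4.0-cycle hw_compat_3_1 and the "ring_packed" name from this
patch):

    GlobalProperty hw_compat_3_1[] = {
        /* existing entries ... */
        { "virtio-net-pci", "ring_packed", "false" },
    };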

Thanks


>
> Signed-off-by: Wei Xu <wexu@redhat.com>
> ---
>   hw/net/vhost_net.c         | 2 ++
>   include/hw/virtio/virtio.h | 4 +++-
>   2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index e037db6..f593086 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -53,6 +53,7 @@ static const int kernel_feature_bits[] = {
>       VIRTIO_F_VERSION_1,
>       VIRTIO_NET_F_MTU,
>       VIRTIO_F_IOMMU_PLATFORM,
> +    VIRTIO_F_RING_PACKED,
>       VHOST_INVALID_FEATURE_BIT
>   };
>   
> @@ -78,6 +79,7 @@ static const int user_feature_bits[] = {
>       VIRTIO_NET_F_MRG_RXBUF,
>       VIRTIO_NET_F_MTU,
>       VIRTIO_F_IOMMU_PLATFORM,
> +    VIRTIO_F_RING_PACKED,
>   
>       /* This bit implies RARP isn't sent by QEMU out of band */
>       VIRTIO_NET_F_GUEST_ANNOUNCE,
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 9c1fa07..2eb27d2 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -264,7 +264,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
>       DEFINE_PROP_BIT64("any_layout", _state, _field, \
>                         VIRTIO_F_ANY_LAYOUT, true), \
>       DEFINE_PROP_BIT64("iommu_platform", _state, _field, \
> -                      VIRTIO_F_IOMMU_PLATFORM, false)
> +                      VIRTIO_F_IOMMU_PLATFORM, false), \
> +    DEFINE_PROP_BIT64("ring_packed", _state, _field, \
> +                      VIRTIO_F_RING_PACKED, true)
>   
>   hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
>   hwaddr virtio_queue_get_avail_addr(VirtIODevice *vdev, int n);

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 00/11] packed ring virtio-net backends support
  2019-02-14  4:26 [Qemu-devel] [PATCH v4 00/11] packed ring virtio-net backends support wexu
                   ` (10 preceding siblings ...)
  2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 11/11] virtio: CLI and provide packed ring feature bit by default wexu
@ 2019-02-19  7:35 ` Jason Wang
  11 siblings, 0 replies; 41+ messages in thread
From: Jason Wang @ 2019-02-19  7:35 UTC (permalink / raw)
  To: wexu, qemu-devel; +Cc: mst, jfreiman, maxime.coquelin, tiwei.bie


On 2019/2/14 下午12:26, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
>
> https://github.com/Whishay/qemu.git
>
> Userspace and vhost-net backend test has been done with upstream kernel
> in guest.
>
> v3->v4:
>      - add version number to the subject of each patch.(mst)
>
> v2->v3:
>      v2/01 - drop it since the header has been synchronized from kernel.(mst & jason)
>      v3/01 - rename 'avail_wrap_counter' to 'last_avail_wrap_counter',
>              'event_wrap_counter' to 'avail_wrap_counter' to make it easier
>              to understand.(Jason)
>            - revise commit message.(Jason)
>      v3/02 - split packed ring areas size calculation to next patch.(Jason)
>              to not break bisect(Jason).
>      v3/03 - initialize packed ring region with correct size and attribute.
>            - remove unnecessary 'else' checks. (Jason)
>      v3/06 - add commit log.
>            - replace 'event_wrap-counter' with 'avail_wrap_counter'.
>            - merge common memory cache size check to virtqueue_get_avail_bytes().(Jason)
>            - revise memory barrier comment.(Jason)
>            - check indirect descriptors by desc.len/sizeof(desc).(Jason)
>            - flip wrap counter with '^=1'.(Jason)
>      v3/07 - move desc.id/len initialization to the declaration.(Jason)
>            - flip wrap counter '!' with '^=1'.(Jason)
>            - add memory barrier comments in commit message.
>      v3/08 - use offsetof() when writing cache.(Jason)
>            - avoid duplicated memory region write when turning off event_idx
>              supported notification.(Jason)
>            - add commit log.(Jason)
>            - add avail & last_avail wrap counter difference description in commit log.
>      v3/09 - remove unnecessary used/avail idx/wrap-counter from subsection.
>            - put new subsection to the end of vmstate_virtio.(Jason)
>            - squash the two userspace and vhost-net migration patches in v2.(Jason)
>      v3/10 - reword commit message.
>            - this is a help not a bug fix so I would like to keep it as a
>              separate patch still.(Proposed a merge it by Jason)
>            - the virtqueue_fill() is also not like an API so I would prefer not
>              to touch it, please correct me if I did not get it in the right
>              way.(Proposed a squash by Jason)
>      v3/11 - squash feature bits for user space and vhost kernel/user backends.
>            - enable packed ring feature bit provision on host by default.(Jason)
>
> Wei Xu (11):
>    virtio: rename structure for packed ring
>    virtio: device/driver area size calculation helper for split ring
>    virtio: initialize packed ring region
>    virtio: initialize wrap counter for packed ring
>    virtio: queue/descriptor check helpers for packed ring
>    virtio: get avail bytes check for packed ring
>    virtio: fill/flush/pop for packed ring
>    virtio: event suppression support for packed ring
>    virtio-net: update the head descriptor in a chain lastly
>    virtio: migration support for packed ring
>    virtio: CLI and provide packed ring feature bit by default
>
>   hw/net/vhost_net.c         |   2 +
>   hw/net/virtio-net.c        |  11 +-
>   hw/virtio/virtio.c         | 798 +++++++++++++++++++++++++++++++++++++++++----
>   include/hw/virtio/virtio.h |   4 +-
>   4 files changed, 757 insertions(+), 58 deletions(-)


Looks like there's something missing in the series, e.g. the support of:

virtqueue_unpop()

virtqueue_rewind()
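
For packed ring those also have to undo the wrap counter when rewinding
across the ring boundary, roughly like this (a simplified sketch, not code
from this series):

    static bool virtqueue_packed_rewind(VirtQueue *vq, unsigned int num)
    {
        if (num > vq->inuse) {
            return false;
        }

        vq->inuse -= num;
        if (vq->last_avail_idx < num) {
            vq->last_avail_idx += vq->vring.num - num;
            vq->last_avail_wrap_counter ^= 1;
        } else {
            vq->last_avail_idx -= num;
        }
        return true;
    }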

Thanks

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 07/11] virtio: fill/flush/pop for packed ring
  2019-02-19  6:49       ` Jason Wang
@ 2019-02-19  8:21         ` Wei Xu
  2019-02-19  9:33           ` Jason Wang
  0 siblings, 1 reply; 41+ messages in thread
From: Wei Xu @ 2019-02-19  8:21 UTC (permalink / raw)
  To: Jason Wang; +Cc: qemu-devel, mst, jfreiman, maxime.coquelin, tiwei.bie

On Tue, Feb 19, 2019 at 02:49:42PM +0800, Jason Wang wrote:
> 
> On 2019/2/18 下午10:46, Wei Xu wrote:
> >>Do we allow chain more descriptors than vq size in the case of indirect?
> >>According to the spec:
> >>
> >>"
> >>
> >>The device limits the number of descriptors in a list through a
> >>transport-specific and/or device-specific value. If not limited,
> >>the maximum number of descriptors in a list is the virt queue
> >>size.
> >>"
> >>
> >>This looks possible, so the above is probably wrong if the max number of
> >>chained descriptors is negotiated through a device specific way.
> >>
> >OK, I will remove this check, while it is necessary to have a limitation
> >for indirect descriptor anyway, otherwise it is possible to hit an overflow
> >since presumably u16 is used for most number/size in the spec.
> >
> 
> Please try to test block and scsi devices with your changes as well.

Any idea about what kind of tests should be covered? AFAICT, currently
packed ring is targeted at virtio-net, as discussed during the spec voting.

Wei

> 
> Thanks
> 
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 06/11] virtio: get avail bytes check for packed ring
  2019-02-19  6:24       ` Jason Wang
@ 2019-02-19  8:24         ` Wei Xu
  0 siblings, 0 replies; 41+ messages in thread
From: Wei Xu @ 2019-02-19  8:24 UTC (permalink / raw)
  To: Jason Wang; +Cc: qemu-devel, maxime.coquelin, tiwei.bie, jfreiman, mst

On Tue, Feb 19, 2019 at 02:24:11PM +0800, Jason Wang wrote:
> 
> On 2019/2/19 上午1:07, Wei Xu wrote:
> >On Mon, Feb 18, 2019 at 03:27:21PM +0800, Jason Wang wrote:
> >>On 2019/2/14 下午12:26, wexu@redhat.com wrote:
> >>>From: Wei Xu <wexu@redhat.com>
> >>>
> >>>Add packed ring headcount check.
> >>>
> >>>Common part of split/packed ring are kept.
> >>>
> >>>Signed-off-by: Wei Xu <wexu@redhat.com>
> >>>---
> >>>  hw/virtio/virtio.c | 197 ++++++++++++++++++++++++++++++++++++++++++++++++-----
> >>>  1 file changed, 179 insertions(+), 18 deletions(-)
> >>>
> >>>diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >>>index f2ff980..832287b 100644
> >>>--- a/hw/virtio/virtio.c
> >>>+++ b/hw/virtio/virtio.c
> >>>@@ -368,6 +368,17 @@ int virtio_queue_ready(VirtQueue *vq)
> >>>      return vq->vring.avail != 0;
> >>>  }
> >>>+static void vring_packed_desc_read(VirtIODevice *vdev, VRingPackedDesc *desc,
> >>>+                            MemoryRegionCache *cache, int i)
> >>>+{
> >>>+    address_space_read_cached(cache, i * sizeof(VRingPackedDesc),
> >>>+                              desc, sizeof(VRingPackedDesc));
> >>>+    virtio_tswap16s(vdev, &desc->flags);
> >>>+    virtio_tswap64s(vdev, &desc->addr);
> >>>+    virtio_tswap32s(vdev, &desc->len);
> >>>+    virtio_tswap16s(vdev, &desc->id);
> >>>+}
> >>>+
> >>>  static void vring_packed_desc_read_flags(VirtIODevice *vdev,
> >>>                      VRingPackedDesc *desc, MemoryRegionCache *cache, int i)
> >>>  {
> >>>@@ -667,9 +678,9 @@ static int virtqueue_read_next_desc(VirtIODevice *vdev, VRingDesc *desc,
> >>>      return VIRTQUEUE_READ_DESC_MORE;
> >>>  }
> >>>-void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
> >>>-                               unsigned int *out_bytes,
> >>>-                               unsigned max_in_bytes, unsigned max_out_bytes)
> >>>+static void virtqueue_split_get_avail_bytes(VirtQueue *vq,
> >>>+                            unsigned int *in_bytes, unsigned int *out_bytes,
> >>>+                            unsigned max_in_bytes, unsigned max_out_bytes)
> >>>  {
> >>>      VirtIODevice *vdev = vq->vdev;
> >>>      unsigned int max, idx;
> >>>@@ -679,27 +690,12 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
> >>>      int64_t len = 0;
> >>>      int rc;
> >>>-    if (unlikely(!vq->vring.desc)) {
> >>>-        if (in_bytes) {
> >>>-            *in_bytes = 0;
> >>>-        }
> >>>-        if (out_bytes) {
> >>>-            *out_bytes = 0;
> >>>-        }
> >>>-        return;
> >>>-    }
> >>>-
> >>>      rcu_read_lock();
> >>>      idx = vq->last_avail_idx;
> >>>      total_bufs = in_total = out_total = 0;
> >>>      max = vq->vring.num;
> >>>      caches = vring_get_region_caches(vq);
> >>>-    if (caches->desc.len < max * sizeof(VRingDesc)) {
> >>>-        virtio_error(vdev, "Cannot map descriptor ring");
> >>>-        goto err;
> >>>-    }
> >>>-
> >>>      while ((rc = virtqueue_num_heads(vq, idx)) > 0) {
> >>>          MemoryRegionCache *desc_cache = &caches->desc;
> >>>          unsigned int num_bufs;
> >>>@@ -792,6 +788,171 @@ err:
> >>>      goto done;
> >>>  }
> >>>+static void virtqueue_packed_get_avail_bytes(VirtQueue *vq,
> >>>+                            unsigned int *in_bytes, unsigned int *out_bytes,
> >>>+                            unsigned max_in_bytes, unsigned max_out_bytes)
> >>>+{
> >>>+    VirtIODevice *vdev = vq->vdev;
> >>>+    unsigned int max, idx;
> >>>+    unsigned int total_bufs, in_total, out_total;
> >>>+    MemoryRegionCache *desc_cache;
> >>>+    VRingMemoryRegionCaches *caches;
> >>>+    MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
> >>>+    int64_t len = 0;
> >>>+    VRingPackedDesc desc;
> >>>+    bool wrap_counter;
> >>>+
> >>>+    rcu_read_lock();
> >>>+    idx = vq->last_avail_idx;
> >>>+    wrap_counter = vq->last_avail_wrap_counter;
> >>>+    total_bufs = in_total = out_total = 0;
> >>>+
> >>>+    max = vq->vring.num;
> >>>+    caches = vring_get_region_caches(vq);
> >>>+    desc_cache = &caches->desc;
> >>>+    vring_packed_desc_read_flags(vdev, &desc, desc_cache, idx);
> >>>+    while (is_desc_avail(&desc, wrap_counter)) {
> >>>+        unsigned int num_bufs;
> >>>+        unsigned int i = 0;
> >>>+
> >>>+        num_bufs = total_bufs;
> >>>+
> >>>+        /* Make sure flags has been read before all the fields. */
> >>>+        smp_rmb();
> >>>+        vring_packed_desc_read(vdev, &desc, desc_cache, idx);
> >>
> >>It's better to have single function to deal with reading flags and
> >>descriptors and check its availability like packed ring.
> >There is something different between split and packed ring here.
> >For split ring, 'avail_idx' and descriptor are separately used so the
> >interfaces of them are straightforward, while the flag and data fields
> >of the descriptors for packed ring are mixed and independent accesses to
> >them have been brought in, it is good to use them as what they are supposed
> >to work. :)
> >
> >Another neat way is to pack the two operations to a new one, but it would
> >introduce memory cache parameter passing.
> >
> >So personally I prefer to keep it unchanged, still want to sort it out?
> 
> 
> It's as simple as another helper that calls read_flags() and desc_read()?

OK, will try one.

> 
> Btw, it's better to have a consistent naming for the function like
> vring_packed_flags_read().

OK, thanks.

Wei
> 
> Thanks
> 
> 
> >
> >>
> >>>+
> >>>+        if (desc.flags & VRING_DESC_F_INDIRECT) {
> >>>+            if (desc.len % sizeof(VRingPackedDesc)) {
> >>>+                virtio_error(vdev, "Invalid size for indirect buffer table");
> >>>+                goto err;
> >>>+            }
> >>>+
> >>>+            /* If we've got too many, that implies a descriptor loop. */
> >>>+            if (num_bufs >= max) {
> >>>+                virtio_error(vdev, "Looped descriptor");
> >>>+                goto err;
> >>>+            }
> >>>+
> >>>+            /* loop over the indirect descriptor table */
> >>>+            len = address_space_cache_init(&indirect_desc_cache,
> >>>+                                           vdev->dma_as,
> >>>+                                           desc.addr, desc.len, false);
> >>>+            desc_cache = &indirect_desc_cache;
> >>>+            if (len < desc.len) {
> >>>+                virtio_error(vdev, "Cannot map indirect buffer");
> >>>+                goto err;
> >>>+            }
> >>>+
> >>>+            max = desc.len / sizeof(VRingPackedDesc);
> >>>+            num_bufs = i = 0;
> >>>+            vring_packed_desc_read(vdev, &desc, desc_cache, i);
> >>>+        }
> >>>+
> >>>+        do {
> >>>+            /* If we've got too many, that implies a descriptor loop. */
> >>>+            if (++num_bufs > max) {
> >>>+                virtio_error(vdev, "Looped descriptor");
> >>>+                goto err;
> >>>+            }
> >>>+
> >>>+            if (desc.flags & VRING_DESC_F_WRITE) {
> >>>+                in_total += desc.len;
> >>>+            } else {
> >>>+                out_total += desc.len;
> >>>+            }
> >>>+            if (in_total >= max_in_bytes && out_total >= max_out_bytes) {
> >>>+                goto done;
> >>>+            }
> >>>+
> >>>+            if (desc_cache == &indirect_desc_cache) {
> >>>+                if (++i >= max) {
> >>>+                    break;
> >>>+                }
> >>>+                vring_packed_desc_read(vdev, &desc, desc_cache, i);
> >>>+            } else {
> >>>+                if (++idx >= vq->vring.num) {
> >>>+                    idx -= vq->vring.num;
> >>>+                    wrap_counter ^= 1;
> >>>+                }
> >>>+                vring_packed_desc_read(vdev, &desc, desc_cache, idx);
> >>>+            }
> >>>+            /* Make sure flags has been read after all the other fields */
> >>>+            smp_rmb();
> >>
> >>I don't think we need this according to the spec:
> >>
> >>"
> >>
> >>The driver always makes the first descriptor in the list
> >>available after the rest of the list has been written out into
> >>the ring. This guarantees that the device will never observe a
> >>partial scatter/gather list in the ring.
> >>
> >>"
> >>
> >>So when the first is available, the rest of the chain should be available,
> >>otherwise device may see partial chain.
> >As always, I will remove it, thanks a lot.
> >
> >Wei
> >
> >>Thanks
> >>
> >>
> >>>+        } while (desc.flags & VRING_DESC_F_NEXT);
> >>>+
> >>>+        if (desc_cache == &indirect_desc_cache) {
> >>>+            address_space_cache_destroy(&indirect_desc_cache);
> >>>+            total_bufs++;
> >>>+            /* We missed one step on for indirect desc */
> >>>+            idx++;
> >>>+            if (++idx >= vq->vring.num) {
> >>>+                idx -= vq->vring.num;
> >>>+                wrap_counter ^= 1;
> >>>+            }
> >>>+        } else {
> >>>+            total_bufs = num_bufs;
> >>>+        }
> >>>+
> >>>+        desc_cache = &caches->desc;
> >>>+        vring_packed_desc_read_flags(vdev, &desc, desc_cache, idx);
> >>>+    }
> >>>+
> >>>+    /* Record the index and wrap counter for a kick we want */
> >>>+    vq->shadow_avail_idx = idx;
> >>>+    vq->avail_wrap_counter = wrap_counter;
> >>>+done:
> >>>+    address_space_cache_destroy(&indirect_desc_cache);
> >>>+    if (in_bytes) {
> >>>+        *in_bytes = in_total;
> >>>+    }
> >>>+    if (out_bytes) {
> >>>+        *out_bytes = out_total;
> >>>+    }
> >>>+    rcu_read_unlock();
> >>>+    return;
> >>>+
> >>>+err:
> >>>+    in_total = out_total = 0;
> >>>+    goto done;
> >>>+}
> >>>+
> >>>+void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
> >>>+                               unsigned int *out_bytes,
> >>>+                               unsigned max_in_bytes, unsigned max_out_bytes)
> >>>+{
> >>>+    uint16_t desc_size;
> >>>+    VRingMemoryRegionCaches *caches;
> >>>+
> >>>+    if (unlikely(!vq->vring.desc)) {
> >>>+        goto err;
> >>>+    }
> >>>+
> >>>+    caches = vring_get_region_caches(vq);
> >>>+    desc_size = virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED) ?
> >>>+                                sizeof(VRingPackedDesc) : sizeof(VRingDesc);
> >>>+    if (caches->desc.len < vq->vring.num * desc_size) {
> >>>+        virtio_error(vq->vdev, "Cannot map descriptor ring");
> >>>+        goto err;
> >>>+    }
> >>>+
> >>>+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> >>>+        virtqueue_packed_get_avail_bytes(vq, in_bytes, out_bytes,
> >>>+                                         max_in_bytes, max_out_bytes);
> >>>+    } else {
> >>>+        virtqueue_split_get_avail_bytes(vq, in_bytes, out_bytes,
> >>>+                                        max_in_bytes, max_out_bytes);
> >>>+    }
> >>>+
> >>>+    return;
> >>>+err:
> >>>+    if (in_bytes) {
> >>>+        *in_bytes = 0;
> >>>+    }
> >>>+    if (out_bytes) {
> >>>+        *out_bytes = 0;
> >>>+    }
> >>>+}
> >>>+
> >>>  int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
> >>>                            unsigned int out_bytes)
> >>>  {

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 07/11] virtio: fill/flush/pop for packed ring
  2019-02-19  8:21         ` Wei Xu
@ 2019-02-19  9:33           ` Jason Wang
  2019-02-19 11:34             ` Wei Xu
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2019-02-19  9:33 UTC (permalink / raw)
  To: Wei Xu; +Cc: tiwei.bie, maxime.coquelin, qemu-devel, jfreiman, mst


On 2019/2/19 下午4:21, Wei Xu wrote:
> On Tue, Feb 19, 2019 at 02:49:42PM +0800, Jason Wang wrote:
>> On 2019/2/18 下午10:46, Wei Xu wrote:
>>>> Do we allow chain more descriptors than vq size in the case of indirect?
>>>> According to the spec:
>>>>
>>>> "
>>>>
>>>> The device limits the number of descriptors in a list through a
>>>> transport-specific and/or device-specific value. If not limited,
>>>> the maximum number of descriptors in a list is the virt queue
>>>> size.
>>>> "
>>>>
>>>> This looks possible, so the above is probably wrong if the max number of
>>>> chained descriptors is negotiated through a device specific way.
>>>>
>>> OK, I will remove this check, while it is necessary to have a limitation
>>> for indirect descriptor anyway, otherwise it is possible to hit an overflow
>>> since presumably u16 is used for most number/size in the spec.
>>>
>> Please try to test block and scsi devices with your changes as well.
> Any idea about what kind of tests should be covered? AFAICT, currently
> packed ring is targeted at virtio-net, as discussed during the spec voting.
>
> Wei


Well it's not only for net for sure, it should support all kinds of 
devices. For testing, you should test basic functionality plus migration.
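
For the migration part, even a very simple round trip is useful (illustrative
only):

    # destination, started with the same command line as the source plus:
    #   -incoming tcp:0.0.0.0:4444

    # source monitor, while e.g. iperf traffic is running in the guest:
    (qemu) migrate -d tcp:DEST_HOST:4444
    (qemu) info migrate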

Thanks


>
>> Thanks
>>
>>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 08/11] virtio: event suppression support for packed ring
  2019-02-19  7:19   ` Jason Wang
@ 2019-02-19 10:40     ` Wei Xu
  2019-02-19 13:06       ` Jason Wang
  0 siblings, 1 reply; 41+ messages in thread
From: Wei Xu @ 2019-02-19 10:40 UTC (permalink / raw)
  To: Jason Wang; +Cc: qemu-devel, maxime.coquelin, tiwei.bie, jfreiman, mst

On Tue, Feb 19, 2019 at 03:19:58PM +0800, Jason Wang wrote:
> 
> On 2019/2/14 下午12:26, wexu@redhat.com wrote:
> >From: Wei Xu <wexu@redhat.com>
> >
> >Difference between 'avail_wrap_counter' and 'last_avail_wrap_counter':
> >For Tx(guest transmitting), they are the same after each pop of a desc.
> >
> >For Rx(guest receiving), they are also the same when there are enough
> >descriptors to carry the payload for a packet(e.g. usually 16 descs are
> >needed for a 64k packet in typical iperf tcp connection with tso enabled),
> >however, when the ring is running out of descriptors while there are
> >still a few free ones, e.g. only 6 descriptors are available, which is not
> >enough to carry an entire packet that needs 16 descriptors, then the
> >'avail_wrap_counter' should be set to that of the first descriptor pending
> >to be handled by the guest driver in order to get a notification, while the
> >'last_avail_wrap_counter' should stay unchanged, pointing at the head of the
> >available descriptors, like below:
> >
> >Mark meaning:
> >     | | -- available
> >     |*| -- used
> >
> >A Snapshot of the queue:
> >                                       last_avail_idx = 253
> >                                       last_avail_wrap_counter = 1
> >                                              |
> >     +---------------------------------------------+
> >  0  | | | |*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*| | | | 255
> >     +---------------------------------------------+
> >            |
> >   shadow_avail_idx = 3
> >   avail_wrap_counter = 0
> 
> 
> Well this might not be a good place to describe the difference between
> shadow_avail_idx and last_avail_idx. And the comments above their definitions
> look good enough?

Sorry, I do not get it, can you elaborate more? 

This is one of the error-prone parts of packed ring, so it is good to make it clear here.

> 
>     /* Next head to pop */
>     uint16_t last_avail_idx;
> 
>     /* Last avail_idx read from VQ. */
>     uint16_t shadow_avail_idx;
> 

What is the meaning of these comments? Do you mean I should put this 
description into those comments as well? Thanks.

> Instead, how or why event suppression is needed is not mentioned ...

Yes, but presumably this knowledge has been acquired from reading through the
spec, so I skipped this part.

Wei

> 
> 
> 
> >
> >Signed-off-by: Wei Xu <wexu@redhat.com>
> >---
> >  hw/virtio/virtio.c | 137 +++++++++++++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 128 insertions(+), 9 deletions(-)
> >
> >diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >index 7e276b4..8cfc7b6 100644
> >--- a/hw/virtio/virtio.c
> >+++ b/hw/virtio/virtio.c
> >@@ -234,6 +234,34 @@ static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
> >      virtio_tswap16s(vdev, &desc->next);
> >  }
> >+static void vring_packed_event_read(VirtIODevice *vdev,
> >+                            MemoryRegionCache *cache, VRingPackedDescEvent *e)
> >+{
> >+    address_space_read_cached(cache, 0, e, sizeof(*e));
> >+    virtio_tswap16s(vdev, &e->off_wrap);
> >+    virtio_tswap16s(vdev, &e->flags);
> >+}
> >+
> >+static void vring_packed_off_wrap_write(VirtIODevice *vdev,
> >+                            MemoryRegionCache *cache, uint16_t off_wrap)
> >+{
> >+    virtio_tswap16s(vdev, &off_wrap);
> >+    address_space_write_cached(cache, offsetof(VRingPackedDescEvent, off_wrap),
> >+                            &off_wrap, sizeof(off_wrap));
> >+    address_space_cache_invalidate(cache,
> >+                offsetof(VRingPackedDescEvent, off_wrap), sizeof(off_wrap));
> >+}
> >+
> >+static void vring_packed_flags_write(VirtIODevice *vdev,
> >+                            MemoryRegionCache *cache, uint16_t flags)
> >+{
> >+    virtio_tswap16s(vdev, &flags);
> >+    address_space_write_cached(cache, offsetof(VRingPackedDescEvent, flags),
> >+                            &flags, sizeof(flags));
> >+    address_space_cache_invalidate(cache,
> >+                        offsetof(VRingPackedDescEvent, flags), sizeof(flags));
> >+}
> >+
> >  static VRingMemoryRegionCaches *vring_get_region_caches(struct VirtQueue *vq)
> >  {
> >      VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
> >@@ -340,14 +368,8 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
> >      address_space_cache_invalidate(&caches->used, pa, sizeof(val));
> >  }
> >-void virtio_queue_set_notification(VirtQueue *vq, int enable)
> >+static void virtio_queue_set_notification_split(VirtQueue *vq, int enable)
> >  {
> >-    vq->notification = enable;
> >-
> >-    if (!vq->vring.desc) {
> >-        return;
> >-    }
> >-
> >      rcu_read_lock();
> >      if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> >          vring_set_avail_event(vq, vring_avail_idx(vq));
> >@@ -363,6 +385,57 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable)
> >      rcu_read_unlock();
> >  }
> >+static void virtio_queue_set_notification_packed(VirtQueue *vq, int enable)
> >+{
> >+    VRingPackedDescEvent e;
> >+    VRingMemoryRegionCaches *caches;
> >+
> >+    rcu_read_lock();
> >+    caches  = vring_get_region_caches(vq);
> >+    vring_packed_event_read(vq->vdev, &caches->used, &e);
> >+
> >+    if (!enable) {
> >+        if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> >+            /* no need to write device area since this is outdated. */
> 
> 
> We should advertise off and wrap in this case as well, otherwise we may get
> notifications earlier than expected.

Is it necessary? Supposing offset & wrap_counter are always set before
updating the notification flags, it should be safe to skip this when
disabling, isn't it?

Since the logic here was unclear, I did a more concise version like the one
below, which doesn't use 'vq->notification' as in your comment for v2.
I think this should work for packed ring as well; anything I missed?

    if (!enable) {
        e.flags = VRING_PACKED_EVENT_FLAG_DISABLE;
    } else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
        uint16_t off_wrap;

        off_wrap = vq->shadow_avail_idx | vq->shadow_avail_wrap_counter << 15;
        vring_packed_off_wrap_write(vq->vdev, &caches->used, off_wrap);

        /* Make sure off_wrap is written before flags */
        smp_wmb();
        e.flags = VRING_PACKED_EVENT_FLAG_DESC;
    } else {
        e.flags = VRING_PACKED_EVENT_FLAG_ENABLE;
    }

    vring_packed_flags_write(vq->vdev, &caches->used, e.flags);

> 
> 
> >+            goto out;
> >+        }
> >+
> >+        e.flags = VRING_PACKED_EVENT_FLAG_DISABLE;
> >+        goto update;
> >+    }
> >+
> >+    e.flags = VRING_PACKED_EVENT_FLAG_ENABLE;
> 
> 
> Here and the above goto could be eliminated by:
> 
> if (event idx) {
> 
> ...
> 
> } else if (enable) {
> 
> ...
> 
> } else {
> 
> ...
> 
> }
> 

Thanks, I have removed it in the above snippet.

Wei
> 
> Thanks
> 
> 
> >+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> >+        uint16_t off_wrap = vq->shadow_avail_idx | vq->avail_wrap_counter << 15;
> >+
> >+        vring_packed_off_wrap_write(vq->vdev, &caches->used, off_wrap);
> >+        /* Make sure off_wrap is written before flags */
> >+        smp_wmb();
> >+
> >+        e.flags = VRING_PACKED_EVENT_FLAG_DESC;
> >+    }
> >+
> >+update:
> >+    vring_packed_flags_write(vq->vdev, &caches->used, e.flags);
> >+out:
> >+    rcu_read_unlock();
> >+}
> >+
> >+void virtio_queue_set_notification(VirtQueue *vq, int enable)
> >+{
> >+    vq->notification = enable;
> >+
> >+    if (!vq->vring.desc) {
> >+        return;
> >+    }
> >+
> >+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> >+        virtio_queue_set_notification_packed(vq, enable);
> >+    } else {
> >+        virtio_queue_set_notification_split(vq, enable);
> >+    }
> >+}
> >+
> >  int virtio_queue_ready(VirtQueue *vq)
> >  {
> >      return vq->vring.avail != 0;
> >@@ -2117,8 +2190,7 @@ static void virtio_set_isr(VirtIODevice *vdev, int value)
> >      }
> >  }
> >-/* Called within rcu_read_lock().  */
> >-static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
> >+static bool virtio_split_should_notify(VirtIODevice *vdev, VirtQueue *vq)
> >  {
> >      uint16_t old, new;
> >      bool v;
> >@@ -2141,6 +2213,53 @@ static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
> >      return !v || vring_need_event(vring_get_used_event(vq), new, old);
> >  }
> >+static bool vring_packed_need_event(VirtQueue *vq, bool wrap,
> >+                            uint16_t off_wrap, uint16_t new, uint16_t old)
> >+{
> >+    int off = off_wrap & ~(1 << 15);
> >+
> >+    if (wrap != off_wrap >> 15) {
> >+        off -= vq->vring.num;
> >+    }
> >+
> >+    return vring_need_event(off, new, old);
> >+}
> >+
> >+static bool virtio_packed_should_notify(VirtIODevice *vdev, VirtQueue *vq)
> >+{
> >+    VRingPackedDescEvent e;
> >+    uint16_t old, new;
> >+    bool v;
> >+    VRingMemoryRegionCaches *caches;
> >+
> >+    caches  = vring_get_region_caches(vq);
> >+    vring_packed_event_read(vdev, &caches->avail, &e);
> >+
> >+    old = vq->signalled_used;
> >+    new = vq->signalled_used = vq->used_idx;
> >+    v = vq->signalled_used_valid;
> >+    vq->signalled_used_valid = true;
> >+
> >+    if (e.flags == VRING_PACKED_EVENT_FLAG_DISABLE) {
> >+        return false;
> >+    } else if (e.flags == VRING_PACKED_EVENT_FLAG_ENABLE) {
> >+        return true;
> >+    }
> >+
> >+    return !v || vring_packed_need_event(vq,
> >+        vq->used_wrap_counter, e.off_wrap, new, old);
> >+}
> >+
> >+/* Called within rcu_read_lock().  */
> >+static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
> >+{
> >+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> >+        return virtio_packed_should_notify(vdev, vq);
> >+    } else {
> >+        return virtio_split_should_notify(vdev, vq);
> >+    }
> >+}
> >+
> >  void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
> >  {
> >      bool should_notify;
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 09/11] virtio-net: update the head descriptor in a chain lastly
  2019-02-19  7:23   ` Jason Wang
@ 2019-02-19 10:51     ` Wei Xu
  2019-02-19 13:09       ` Jason Wang
  0 siblings, 1 reply; 41+ messages in thread
From: Wei Xu @ 2019-02-19 10:51 UTC (permalink / raw)
  To: Jason Wang; +Cc: qemu-devel, maxime.coquelin, tiwei.bie, jfreiman, mst

On Tue, Feb 19, 2019 at 03:23:01PM +0800, Jason Wang wrote:
> 
> On 2019/2/14 下午12:26, wexu@redhat.com wrote:
> >From: Wei Xu <wexu@redhat.com>
> >
> >This is a helper for packed ring.
> >
> >To support packed ring, the head descriptor in a chain should be updated
> >last, since there is no 'avail_idx' (as in split ring) to explicitly tell
> >the driver side that the whole payload is ready once the chain is done,
> >so the head becomes visible to the driver immediately.
> >
> >This patch fills the head descriptor after all the other ones are done.
> >
> >Signed-off-by: Wei Xu <wexu@redhat.com>
> 
> 
> It's really odd to work around an API issue in the implementation of a device.
> Please introduce batched used-updating helpers instead.
Can you elaborate a bit more? I don't get it either.

The same kind of batching as in vhost-net or the dpdk pmd is not supported
by the userspace backend. The change here is just to update the head
descriptor last in the case of a descriptor chain, so a helper might not
help too much.

Wei
> 
> Thanks
> 
> 
> >---
> >  hw/net/virtio-net.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> >diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >index 3f319ef..330abea 100644
> >--- a/hw/net/virtio-net.c
> >+++ b/hw/net/virtio-net.c
> >@@ -1251,6 +1251,8 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
> >      struct virtio_net_hdr_mrg_rxbuf mhdr;
> >      unsigned mhdr_cnt = 0;
> >      size_t offset, i, guest_offset;
> >+    VirtQueueElement head;
> >+    int head_len = 0;
> >      if (!virtio_net_can_receive(nc)) {
> >          return -1;
> >@@ -1328,7 +1330,13 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
> >          }
> >          /* signal other side */
> >-        virtqueue_fill(q->rx_vq, elem, total, i++);
> >+        if (i == 0) {
> >+            head_len = total;
> >+            head = *elem;
> >+        } else {
> >+            virtqueue_fill(q->rx_vq, elem, len, i);
> >+        }
> >+        i++;
> >          g_free(elem);
> >      }
> >@@ -1339,6 +1347,7 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
> >                       &mhdr.num_buffers, sizeof mhdr.num_buffers);
> >      }
> >+    virtqueue_fill(q->rx_vq, &head, head_len, 0);
> >      virtqueue_flush(q->rx_vq, i);
> >      virtio_notify(vdev, q->rx_vq);
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 10/11] virtio: migration support for packed ring
  2019-02-19  7:30   ` Jason Wang
@ 2019-02-19 11:00     ` Wei Xu
  2019-02-19 13:12       ` Jason Wang
  0 siblings, 1 reply; 41+ messages in thread
From: Wei Xu @ 2019-02-19 11:00 UTC (permalink / raw)
  To: Jason Wang; +Cc: qemu-devel, tiwei.bie, mst, jfreiman, maxime.coquelin

On Tue, Feb 19, 2019 at 03:30:41PM +0800, Jason Wang wrote:
> 
> On 2019/2/14 下午12:26, wexu@redhat.com wrote:
> >From: Wei Xu <wexu@redhat.com>
> >
> >Both userspace and vhost-net/user are supported with this patch.
> >
> >A new subsection is introduced for packed ring; only 'last_avail_idx'
> >and 'last_avail_wrap_counter' are saved/loaded, on the assumption that
> >the other relevant data (inuse, used/avail index and wrap count) are
> >the same.
> 
> 
> This is probably only true for net device, see comment in virtio_load():
> 
>             /*
>              * Some devices migrate VirtQueueElements that have been popped
>              * from the avail ring but not yet returned to the used ring.
>              * Since max ring size < UINT16_MAX it's safe to use modulo
>              * UINT16_MAX + 1 subtraction.
>              */
>             vdev->vq[i].inuse = (uint16_t)(vdev->vq[i].last_avail_idx -
>                                 vdev->vq[i].used_idx);
> 
> 
> So you need to migrate used_idx and used_wrap_counter, since the packed
> ring has no used idx in guest memory to recover them from.

This is trying to align with vhost-net/user as we discussed. Since all we
have done so far is support the virtio-net device for packed ring, maybe we
can consider supporting other devices after this has been verified.

> 
> 
> >
> >Signed-off-by: Wei Xu <wexu@redhat.com>
> >---
> >  hw/virtio/virtio.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 66 insertions(+), 3 deletions(-)
> >
> >diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >index 8cfc7b6..7c5de07 100644
> >--- a/hw/virtio/virtio.c
> >+++ b/hw/virtio/virtio.c
> >@@ -2349,6 +2349,13 @@ static bool virtio_virtqueue_needed(void *opaque)
> >      return virtio_host_has_feature(vdev, VIRTIO_F_VERSION_1);
> >  }
> >+static bool virtio_packed_virtqueue_needed(void *opaque)
> >+{
> >+    VirtIODevice *vdev = opaque;
> >+
> >+    return virtio_host_has_feature(vdev, VIRTIO_F_RING_PACKED);
> >+}
> >+
> >  static bool virtio_ringsize_needed(void *opaque)
> >  {
> >      VirtIODevice *vdev = opaque;
> >@@ -2390,6 +2397,17 @@ static const VMStateDescription vmstate_virtqueue = {
> >      }
> >  };
> >+static const VMStateDescription vmstate_packed_virtqueue = {
> >+    .name = "packed_virtqueue_state",
> >+    .version_id = 1,
> >+    .minimum_version_id = 1,
> >+    .fields = (VMStateField[]) {
> >+        VMSTATE_UINT16(last_avail_idx, struct VirtQueue),
> >+        VMSTATE_BOOL(last_avail_wrap_counter, struct VirtQueue),
> >+        VMSTATE_END_OF_LIST()
> >+    }
> >+};
> >+
> >  static const VMStateDescription vmstate_virtio_virtqueues = {
> >      .name = "virtio/virtqueues",
> >      .version_id = 1,
> >@@ -2402,6 +2420,18 @@ static const VMStateDescription vmstate_virtio_virtqueues = {
> >      }
> >  };
> >+static const VMStateDescription vmstate_virtio_packed_virtqueues = {
> >+    .name = "virtio/packed_virtqueues",
> >+    .version_id = 1,
> >+    .minimum_version_id = 1,
> >+    .needed = &virtio_packed_virtqueue_needed,
> >+    .fields = (VMStateField[]) {
> >+        VMSTATE_STRUCT_VARRAY_POINTER_KNOWN(vq, struct VirtIODevice,
> >+                      VIRTIO_QUEUE_MAX, 0, vmstate_packed_virtqueue, VirtQueue),
> >+        VMSTATE_END_OF_LIST()
> >+    }
> >+};
> >+
> >  static const VMStateDescription vmstate_ringsize = {
> >      .name = "ringsize_state",
> >      .version_id = 1,
> >@@ -2522,6 +2552,7 @@ static const VMStateDescription vmstate_virtio = {
> >          &vmstate_virtio_ringsize,
> >          &vmstate_virtio_broken,
> >          &vmstate_virtio_extra_state,
> >+        &vmstate_virtio_packed_virtqueues,
> >          NULL
> >      }
> >  };
> >@@ -2794,6 +2825,17 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
> >                  virtio_queue_update_rings(vdev, i);
> >              }
> >+            if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> >+                vdev->vq[i].shadow_avail_idx = vdev->vq[i].last_avail_idx;
> >+                vdev->vq[i].avail_wrap_counter =
> >+                                        vdev->vq[i].last_avail_wrap_counter;
> >+
> >+                vdev->vq[i].used_idx = vdev->vq[i].last_avail_idx;
> >+                vdev->vq[i].used_wrap_counter =
> >+                                        vdev->vq[i].last_avail_wrap_counter;
> >+                continue;
> >+            }
> >+
> >              nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
> >              /* Check it isn't doing strange things with descriptor numbers. */
> >              if (nheads > vdev->vq[i].vring.num) {
> >@@ -2955,17 +2997,34 @@ hwaddr virtio_queue_get_used_size(VirtIODevice *vdev, int n)
> >  uint16_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n)
> >  {
> >-    return vdev->vq[n].last_avail_idx;
> >+    uint16_t idx;
> >+
> >+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> >+        idx = vdev->vq[n].last_avail_idx;
> >+        idx |= ((int)vdev->vq[n].avail_wrap_counter) << 15;
> >+    } else {
> >+        idx = (int)vdev->vq[n].last_avail_idx;
> >+    }
> >+    return idx;
> >  }
> >  void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx)
> >  {
> >-    vdev->vq[n].last_avail_idx = idx;
> >-    vdev->vq[n].shadow_avail_idx = idx;
> >+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> >+        vdev->vq[n].last_avail_idx = idx & 0x7fff;
> >+        vdev->vq[n].avail_wrap_counter = !!(idx & 0x8000);
> 
> 
> Let's define some macros for those magic number.

OK.
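
Something along these lines, I suppose (placeholder names only):

    /* Bit 15 of the migrated index carries the wrap counter, the low
     * 15 bits carry last_avail_idx; names are placeholders. */
    #define VRING_PACKED_AVAIL_IDX_MASK    0x7fff
    #define VRING_PACKED_AVAIL_WRAP_MASK   0x8000

    vdev->vq[n].last_avail_idx = idx & VRING_PACKED_AVAIL_IDX_MASK;
    vdev->vq[n].avail_wrap_counter = !!(idx & VRING_PACKED_AVAIL_WRAP_MASK);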

> 
> 
> >+    } else {
> >+        vdev->vq[n].last_avail_idx = idx;
> >+        vdev->vq[n].shadow_avail_idx = idx;
> >+    }
> >  }
> >  void virtio_queue_restore_last_avail_idx(VirtIODevice *vdev, int n)
> >  {
> >+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> >+        return;
> >+    }
> 
> 
> Why doesn't packed ring care about this?

As elaborated above, the used idx/wrap_counter are supposed to be the
same as the avail ones for vhost-net/user.

> 
> 
> >+
> >      rcu_read_lock();
> >      if (vdev->vq[n].vring.desc) {
> >          vdev->vq[n].last_avail_idx = vring_used_idx(&vdev->vq[n]);
> >@@ -2976,6 +3035,10 @@ void virtio_queue_restore_last_avail_idx(VirtIODevice *vdev, int n)
> >  void virtio_queue_update_used_idx(VirtIODevice *vdev, int n)
> >  {
> >+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> >+        return;
> >+    }
> 
> 
> And this?

Same as above.

Wei

> 
> Thanks
> 
> 
> >+
> >      rcu_read_lock();
> >      if (vdev->vq[n].vring.desc) {
> >          vdev->vq[n].used_idx = vring_used_idx(&vdev->vq[n]);

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 11/11] virtio: CLI and provide packed ring feature bit by default
  2019-02-19  7:32   ` Jason Wang
@ 2019-02-19 11:23     ` Wei Xu
  2019-02-19 13:33       ` Jason Wang
  0 siblings, 1 reply; 41+ messages in thread
From: Wei Xu @ 2019-02-19 11:23 UTC (permalink / raw)
  To: Jason Wang; +Cc: qemu-devel, maxime.coquelin, jfreiman, tiwei.bie, mst

On Tue, Feb 19, 2019 at 03:32:19PM +0800, Jason Wang wrote:
> 
> On 2019/2/14 下午12:26, wexu@redhat.com wrote:
> >From: Wei Xu <wexu@redhat.com>
> >
> >Add userspace and vhost kernel/user support.
> >
> >Add CLI "ring_packed=true/false" to enable/disable packed ring provision.
> >Usage:
> >     -device virtio-net-pci,netdev=xx,mac=xx:xx:xx:xx:xx:xx,ring_packed=false
> >
> >By default it is provided.
> 
> 
> Please compat this for old machine types.

It is provided by default; how should I make it compatible with old machine
types? Hide it for them or keep providing it?

Wei

> 
> Thanks
> 
> 
> >
> >Signed-off-by: Wei Xu <wexu@redhat.com>
> >---
> >  hw/net/vhost_net.c         | 2 ++
> >  include/hw/virtio/virtio.h | 4 +++-
> >  2 files changed, 5 insertions(+), 1 deletion(-)
> >
> >diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> >index e037db6..f593086 100644
> >--- a/hw/net/vhost_net.c
> >+++ b/hw/net/vhost_net.c
> >@@ -53,6 +53,7 @@ static const int kernel_feature_bits[] = {
> >      VIRTIO_F_VERSION_1,
> >      VIRTIO_NET_F_MTU,
> >      VIRTIO_F_IOMMU_PLATFORM,
> >+    VIRTIO_F_RING_PACKED,
> >      VHOST_INVALID_FEATURE_BIT
> >  };
> >@@ -78,6 +79,7 @@ static const int user_feature_bits[] = {
> >      VIRTIO_NET_F_MRG_RXBUF,
> >      VIRTIO_NET_F_MTU,
> >      VIRTIO_F_IOMMU_PLATFORM,
> >+    VIRTIO_F_RING_PACKED,
> >      /* This bit implies RARP isn't sent by QEMU out of band */
> >      VIRTIO_NET_F_GUEST_ANNOUNCE,
> >diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> >index 9c1fa07..2eb27d2 100644
> >--- a/include/hw/virtio/virtio.h
> >+++ b/include/hw/virtio/virtio.h
> >@@ -264,7 +264,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
> >      DEFINE_PROP_BIT64("any_layout", _state, _field, \
> >                        VIRTIO_F_ANY_LAYOUT, true), \
> >      DEFINE_PROP_BIT64("iommu_platform", _state, _field, \
> >-                      VIRTIO_F_IOMMU_PLATFORM, false)
> >+                      VIRTIO_F_IOMMU_PLATFORM, false), \
> >+    DEFINE_PROP_BIT64("ring_packed", _state, _field, \
> >+                      VIRTIO_F_RING_PACKED, true)
> >  hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
> >  hwaddr virtio_queue_get_avail_addr(VirtIODevice *vdev, int n);
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 07/11] virtio: fill/flush/pop for packed ring
  2019-02-19  9:33           ` Jason Wang
@ 2019-02-19 11:34             ` Wei Xu
  0 siblings, 0 replies; 41+ messages in thread
From: Wei Xu @ 2019-02-19 11:34 UTC (permalink / raw)
  To: Jason Wang; +Cc: jfreiman, maxime.coquelin, qemu-devel, tiwei.bie, mst

On Tue, Feb 19, 2019 at 05:33:57PM +0800, Jason Wang wrote:
> 
> On 2019/2/19 下午4:21, Wei Xu wrote:
> >On Tue, Feb 19, 2019 at 02:49:42PM +0800, Jason Wang wrote:
> >>On 2019/2/18 下午10:46, Wei Xu wrote:
> >>>>Do we allow chain more descriptors than vq size in the case of indirect?
> >>>>According to the spec:
> >>>>
> >>>>"
> >>>>
> >>>>The device limits the number of descriptors in a list through a
> >>>>transport-specific and/or device-specific value. If not limited,
> >>>>the maximum number of descriptors in a list is the virt queue
> >>>>size.
> >>>>"
> >>>>
> >>>>This looks possible, so the above is probably wrong if the max number of
> >>>>chained descriptors is negotiated through a device specific way.
> >>>>
> >>>OK, I will remove this check, while it is necessary to have a limitation
> >>>for indirect descriptor anyway, otherwise it is possible to hit an overflow
> >>>since presumably u16 is used for most number/size in the spec.
> >>>
> >>Please try to test block and scsi device for you changes as well.
> >Any idea about what kind of test should be covered? AFAICT, currently
> >packed ring is targeted for virtio-net as discussed during voting spec.
> >
> >Wei
> 
> 
> Well it's not only for net for sure, it should support all kinds of device.
> For testing, you should test basic function plus migration.

For sure we will support all the other devices; can we make it work for the
virtio-net device first and then move on to the others?

Also, can anybody give me a CLI example for block and scsi devices?
I will give it a quick shot.

Wei

> 
> Thanks
> 
> 
> >
> >>Thanks
> >>
> >>
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 08/11] virtio: event suppression support for packed ring
  2019-02-19 10:40     ` Wei Xu
@ 2019-02-19 13:06       ` Jason Wang
  2019-02-20  2:17         ` Wei Xu
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2019-02-19 13:06 UTC (permalink / raw)
  To: Wei Xu; +Cc: qemu-devel, maxime.coquelin, tiwei.bie, jfreiman, mst


On 2019/2/19 下午6:40, Wei Xu wrote:
> On Tue, Feb 19, 2019 at 03:19:58PM +0800, Jason Wang wrote:
>> On 2019/2/14 下午12:26, wexu@redhat.com wrote:
>>> From: Wei Xu <wexu@redhat.com>
>>>
>>> Difference between 'avail_wrap_counter' and 'last_avail_wrap_counter':
>>> For Tx (guest transmitting), they are the same after each pop of a desc.
>>>
>>> For Rx (guest receiving), they are also the same when there are enough
>>> descriptors to carry the payload of a packet (e.g. usually 16 descs are
>>> needed for a 64k packet in a typical iperf tcp connection with tso
>>> enabled). However, when the ring runs low on descriptors, e.g. only 6
>>> are available, which is not enough for an entire packet that needs 16,
>>> 'avail_wrap_counter' should be set to match the first descriptor still
>>> pending in the guest driver so that a notification is requested, while
>>> 'last_avail_wrap_counter' should stay unchanged at the head of the
>>> available descriptors, like below:
>>>
>>> Mark meaning:
>>>      | | -- available
>>>      |*| -- used
>>>
>>> A Snapshot of the queue:
>>>                                        last_avail_idx = 253
>>>                                        last_avail_wrap_counter = 1
>>>                                               |
>>>      +---------------------------------------------+
>>>   0  | | | |*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*| | | | 255
>>>      +---------------------------------------------+
>>>             |
>>>    shadow_avail_idx = 3
>>>    avail_wrap_counter = 0
>>
>> Well this might not be the good place to describe the difference between
>> shadow_avail_idx and last_avail_idx. And the comments above their definition
>> looks good enough?
> Sorry, I do not get it, can you elaborate more?


I meant the existing comment is good enough to explain what the field does.
For packed ring, the only difference is the wrap counter. You can add e.g.
"The wrap counter of the next head to pop" above last_avail_wrap_counter,
and likewise for shadow_avail_wrap_counter.
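
i.e. roughly this (comment wording is only a suggestion, and the
shadow_avail_wrap_counter field is assumed to mirror the avail one):

    /* Next head to pop */
    uint16_t last_avail_idx;
    /* Wrap counter of the next head to pop */
    bool last_avail_wrap_counter;

    /* Last avail_idx read from VQ. */
    uint16_t shadow_avail_idx;
    /* Wrap counter matching the last avail_idx read from VQ. */
    bool shadow_avail_wrap_counter;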


>
> This is one of the buggy parts of packed ring, it is good to make it clear here.
>
>>      /* Next head to pop */
>>      uint16_t last_avail_idx;
>>
>>      /* Last avail_idx read from VQ. */
>>      uint16_t shadow_avail_idx;
>>
> What is the meaning of these comment?


It's pretty easy to understand, isn't it?


>   Do you mean I should replace put them
> to the comments also? thanks.
>
>> Instead, how or why need event suppress is not mentioned ...
> Yes, but presumably this knowledge has been acquired from reading through the
> spec, so I skipped this part.


You need to at least add a reference to the spec. The commit log is pretty
important for newcomers to understand what has been done in a patch before
reading the code. I'm pretty sure they will get confused reading what you
wrote here.


Thanks


>
> Wei
>
>>
>>
>>> Signed-off-by: Wei Xu <wexu@redhat.com>
>>> ---
>>>   hw/virtio/virtio.c | 137 +++++++++++++++++++++++++++++++++++++++++++++++++----
>>>   1 file changed, 128 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>>> index 7e276b4..8cfc7b6 100644
>>> --- a/hw/virtio/virtio.c
>>> +++ b/hw/virtio/virtio.c
>>> @@ -234,6 +234,34 @@ static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
>>>       virtio_tswap16s(vdev, &desc->next);
>>>   }
>>> +static void vring_packed_event_read(VirtIODevice *vdev,
>>> +                            MemoryRegionCache *cache, VRingPackedDescEvent *e)
>>> +{
>>> +    address_space_read_cached(cache, 0, e, sizeof(*e));
>>> +    virtio_tswap16s(vdev, &e->off_wrap);
>>> +    virtio_tswap16s(vdev, &e->flags);
>>> +}
>>> +
>>> +static void vring_packed_off_wrap_write(VirtIODevice *vdev,
>>> +                            MemoryRegionCache *cache, uint16_t off_wrap)
>>> +{
>>> +    virtio_tswap16s(vdev, &off_wrap);
>>> +    address_space_write_cached(cache, offsetof(VRingPackedDescEvent, off_wrap),
>>> +                            &off_wrap, sizeof(off_wrap));
>>> +    address_space_cache_invalidate(cache,
>>> +                offsetof(VRingPackedDescEvent, off_wrap), sizeof(off_wrap));
>>> +}
>>> +
>>> +static void vring_packed_flags_write(VirtIODevice *vdev,
>>> +                            MemoryRegionCache *cache, uint16_t flags)
>>> +{
>>> +    virtio_tswap16s(vdev, &flags);
>>> +    address_space_write_cached(cache, offsetof(VRingPackedDescEvent, flags),
>>> +                            &flags, sizeof(flags));
>>> +    address_space_cache_invalidate(cache,
>>> +                        offsetof(VRingPackedDescEvent, flags), sizeof(flags));
>>> +}
>>> +
>>>   static VRingMemoryRegionCaches *vring_get_region_caches(struct VirtQueue *vq)
>>>   {
>>>       VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
>>> @@ -340,14 +368,8 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
>>>       address_space_cache_invalidate(&caches->used, pa, sizeof(val));
>>>   }
>>> -void virtio_queue_set_notification(VirtQueue *vq, int enable)
>>> +static void virtio_queue_set_notification_split(VirtQueue *vq, int enable)
>>>   {
>>> -    vq->notification = enable;
>>> -
>>> -    if (!vq->vring.desc) {
>>> -        return;
>>> -    }
>>> -
>>>       rcu_read_lock();
>>>       if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
>>>           vring_set_avail_event(vq, vring_avail_idx(vq));
>>> @@ -363,6 +385,57 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable)
>>>       rcu_read_unlock();
>>>   }
>>> +static void virtio_queue_set_notification_packed(VirtQueue *vq, int enable)
>>> +{
>>> +    VRingPackedDescEvent e;
>>> +    VRingMemoryRegionCaches *caches;
>>> +
>>> +    rcu_read_lock();
>>> +    caches  = vring_get_region_caches(vq);
>>> +    vring_packed_event_read(vq->vdev, &caches->used, &e);
>>> +
>>> +    if (!enable) {
>>> +        if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
>>> +            /* no need to write device area since this is outdated. */
>>
>> We should advertise off and wrap in this case as well, otherwise we may get
>> notifications earlier than expected.
> Is it necessary? Supposing offset & wrap_counter are always set before update
> notification flags, it is reliable to skip this for disabling, isn't it?


You should either:

- advertise the EVENT_FLAG_DISABLE

or

- advertise the new off/wrap, otherwise you may get notified earlier than
expected.

Your code does neither of the above.


>
> While the logic here is unclear, I did a concise one like below
> which doesn't use the 'vq->notification' as in your comment for v2,
> I think this should work for packed ring as well, anything I missed?
>
>      if (!enable) {
>          e.flags = VRING_PACKED_EVENT_FLAG_DISABLE;
>      } else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
>          uint16_t off_wrap;
>
>          off_wrap = vq->shadow_avail_idx | vq->shadow_avail_wrap_counter << 15;
>          vring_packed_off_wrap_write(vq->vdev, &caches->used, off_wrap);
>
>          /* Make sure off_wrap is wrote before flags */
>          smp_wmb();
>          e.flags = VRING_PACKED_EVENT_FLAG_DESC;
>      } else {
>          e.flags = VRING_PACKED_EVENT_FLAG_ENABLE;
>      }
>
>      vring_packed_flags_write(vq->vdev, &caches->used, e.flags);


Looks good to me.

Thanks


>
>>
>>> +            goto out;
>>> +        }
>>> +
>>> +        e.flags = VRING_PACKED_EVENT_FLAG_DISABLE;
>>> +        goto update;
>>> +    }
>>> +
>>> +    e.flags = VRING_PACKED_EVENT_FLAG_ENABLE;
>>
>> Here and the above goto could be eliminated by:
>>
>> if (even idx) {
>>
>> ...
>>
>> } else if (enable) {
>>
>> ...
>>
>> } else {
>>
>> ...
>>
>> }
>>
> thanks, I have removed it in above snippet.
>
> Wei
>> Thanks
>>
>>
>>> +    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
>>> +        uint16_t off_wrap = vq->shadow_avail_idx | vq->avail_wrap_counter << 15;
>>> +
>>> +        vring_packed_off_wrap_write(vq->vdev, &caches->used, off_wrap);
>>> +        /* Make sure off_wrap is wrote before flags */
>>> +        smp_wmb();
>>> +
>>> +        e.flags = VRING_PACKED_EVENT_FLAG_DESC;
>>> +    }
>>> +
>>> +update:
>>> +    vring_packed_flags_write(vq->vdev, &caches->used, e.flags);
>>> +out:
>>> +    rcu_read_unlock();
>>> +}
>>> +
>>> +void virtio_queue_set_notification(VirtQueue *vq, int enable)
>>> +{
>>> +    vq->notification = enable;
>>> +
>>> +    if (!vq->vring.desc) {
>>> +        return;
>>> +    }
>>> +
>>> +    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
>>> +        virtio_queue_set_notification_packed(vq, enable);
>>> +    } else {
>>> +        virtio_queue_set_notification_split(vq, enable);
>>> +    }
>>> +}
>>> +
>>>   int virtio_queue_ready(VirtQueue *vq)
>>>   {
>>>       return vq->vring.avail != 0;
>>> @@ -2117,8 +2190,7 @@ static void virtio_set_isr(VirtIODevice *vdev, int value)
>>>       }
>>>   }
>>> -/* Called within rcu_read_lock().  */
>>> -static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
>>> +static bool virtio_split_should_notify(VirtIODevice *vdev, VirtQueue *vq)
>>>   {
>>>       uint16_t old, new;
>>>       bool v;
>>> @@ -2141,6 +2213,53 @@ static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
>>>       return !v || vring_need_event(vring_get_used_event(vq), new, old);
>>>   }
>>> +static bool vring_packed_need_event(VirtQueue *vq, bool wrap,
>>> +                            uint16_t off_wrap, uint16_t new, uint16_t old)
>>> +{
>>> +    int off = off_wrap & ~(1 << 15);
>>> +
>>> +    if (wrap != off_wrap >> 15) {
>>> +        off -= vq->vring.num;
>>> +    }
>>> +
>>> +    return vring_need_event(off, new, old);
>>> +}
>>> +
>>> +static bool virtio_packed_should_notify(VirtIODevice *vdev, VirtQueue *vq)
>>> +{
>>> +    VRingPackedDescEvent e;
>>> +    uint16_t old, new;
>>> +    bool v;
>>> +    VRingMemoryRegionCaches *caches;
>>> +
>>> +    caches  = vring_get_region_caches(vq);
>>> +    vring_packed_event_read(vdev, &caches->avail, &e);
>>> +
>>> +    old = vq->signalled_used;
>>> +    new = vq->signalled_used = vq->used_idx;
>>> +    v = vq->signalled_used_valid;
>>> +    vq->signalled_used_valid = true;
>>> +
>>> +    if (e.flags == VRING_PACKED_EVENT_FLAG_DISABLE) {
>>> +        return false;
>>> +    } else if (e.flags == VRING_PACKED_EVENT_FLAG_ENABLE) {
>>> +        return true;
>>> +    }
>>> +
>>> +    return !v || vring_packed_need_event(vq,
>>> +        vq->used_wrap_counter, e.off_wrap, new, old);
>>> +}
>>> +
>>> +/* Called within rcu_read_lock().  */
>>> +static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
>>> +{
>>> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
>>> +        return virtio_packed_should_notify(vdev, vq);
>>> +    } else {
>>> +        return virtio_split_should_notify(vdev, vq);
>>> +    }
>>> +}
>>> +
>>>   void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
>>>   {
>>>       bool should_notify;

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 09/11] virtio-net: update the head descriptor in a chain lastly
  2019-02-19 10:51     ` Wei Xu
@ 2019-02-19 13:09       ` Jason Wang
  2019-02-20  1:54         ` Wei Xu
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2019-02-19 13:09 UTC (permalink / raw)
  To: Wei Xu; +Cc: qemu-devel, maxime.coquelin, tiwei.bie, jfreiman, mst


On 2019/2/19 下午6:51, Wei Xu wrote:
> On Tue, Feb 19, 2019 at 03:23:01PM +0800, Jason Wang wrote:
>> On 2019/2/14 下午12:26, wexu@redhat.com wrote:
>>> From: Wei Xu <wexu@redhat.com>
>>>
>>> This is a helper for packed ring.
>>>
>>> To support packed ring, the head descriptor in a chain should be updated
>>> lastly since no 'avail_idx' like in packed ring to explicitly tell the
>>> driver side that all payload is ready after having done the chain, so
>>> the head is always visible immediately.
>>>
>>> This patch fills the header after done all the other ones.
>>>
>>> Signed-off-by: Wei Xu <wexu@redhat.com>
>>
>> It's really odd to workaround API issue in the implementation of device.
>> Please introduce batched used updating helpers instead.
> Can you elaborate a bit more? I don't get it as well.
>
> The exact batch as vhost-net or dpdk pmd is not supported by userspace
> backend. The change here is to keep the header descriptor updated at
> last in case of a chaining descriptors and the helper might not help
> too much.
>
> Wei


Of course we can add batching support, why not?

Your code assumes the device knows layout-specific details of the virtqueue,
which breaks the layering. The device should not care about the actual layout.

Thanks


>> Thanks
>>
>>
>>> ---
>>>   hw/net/virtio-net.c | 11 ++++++++++-
>>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>> index 3f319ef..330abea 100644
>>> --- a/hw/net/virtio-net.c
>>> +++ b/hw/net/virtio-net.c
>>> @@ -1251,6 +1251,8 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
>>>       struct virtio_net_hdr_mrg_rxbuf mhdr;
>>>       unsigned mhdr_cnt = 0;
>>>       size_t offset, i, guest_offset;
>>> +    VirtQueueElement head;
>>> +    int head_len = 0;
>>>       if (!virtio_net_can_receive(nc)) {
>>>           return -1;
>>> @@ -1328,7 +1330,13 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
>>>           }
>>>           /* signal other side */
>>> -        virtqueue_fill(q->rx_vq, elem, total, i++);
>>> +        if (i == 0) {
>>> +            head_len = total;
>>> +            head = *elem;
>>> +        } else {
>>> +            virtqueue_fill(q->rx_vq, elem, len, i);
>>> +        }
>>> +        i++;
>>>           g_free(elem);
>>>       }
>>> @@ -1339,6 +1347,7 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
>>>                        &mhdr.num_buffers, sizeof mhdr.num_buffers);
>>>       }
>>> +    virtqueue_fill(q->rx_vq, &head, head_len, 0);
>>>       virtqueue_flush(q->rx_vq, i);
>>>       virtio_notify(vdev, q->rx_vq);

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 10/11] virtio: migration support for packed ring
  2019-02-19 11:00     ` Wei Xu
@ 2019-02-19 13:12       ` Jason Wang
  0 siblings, 0 replies; 41+ messages in thread
From: Jason Wang @ 2019-02-19 13:12 UTC (permalink / raw)
  To: Wei Xu; +Cc: qemu-devel, tiwei.bie, mst, jfreiman, maxime.coquelin


On 2019/2/19 下午7:00, Wei Xu wrote:
> On Tue, Feb 19, 2019 at 03:30:41PM +0800, Jason Wang wrote:
>> On 2019/2/14 下午12:26,wexu@redhat.com  wrote:
>>> From: Wei Xu<wexu@redhat.com>
>>>
>>> Both userspace and vhost-net/user are supported with this patch.
>>>
>>> A new subsection is introduced for packed ring, only 'last_avail_idx'
>>> and 'last_avail_wrap_counter' are saved/loaded presumably based on
>>> all the others relevant data(inuse, used/avail index and wrap count
>>> should be the same.
>> This is probably only true for net device, see comment in virtio_load():
>>
>>              /*
>>               * Some devices migrate VirtQueueElements that have been popped
>>               * from the avail ring but not yet returned to the used ring.
>>               * Since max ring size < UINT16_MAX it's safe to use modulo
>>               * UINT16_MAX + 1 subtraction.
>>               */
>>              vdev->vq[i].inuse = (uint16_t)(vdev->vq[i].last_avail_idx -
>>                                  vdev->vq[i].used_idx);
>>
>>
>> So you need to migrate used_idx and used_wrap_counter since we don't have
>> used idx.
> This is trying to align with vhost-net/user as we discussed, since all we
> have done is to support  virtio-net device for packed ring,


Well, what you did probably works for virtio-net. But again, virtio-net
is not the only device we want to support.


>   maybe we can
> consider supporting other devices after we have got it verified.
>

Please don't; fix the issue instead. The packed virtqueue is a generic
layout change that should work for all kinds of virtio devices. It may be
specially optimized for net, but that doesn't mean it was designed only
for net.
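
Concretely, it would presumably just mean two more fields in the new
subsection, plus restoring them in virtio_load() instead of copying the
avail side, something like (untested sketch):

    static const VMStateDescription vmstate_packed_virtqueue = {
        .name = "packed_virtqueue_state",
        .version_id = 1,
        .minimum_version_id = 1,
        .fields = (VMStateField[]) {
            VMSTATE_UINT16(last_avail_idx, struct VirtQueue),
            VMSTATE_BOOL(last_avail_wrap_counter, struct VirtQueue),
            VMSTATE_UINT16(used_idx, struct VirtQueue),
            VMSTATE_BOOL(used_wrap_counter, struct VirtQueue),
            VMSTATE_END_OF_LIST()
        }
    };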

Thanks

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 11/11] virtio: CLI and provide packed ring feature bit by default
  2019-02-19 11:23     ` Wei Xu
@ 2019-02-19 13:33       ` Jason Wang
  2019-02-20  0:46         ` Wei Xu
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2019-02-19 13:33 UTC (permalink / raw)
  To: Wei Xu; +Cc: qemu-devel, maxime.coquelin, jfreiman, tiwei.bie, mst


On 2019/2/19 下午7:23, Wei Xu wrote:
> On Tue, Feb 19, 2019 at 03:32:19PM +0800, Jason Wang wrote:
>> On 2019/2/14 下午12:26,wexu@redhat.com  wrote:
>>> From: Wei Xu<wexu@redhat.com>
>>>
>>> Add userspace and vhost kernel/user support.
>>>
>>> Add CLI "ring_packed=true/false" to enable/disable packed ring provision.
>>> Usage:
>>>      -device virtio-net-pci,netdev=xx,mac=xx:xx:xx:xx:xx:xx,ring_packed=false
>>>
>>> By default it is provided.
>> Please compat this for old machine types.
> It is provided by default, how to make it compatible for old machine types?
> Hide or provide it?
>
> Wei
>

Take a look at e.g. how pc_compat_3_1 and hw_compat_3_1 are used.
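
i.e. keep the bit on by default but turn it off for 3.1 and older machine
types with a compat entry, roughly like this (illustrative only, the exact
driver name(s) and where the property is scoped would need checking):

    /* e.g. appended to hw_compat_3_1[] */
    { "virtio-net-device", "ring_packed", "off" },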

Thanks

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 11/11] virtio: CLI and provide packed ring feature bit by default
  2019-02-19 13:33       ` Jason Wang
@ 2019-02-20  0:46         ` Wei Xu
  0 siblings, 0 replies; 41+ messages in thread
From: Wei Xu @ 2019-02-20  0:46 UTC (permalink / raw)
  To: Jason Wang; +Cc: tiwei.bie, maxime.coquelin, qemu-devel, jfreiman, mst

On Tue, Feb 19, 2019 at 09:33:40PM +0800, Jason Wang wrote:
> 
> On 2019/2/19 下午7:23, Wei Xu wrote:
> >On Tue, Feb 19, 2019 at 03:32:19PM +0800, Jason Wang wrote:
> >>On 2019/2/14 下午12:26,wexu@redhat.com  wrote:
> >>>From: Wei Xu<wexu@redhat.com>
> >>>
> >>>Add userspace and vhost kernel/user support.
> >>>
> >>>Add CLI "ring_packed=true/false" to enable/disable packed ring provision.
> >>>Usage:
> >>>     -device virtio-net-pci,netdev=xx,mac=xx:xx:xx:xx:xx:xx,ring_packed=false
> >>>
> >>>By default it is provided.
> >>Please compat this for old machine types.
> >It is provided by default, how to make it compatible for old machine types?
> >Hide or provide it?
> >
> >Wei
> >
> 
> Take a look at e.g how pc_compat_3_1 and hw_compat_3_1 was used.

OK, thanks.

Wei
> 
> Thanks
> 
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 09/11] virtio-net: update the head descriptor in a chain lastly
  2019-02-19 13:09       ` Jason Wang
@ 2019-02-20  1:54         ` Wei Xu
  2019-02-20  2:34           ` Jason Wang
  0 siblings, 1 reply; 41+ messages in thread
From: Wei Xu @ 2019-02-20  1:54 UTC (permalink / raw)
  To: Jason Wang; +Cc: jfreiman, maxime.coquelin, qemu-devel, tiwei.bie, mst

On Tue, Feb 19, 2019 at 09:09:33PM +0800, Jason Wang wrote:
> 
> On 2019/2/19 下午6:51, Wei Xu wrote:
> >On Tue, Feb 19, 2019 at 03:23:01PM +0800, Jason Wang wrote:
> >>On 2019/2/14 下午12:26, wexu@redhat.com wrote:
> >>>From: Wei Xu <wexu@redhat.com>
> >>>
> >>>This is a helper for packed ring.
> >>>
> >>>To support packed ring, the head descriptor in a chain should be updated
> >>>lastly since no 'avail_idx' like in packed ring to explicitly tell the
> >>>driver side that all payload is ready after having done the chain, so
> >>>the head is always visible immediately.
> >>>
> >>>This patch fills the header after done all the other ones.
> >>>
> >>>Signed-off-by: Wei Xu <wexu@redhat.com>
> >>
> >>It's really odd to workaround API issue in the implementation of device.
> >>Please introduce batched used updating helpers instead.
> >Can you elaborate a bit more? I don't get it as well.
> >
> >The exact batch as vhost-net or dpdk pmd is not supported by userspace
> >backend. The change here is to keep the header descriptor updated at
> >last in case of a chaining descriptors and the helper might not help
> >too much.
> >
> >Wei
> 
> 
> Of course we can add batching support why not?

It is always good to improve performance, but that could probably be done
in a separate series; we also need to bear in mind that the qemu userspace
backend is usually not the first choice for performance-oriented users.

AFAICT, virtqueue_fill() is a generic API for all the relevant userspace
virtio devices and it does not support batching. Without touching
virtqueue_fill(), supporting batching changes the meaning of the 'idx'
parameter, which should be kept as it is.

To fix it, I have two proposals so far:
1). batching support (two APIs needed to keep compatibility);
2). save a head elem per vq instead of caching an array of elems like vhost
    does, and introduce a new API (virtqueue_chain_fill()) that takes an
    additional 'more' parameter on top of the current virtqueue_fill() to
    indicate whether more descriptors of the same chain will follow (rough
    signature sketched below).
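
The signature for option 2 would be roughly the following (sketch only,
nothing here is implemented yet):

    /* Hypothetical API: behaves like virtqueue_fill(), with 'more' saying
     * that further elements of the same chain will follow, so the core can
     * hold back the head descriptor and write it last on a packed ring.
     */
    void virtqueue_chain_fill(VirtQueue *vq, const VirtQueueElement *elem,
                              unsigned int len, unsigned int idx, bool more);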

Either way the API changes somewhat, and neither option seems as clean and
clear as I would like.

Any better idea?

> 
> Your code assumes the device knows layout-specific details of the
> virtqueue, which breaks the layering. The device should not care about
> the actual layout.
>

Good point, but changes to the virtio-net receive code path are unavoidable
anyway to support both split and packed rings, and batching feels more like
a new feature.

Wei
 
> Thanks
> 
> 
> >>Thanks
> >>
> >>
> >>>---
> >>>  hw/net/virtio-net.c | 11 ++++++++++-
> >>>  1 file changed, 10 insertions(+), 1 deletion(-)
> >>>
> >>>diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >>>index 3f319ef..330abea 100644
> >>>--- a/hw/net/virtio-net.c
> >>>+++ b/hw/net/virtio-net.c
> >>>@@ -1251,6 +1251,8 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
> >>>      struct virtio_net_hdr_mrg_rxbuf mhdr;
> >>>      unsigned mhdr_cnt = 0;
> >>>      size_t offset, i, guest_offset;
> >>>+    VirtQueueElement head;
> >>>+    int head_len = 0;
> >>>      if (!virtio_net_can_receive(nc)) {
> >>>          return -1;
> >>>@@ -1328,7 +1330,13 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
> >>>          }
> >>>          /* signal other side */
> >>>-        virtqueue_fill(q->rx_vq, elem, total, i++);
> >>>+        if (i == 0) {
> >>>+            head_len = total;
> >>>+            head = *elem;
> >>>+        } else {
> >>>+            virtqueue_fill(q->rx_vq, elem, len, i);
> >>>+        }
> >>>+        i++;
> >>>          g_free(elem);
> >>>      }
> >>>@@ -1339,6 +1347,7 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
> >>>                       &mhdr.num_buffers, sizeof mhdr.num_buffers);
> >>>      }
> >>>+    virtqueue_fill(q->rx_vq, &head, head_len, 0);
> >>>      virtqueue_flush(q->rx_vq, i);
> >>>      virtio_notify(vdev, q->rx_vq);
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 08/11] virtio: event suppression support for packed ring
  2019-02-19 13:06       ` Jason Wang
@ 2019-02-20  2:17         ` Wei Xu
  0 siblings, 0 replies; 41+ messages in thread
From: Wei Xu @ 2019-02-20  2:17 UTC (permalink / raw)
  To: Jason Wang; +Cc: qemu-devel, maxime.coquelin, tiwei.bie, jfreiman, mst

On Tue, Feb 19, 2019 at 09:06:42PM +0800, Jason Wang wrote:
> 
> On 2019/2/19 下午6:40, Wei Xu wrote:
> >On Tue, Feb 19, 2019 at 03:19:58PM +0800, Jason Wang wrote:
> >>On 2019/2/14 下午12:26, wexu@redhat.com wrote:
> >>>From: Wei Xu <wexu@redhat.com>
> >>>
> >>>Difference between 'avail_wrap_counter' and 'last_avail_wrap_counter':
> >>>For Tx(guest transmitting), they are the same after each pop of a desc.
> >>>
> >>>For Rx(guest receiving), they are also the same when there are enough
> >>>descriptors to carry the payload for a packet(e.g. usually 16 descs are
> >>>needed for a 64k packet in typical iperf tcp connection with tso enabled),
> >>>however, when the ring is running out of descriptors while there are
> >>>still a few free ones, e.g. 6 descriptors are available which is not
> >>>enough to carry an entire packet which needs 16 descriptors, in this
> >>>case the 'avail_wrap_counter' should be set as the first one pending
> >>>being handled by guest driver in order to get a notification, and the
> >>>'last_avail_wrap_counter' should stay unchanged to the head of available
> >>>descriptors, like below:
> >>>
> >>>Mark meaning:
> >>>     | | -- available
> >>>     |*| -- used
> >>>
> >>>A Snapshot of the queue:
> >>>                                       last_avail_idx = 253
> >>>                                       last_avail_wrap_counter = 1
> >>>                                              |
> >>>     +---------------------------------------------+
> >>>  0  | | | |*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*| | | | 255
> >>>     +---------------------------------------------+
> >>>            |
> >>>   shadow_avail_idx = 3
> >>>   avail_wrap_counter = 0
> >>
> >>Well this might not be the good place to describe the difference between
> >>shadow_avail_idx and last_avail_idx. And the comments above their definition
> >>looks good enough?
> >Sorry, I do not get it, can you elaborate more?
> 
> 
> I meant the comment is good enough to explain what it did. For packed ring,
> the only difference is the wrap counter. You can add e.g "The wrap counter
> of next head to pop" to above last_avail_wrap_counter. And so does
> shadow_avail_wrap_counter.

OK, I will remove the example part.

> 
> 
> >
> >This is one of the buggy parts of packed ring, it is good to make it clear here.
> >
> >>     /* Next head to pop */
> >>     uint16_t last_avail_idx;
> >>
> >>     /* Last avail_idx read from VQ. */
> >>     uint16_t shadow_avail_idx;
> >>
> >What is the meaning of these comment?
> 
> 
> It's pretty easy to be understood, isn't it?

:)

> 
> 
> >  Do you mean I should replace put them
> >to the comments also? thanks.
> >
> >>Instead, how or why need event suppress is not mentioned ...
> >Yes, but presumably this knowledge has been acquired from reading through the
> >spec, so I skipped this part.
> 
> 
> You need at least add reference to the spec. Commit log is pretty important
> for starters to understand what has been done in the patch before reading
> the code. I'm pretty sure they will get confused for reading what you wrote
> here.
> 

OK.

> 
> Thanks
> 
> 
> >
> >Wei
> >
> >>
> >>
> >>>Signed-off-by: Wei Xu <wexu@redhat.com>
> >>>---
> >>>  hw/virtio/virtio.c | 137 +++++++++++++++++++++++++++++++++++++++++++++++++----
> >>>  1 file changed, 128 insertions(+), 9 deletions(-)
> >>>
> >>>diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >>>index 7e276b4..8cfc7b6 100644
> >>>--- a/hw/virtio/virtio.c
> >>>+++ b/hw/virtio/virtio.c
> >>>@@ -234,6 +234,34 @@ static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
> >>>      virtio_tswap16s(vdev, &desc->next);
> >>>  }
> >>>+static void vring_packed_event_read(VirtIODevice *vdev,
> >>>+                            MemoryRegionCache *cache, VRingPackedDescEvent *e)
> >>>+{
> >>>+    address_space_read_cached(cache, 0, e, sizeof(*e));
> >>>+    virtio_tswap16s(vdev, &e->off_wrap);
> >>>+    virtio_tswap16s(vdev, &e->flags);
> >>>+}
> >>>+
> >>>+static void vring_packed_off_wrap_write(VirtIODevice *vdev,
> >>>+                            MemoryRegionCache *cache, uint16_t off_wrap)
> >>>+{
> >>>+    virtio_tswap16s(vdev, &off_wrap);
> >>>+    address_space_write_cached(cache, offsetof(VRingPackedDescEvent, off_wrap),
> >>>+                            &off_wrap, sizeof(off_wrap));
> >>>+    address_space_cache_invalidate(cache,
> >>>+                offsetof(VRingPackedDescEvent, off_wrap), sizeof(off_wrap));
> >>>+}
> >>>+
> >>>+static void vring_packed_flags_write(VirtIODevice *vdev,
> >>>+                            MemoryRegionCache *cache, uint16_t flags)
> >>>+{
> >>>+    virtio_tswap16s(vdev, &flags);
> >>>+    address_space_write_cached(cache, offsetof(VRingPackedDescEvent, flags),
> >>>+                            &flags, sizeof(flags));
> >>>+    address_space_cache_invalidate(cache,
> >>>+                        offsetof(VRingPackedDescEvent, flags), sizeof(flags));
> >>>+}
> >>>+
> >>>  static VRingMemoryRegionCaches *vring_get_region_caches(struct VirtQueue *vq)
> >>>  {
> >>>      VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
> >>>@@ -340,14 +368,8 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
> >>>      address_space_cache_invalidate(&caches->used, pa, sizeof(val));
> >>>  }
> >>>-void virtio_queue_set_notification(VirtQueue *vq, int enable)
> >>>+static void virtio_queue_set_notification_split(VirtQueue *vq, int enable)
> >>>  {
> >>>-    vq->notification = enable;
> >>>-
> >>>-    if (!vq->vring.desc) {
> >>>-        return;
> >>>-    }
> >>>-
> >>>      rcu_read_lock();
> >>>      if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> >>>          vring_set_avail_event(vq, vring_avail_idx(vq));
> >>>@@ -363,6 +385,57 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable)
> >>>      rcu_read_unlock();
> >>>  }
> >>>+static void virtio_queue_set_notification_packed(VirtQueue *vq, int enable)
> >>>+{
> >>>+    VRingPackedDescEvent e;
> >>>+    VRingMemoryRegionCaches *caches;
> >>>+
> >>>+    rcu_read_lock();
> >>>+    caches  = vring_get_region_caches(vq);
> >>>+    vring_packed_event_read(vq->vdev, &caches->used, &e);
> >>>+
> >>>+    if (!enable) {
> >>>+        if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> >>>+            /* no need to write device area since this is outdated. */
> >>
> >>We should advertise off and wrap in this case as well, otherwise we may get
> >>notifications earlier than expected.
> >Is it necessary? Supposing offset & wrap_counter are always set before update
> >notification flags, it is reliable to skip this for disabling, isn't it?
> 
> 
> You should either:
> 
> - advertise the EVENT_FLAG_DISABLE
> 
> or
> 
> - advertise the new off wrap otherwise you may get notified early.
> 
> Both you none of above did by your code.

Right.

> 
> 
> >
> >While the logic here is unclear, I did a concise one like below
> >which doesn't use the 'vq->notification' as in your comment for v2,
> >I think this should work for packed ring as well, anything I missed?
> >
> >     if (!enable) {
> >         e.flags = VRING_PACKED_EVENT_FLAG_DISABLE;
> >     } else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> >         uint16_t off_wrap;
> >
> >         off_wrap = vq->shadow_avail_idx | vq->shadow_avail_wrap_counter << 15;
> >         vring_packed_off_wrap_write(vq->vdev, &caches->used, off_wrap);
> >
> >         /* Make sure off_wrap is wrote before flags */
> >         smp_wmb();
> >         e.flags = VRING_PACKED_EVENT_FLAG_DESC;
> >     } else {
> >         e.flags = VRING_PACKED_EVENT_FLAG_ENABLE;
> >     }
> >
> >     vring_packed_flags_write(vq->vdev, &caches->used, e.flags);
> 
> 
> Looks good to me.

Thanks.

> 
> Thanks
> 
> 
> >
> >>
> >>>+            goto out;
> >>>+        }
> >>>+
> >>>+        e.flags = VRING_PACKED_EVENT_FLAG_DISABLE;
> >>>+        goto update;
> >>>+    }
> >>>+
> >>>+    e.flags = VRING_PACKED_EVENT_FLAG_ENABLE;
> >>
> >>Here and the above goto could be eliminated by:
> >>
> >>if (even idx) {
> >>
> >>...
> >>
> >>} else if (enable) {
> >>
> >>...
> >>
> >>} else {
> >>
> >>...
> >>
> >>}
> >>
> >thanks, I have removed it in above snippet.
> >
> >Wei
> >>Thanks
> >>
> >>
> >>>+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> >>>+        uint16_t off_wrap = vq->shadow_avail_idx | vq->avail_wrap_counter << 15;
> >>>+
> >>>+        vring_packed_off_wrap_write(vq->vdev, &caches->used, off_wrap);
> >>>+        /* Make sure off_wrap is wrote before flags */
> >>>+        smp_wmb();
> >>>+
> >>>+        e.flags = VRING_PACKED_EVENT_FLAG_DESC;
> >>>+    }
> >>>+
> >>>+update:
> >>>+    vring_packed_flags_write(vq->vdev, &caches->used, e.flags);
> >>>+out:
> >>>+    rcu_read_unlock();
> >>>+}
> >>>+
> >>>+void virtio_queue_set_notification(VirtQueue *vq, int enable)
> >>>+{
> >>>+    vq->notification = enable;
> >>>+
> >>>+    if (!vq->vring.desc) {
> >>>+        return;
> >>>+    }
> >>>+
> >>>+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> >>>+        virtio_queue_set_notification_packed(vq, enable);
> >>>+    } else {
> >>>+        virtio_queue_set_notification_split(vq, enable);
> >>>+    }
> >>>+}
> >>>+
> >>>  int virtio_queue_ready(VirtQueue *vq)
> >>>  {
> >>>      return vq->vring.avail != 0;
> >>>@@ -2117,8 +2190,7 @@ static void virtio_set_isr(VirtIODevice *vdev, int value)
> >>>      }
> >>>  }
> >>>-/* Called within rcu_read_lock().  */
> >>>-static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
> >>>+static bool virtio_split_should_notify(VirtIODevice *vdev, VirtQueue *vq)
> >>>  {
> >>>      uint16_t old, new;
> >>>      bool v;
> >>>@@ -2141,6 +2213,53 @@ static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
> >>>      return !v || vring_need_event(vring_get_used_event(vq), new, old);
> >>>  }
> >>>+static bool vring_packed_need_event(VirtQueue *vq, bool wrap,
> >>>+                            uint16_t off_wrap, uint16_t new, uint16_t old)
> >>>+{
> >>>+    int off = off_wrap & ~(1 << 15);
> >>>+
> >>>+    if (wrap != off_wrap >> 15) {
> >>>+        off -= vq->vring.num;
> >>>+    }
> >>>+
> >>>+    return vring_need_event(off, new, old);
> >>>+}
> >>>+
> >>>+static bool virtio_packed_should_notify(VirtIODevice *vdev, VirtQueue *vq)
> >>>+{
> >>>+    VRingPackedDescEvent e;
> >>>+    uint16_t old, new;
> >>>+    bool v;
> >>>+    VRingMemoryRegionCaches *caches;
> >>>+
> >>>+    caches  = vring_get_region_caches(vq);
> >>>+    vring_packed_event_read(vdev, &caches->avail, &e);
> >>>+
> >>>+    old = vq->signalled_used;
> >>>+    new = vq->signalled_used = vq->used_idx;
> >>>+    v = vq->signalled_used_valid;
> >>>+    vq->signalled_used_valid = true;
> >>>+
> >>>+    if (e.flags == VRING_PACKED_EVENT_FLAG_DISABLE) {
> >>>+        return false;
> >>>+    } else if (e.flags == VRING_PACKED_EVENT_FLAG_ENABLE) {
> >>>+        return true;
> >>>+    }
> >>>+
> >>>+    return !v || vring_packed_need_event(vq,
> >>>+        vq->used_wrap_counter, e.off_wrap, new, old);
> >>>+}
> >>>+
> >>>+/* Called within rcu_read_lock().  */
> >>>+static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
> >>>+{
> >>>+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> >>>+        return virtio_packed_should_notify(vdev, vq);
> >>>+    } else {
> >>>+        return virtio_split_should_notify(vdev, vq);
> >>>+    }
> >>>+}
> >>>+
> >>>  void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
> >>>  {
> >>>      bool should_notify;

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 09/11] virtio-net: update the head descriptor in a chain lastly
  2019-02-20  1:54         ` Wei Xu
@ 2019-02-20  2:34           ` Jason Wang
  2019-02-20  4:01             ` Wei Xu
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2019-02-20  2:34 UTC (permalink / raw)
  To: Wei Xu; +Cc: jfreiman, maxime.coquelin, qemu-devel, tiwei.bie, mst


On 2019/2/20 上午9:54, Wei Xu wrote:
> On Tue, Feb 19, 2019 at 09:09:33PM +0800, Jason Wang wrote:
>> On 2019/2/19 下午6:51, Wei Xu wrote:
>>> On Tue, Feb 19, 2019 at 03:23:01PM +0800, Jason Wang wrote:
>>>> On 2019/2/14 下午12:26, wexu@redhat.com wrote:
>>>>> From: Wei Xu <wexu@redhat.com>
>>>>>
>>>>> This is a helper for packed ring.
>>>>>
>>>>> To support packed ring, the head descriptor in a chain should be updated
>>>>> lastly since no 'avail_idx' like in packed ring to explicitly tell the
>>>>> driver side that all payload is ready after having done the chain, so
>>>>> the head is always visible immediately.
>>>>>
>>>>> This patch fills the header after done all the other ones.
>>>>>
>>>>> Signed-off-by: Wei Xu <wexu@redhat.com>
>>>> It's really odd to workaround API issue in the implementation of device.
>>>> Please introduce batched used updating helpers instead.
>>> Can you elaborate a bit more? I don't get it as well.
>>>
>>> The exact batch as vhost-net or dpdk pmd is not supported by userspace
>>> backend. The change here is to keep the header descriptor updated at
>>> last in case of a chaining descriptors and the helper might not help
>>> too much.
>>>
>>> Wei
>>
>> Of course we can add batching support why not?
> It is always good to improve performance with anything, while probably
> this could be done in another separate batch, also we need to bear
> in mind that usually qemu userspace backend is not the first option for
> performance oriented user.


The point is to hide layout-specific things from device emulation. If it
helps performance, that can be treated as a nice byproduct.


>
> AFAICT, virtqueue_fill() is a generic API for all relevant userspace virtio
> devices that do not support batching , without touching virtqueue_fill(),
> supporting batching changes the meaning of the parameter 'idx' which should
> be kept overall.
>
> To fix it, I got two proposals so far:
> 1). batching support(two APIs needed to keep compatibility)
> 2). save a head elem for a vq instead of caching an array of elems like vhost,
>      and introduce a new API(virtqueue_chain_fill()) functioning with an
>      additional parameter 'more' to the current virtqueue_fill() to indicate if
>      there are more descriptor(s) coming in a chain.
>
> Either way it changes the API somehow and it does not seem to be clean and clear
> as wanted.


It's as simple as accepting an array of elems in e.g 
virtqueue_fill_batched()?
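
Something like the following is all I am asking for (rough, untested sketch;
the packed-ring specific ordering would then live behind it in virtio.c
rather than in the device):

    /* Hypothetical batched helper: the device hands over the whole batch
     * and the core decides how to publish it; a packed ring implementation
     * could write the head entry last internally.
     */
    void virtqueue_fill_batched(VirtQueue *vq, VirtQueueElement *const *elems,
                                const unsigned int *lens, unsigned int count)
    {
        unsigned int i;

        for (i = 0; i < count; i++) {
            virtqueue_fill(vq, elems[i], lens[i], i);
        }
    }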


>
> Any better idea?
>
>> Your code assumes the device know the virtio layout specific assumption
>> whih breaks the layer. Device should not care about the actual layout.
>>
> Good point, but anyway, change to virtio-net receiving code path is
> unavoidable to support split and packed rings, and batching is like a new
> feature somehow.


It's OK to change the code as a result of introducing a generic helper, but
it's bad to change the code just to work around a bad API.

Thanks


>
> Wei
>   
>> Thanks
>>
>>
>>>> Thanks
>>>>
>>>>
>>>>> ---
>>>>>   hw/net/virtio-net.c | 11 ++++++++++-
>>>>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>> index 3f319ef..330abea 100644
>>>>> --- a/hw/net/virtio-net.c
>>>>> +++ b/hw/net/virtio-net.c
>>>>> @@ -1251,6 +1251,8 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
>>>>>       struct virtio_net_hdr_mrg_rxbuf mhdr;
>>>>>       unsigned mhdr_cnt = 0;
>>>>>       size_t offset, i, guest_offset;
>>>>> +    VirtQueueElement head;
>>>>> +    int head_len = 0;
>>>>>       if (!virtio_net_can_receive(nc)) {
>>>>>           return -1;
>>>>> @@ -1328,7 +1330,13 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
>>>>>           }
>>>>>           /* signal other side */
>>>>> -        virtqueue_fill(q->rx_vq, elem, total, i++);
>>>>> +        if (i == 0) {
>>>>> +            head_len = total;
>>>>> +            head = *elem;
>>>>> +        } else {
>>>>> +            virtqueue_fill(q->rx_vq, elem, len, i);
>>>>> +        }
>>>>> +        i++;
>>>>>           g_free(elem);
>>>>>       }
>>>>> @@ -1339,6 +1347,7 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
>>>>>                        &mhdr.num_buffers, sizeof mhdr.num_buffers);
>>>>>       }
>>>>> +    virtqueue_fill(q->rx_vq, &head, head_len, 0);
>>>>>       virtqueue_flush(q->rx_vq, i);
>>>>>       virtio_notify(vdev, q->rx_vq);

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 09/11] virtio-net: update the head descriptor in a chain lastly
  2019-02-20  2:34           ` Jason Wang
@ 2019-02-20  4:01             ` Wei Xu
  2019-02-20  7:53               ` Jason Wang
  0 siblings, 1 reply; 41+ messages in thread
From: Wei Xu @ 2019-02-20  4:01 UTC (permalink / raw)
  To: Jason Wang; +Cc: tiwei.bie, maxime.coquelin, qemu-devel, jfreiman, mst

On Wed, Feb 20, 2019 at 10:34:32AM +0800, Jason Wang wrote:
> 
> On 2019/2/20 上午9:54, Wei Xu wrote:
> >On Tue, Feb 19, 2019 at 09:09:33PM +0800, Jason Wang wrote:
> >>On 2019/2/19 下午6:51, Wei Xu wrote:
> >>>On Tue, Feb 19, 2019 at 03:23:01PM +0800, Jason Wang wrote:
> >>>>On 2019/2/14 下午12:26, wexu@redhat.com wrote:
> >>>>>From: Wei Xu <wexu@redhat.com>
> >>>>>
> >>>>>This is a helper for packed ring.
> >>>>>
> >>>>>To support packed ring, the head descriptor in a chain should be updated
> >>>>>lastly since no 'avail_idx' like in packed ring to explicitly tell the
> >>>>>driver side that all payload is ready after having done the chain, so
> >>>>>the head is always visible immediately.
> >>>>>
> >>>>>This patch fills the header after done all the other ones.
> >>>>>
> >>>>>Signed-off-by: Wei Xu <wexu@redhat.com>
> >>>>It's really odd to workaround API issue in the implementation of device.
> >>>>Please introduce batched used updating helpers instead.
> >>>Can you elaborate a bit more? I don't get it as well.
> >>>
> >>>The exact batch as vhost-net or dpdk pmd is not supported by userspace
> >>>backend. The change here is to keep the header descriptor updated at
> >>>last in case of a chaining descriptors and the helper might not help
> >>>too much.
> >>>
> >>>Wei
> >>
> >>Of course we can add batching support why not?
> >It is always good to improve performance with anything, while probably
> >this could be done in another separate batch, also we need to bear
> >in mind that usually qemu userspace backend is not the first option for
> >performance oriented user.
> 
> 
> The point is to hide layout specific things from device emulation. If it
> helps for performance, it could be treated as a good byproduct.
> 
> 
> >
> >AFAICT, virtqueue_fill() is a generic API for all relevant userspace virtio
> >devices that do not support batching , without touching virtqueue_fill(),
> >supporting batching changes the meaning of the parameter 'idx' which should
> >be kept overall.
> >
> >To fix it, I got two proposals so far:
> >1). batching support(two APIs needed to keep compatibility)
> >2). save a head elem for a vq instead of caching an array of elems like vhost,
> >     and introduce a new API(virtqueue_chain_fill()) functioning with an
> >     additional parameter 'more' to the current virtqueue_fill() to indicate if
> >     there are more descriptor(s) coming in a chain.
> >
> >Either way it changes the API somehow and it does not seem to be clean and clear
> >as wanted.
> 
> 
> It's as simple as accepting an array of elems in e.g
> virtqueue_fill_batched()?

It is trivial either way; my concern is that an array of elements would need
to be allocated dynamically based on the vq size, which no other device needs,
whereas caching a single head would be enough for option 2.

> 
> 
> >
> >Any better idea?
> >
> >>Your code assumes the device know the virtio layout specific assumption
> >>whih breaks the layer. Device should not care about the actual layout.
> >>
> >Good point, but anyway, change to virtio-net receiving code path is
> >unavoidable to support split and packed rings, and batching is like a new
> >feature somehow.
> 
> 
> It's ok to change the code as a result of introducing of generic helper but
> it's bad to change to code for working around a bad API.

Agree.

Wei

> 
> Thanks
> 
> 
> >
> >Wei
> >>Thanks
> >>
> >>
> >>>>Thanks
> >>>>
> >>>>
> >>>>>---
> >>>>>  hw/net/virtio-net.c | 11 ++++++++++-
> >>>>>  1 file changed, 10 insertions(+), 1 deletion(-)
> >>>>>
> >>>>>diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >>>>>index 3f319ef..330abea 100644
> >>>>>--- a/hw/net/virtio-net.c
> >>>>>+++ b/hw/net/virtio-net.c
> >>>>>@@ -1251,6 +1251,8 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
> >>>>>      struct virtio_net_hdr_mrg_rxbuf mhdr;
> >>>>>      unsigned mhdr_cnt = 0;
> >>>>>      size_t offset, i, guest_offset;
> >>>>>+    VirtQueueElement head;
> >>>>>+    int head_len = 0;
> >>>>>      if (!virtio_net_can_receive(nc)) {
> >>>>>          return -1;
> >>>>>@@ -1328,7 +1330,13 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
> >>>>>          }
> >>>>>          /* signal other side */
> >>>>>-        virtqueue_fill(q->rx_vq, elem, total, i++);
> >>>>>+        if (i == 0) {
> >>>>>+            head_len = total;
> >>>>>+            head = *elem;
> >>>>>+        } else {
> >>>>>+            virtqueue_fill(q->rx_vq, elem, len, i);
> >>>>>+        }
> >>>>>+        i++;
> >>>>>          g_free(elem);
> >>>>>      }
> >>>>>@@ -1339,6 +1347,7 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
> >>>>>                       &mhdr.num_buffers, sizeof mhdr.num_buffers);
> >>>>>      }
> >>>>>+    virtqueue_fill(q->rx_vq, &head, head_len, 0);
> >>>>>      virtqueue_flush(q->rx_vq, i);
> >>>>>      virtio_notify(vdev, q->rx_vq);
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v4 09/11] virtio-net: update the head descriptor in a chain lastly
  2019-02-20  4:01             ` Wei Xu
@ 2019-02-20  7:53               ` Jason Wang
  0 siblings, 0 replies; 41+ messages in thread
From: Jason Wang @ 2019-02-20  7:53 UTC (permalink / raw)
  To: Wei Xu; +Cc: jfreiman, maxime.coquelin, qemu-devel, tiwei.bie, mst


On 2019/2/20 下午12:01, Wei Xu wrote:
>>
>>> AFAICT, virtqueue_fill() is a generic API for all relevant userspace virtio
>>> devices that do not support batching , without touching virtqueue_fill(),
>>> supporting batching changes the meaning of the parameter 'idx' which should
>>> be kept overall.
>>>
>>> To fix it, I got two proposals so far:
>>> 1). batching support(two APIs needed to keep compatibility)
>>> 2). save a head elem for a vq instead of caching an array of elems like vhost,
>>>      and introduce a new API(virtqueue_chain_fill()) functioning with an
>>>      additional parameter 'more' to the current virtqueue_fill() to indicate if
>>>      there are more descriptor(s) coming in a chain.
>>>
>>> Either way it changes the API somehow and it does not seem to be clean and clear
>>> as wanted.
>> It's as simple as accepting an array of elems in e.g
>> virtqueue_fill_batched()?
> It is trivial for both, my concern is an array elements need to be allocated dynamically
> due to vq size which no any other devices are using, a head would be enough for 2.
>

Do you see any issue with this?

Thanks

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2019-02-20  7:54 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-14  4:26 [Qemu-devel] [PATCH v4 00/11] packed ring virtio-net backends support wexu
2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 01/11] virtio: rename structure for packed ring wexu
2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 02/11] virtio: device/driver area size calculation helper for split ring wexu
2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 03/11] virtio: initialize packed ring region wexu
2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 04/11] virtio: initialize wrap counter for packed ring wexu
2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 05/11] virtio: queue/descriptor check helpers " wexu
2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 06/11] virtio: get avail bytes check " wexu
2019-02-18  7:27   ` Jason Wang
2019-02-18 17:07     ` Wei Xu
2019-02-19  6:24       ` Jason Wang
2019-02-19  8:24         ` Wei Xu
2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 07/11] virtio: fill/flush/pop " wexu
2019-02-18  7:51   ` Jason Wang
2019-02-18 14:46     ` Wei Xu
2019-02-19  6:49       ` Jason Wang
2019-02-19  8:21         ` Wei Xu
2019-02-19  9:33           ` Jason Wang
2019-02-19 11:34             ` Wei Xu
2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 08/11] virtio: event suppression support " wexu
2019-02-19  7:19   ` Jason Wang
2019-02-19 10:40     ` Wei Xu
2019-02-19 13:06       ` Jason Wang
2019-02-20  2:17         ` Wei Xu
2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 09/11] virtio-net: update the head descriptor in a chain lastly wexu
2019-02-19  7:23   ` Jason Wang
2019-02-19 10:51     ` Wei Xu
2019-02-19 13:09       ` Jason Wang
2019-02-20  1:54         ` Wei Xu
2019-02-20  2:34           ` Jason Wang
2019-02-20  4:01             ` Wei Xu
2019-02-20  7:53               ` Jason Wang
2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 10/11] virtio: migration support for packed ring wexu
2019-02-19  7:30   ` Jason Wang
2019-02-19 11:00     ` Wei Xu
2019-02-19 13:12       ` Jason Wang
2019-02-14  4:26 ` [Qemu-devel] [PATCH v4 11/11] virtio: CLI and provide packed ring feature bit by default wexu
2019-02-19  7:32   ` Jason Wang
2019-02-19 11:23     ` Wei Xu
2019-02-19 13:33       ` Jason Wang
2019-02-20  0:46         ` Wei Xu
2019-02-19  7:35 ` [Qemu-devel] [PATCH v4 00/11] packed ring virtio-net backends support Jason Wang
