All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 00/10] virtio/vring: optimization patches
@ 2016-01-15 12:41 Paolo Bonzini
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 01/10] virtio: move VirtQueueElement at the beginning of the structs Paolo Bonzini
                   ` (9 more replies)
  0 siblings, 10 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-01-15 12:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vincenzo Maffione, mst

This includes two optimization of virtio:

- "slimming down" VirtQueueElements by not including room for
  1024 buffers.  This makes malloc much faster.

- optimizations to limit the number of address_space_translate
  calls in virtio.c from Vincenzo and myself.

Thanks,

Paolo

Paolo Bonzini (7):
  virtio: move VirtQueueElement at the beginning of the structs
  virtio: move allocation to virtqueue_pop/vring_pop
  virtio: introduce qemu_get/put_virtqueue_element
  virtio: introduce virtqueue_alloc_element
  virtio: slim down allocation of VirtQueueElements
  vring: slim down allocation of VirtQueueElements
  virtio: combine the read of a descriptor

Vincenzo Maffione (3):
  virtio: cache used_idx in a VirtQueue field
  virtio: read avail_idx from VQ only when necessary
  virtio: combine write of an entry into used ring

 hw/9pfs/9p.c                        |   2 +-
 hw/9pfs/virtio-9p-device.c          |  16 +-
 hw/9pfs/virtio-9p.h                 |   2 +-
 hw/block/dataplane/virtio-blk.c     |  11 +-
 hw/block/virtio-blk.c               |  23 +--
 hw/char/virtio-serial-bus.c         |  78 +++++----
 hw/display/virtio-gpu.c             |  25 ++-
 hw/input/virtio-input.c             |  24 ++-
 hw/net/virtio-net.c                 |  69 +++++---
 hw/scsi/virtio-scsi-dataplane.c     |  15 +-
 hw/scsi/virtio-scsi.c               |  26 ++-
 hw/virtio/dataplane/vring.c         |  62 ++++---
 hw/virtio/virtio-balloon.c          |  22 ++-
 hw/virtio/virtio-rng.c              |  10 +-
 hw/virtio/virtio.c                  | 339 +++++++++++++++++++++++++-----------
 include/hw/virtio/dataplane/vring.h |   2 +-
 include/hw/virtio/virtio-balloon.h  |   2 +-
 include/hw/virtio/virtio-blk.h      |   5 +-
 include/hw/virtio/virtio-net.h      |   2 +-
 include/hw/virtio/virtio-scsi.h     |  15 +-
 include/hw/virtio/virtio-serial.h   |   2 +-
 include/hw/virtio/virtio.h          |  13 +-
 22 files changed, 484 insertions(+), 282 deletions(-)

-- 
2.5.0

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH 01/10] virtio: move VirtQueueElement at the beginning of the structs
  2016-01-15 12:41 [Qemu-devel] [PATCH 00/10] virtio/vring: optimization patches Paolo Bonzini
@ 2016-01-15 12:41 ` Paolo Bonzini
  2016-01-19 12:09   ` Cornelia Huck
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 02/10] virtio: move allocation to virtqueue_pop/vring_pop Paolo Bonzini
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2016-01-15 12:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

The next patch will make virtqueue_pop/vring_pop allocate memory for a
"subclass" of VirtQueueElement.  For this to work, VirtQueueElement
must be the first field in the containing struct.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/scsi/virtio-scsi.c           |  3 +--
 include/hw/virtio/virtio-blk.h  |  2 +-
 include/hw/virtio/virtio-scsi.h | 13 ++++++-------
 3 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 3a4f520..1567da8 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -44,8 +44,7 @@ VirtIOSCSIReq *virtio_scsi_init_req(VirtIOSCSI *s, VirtQueue *vq)
 {
     VirtIOSCSIReq *req;
     VirtIOSCSICommon *vs = (VirtIOSCSICommon *)s;
-    const size_t zero_skip = offsetof(VirtIOSCSIReq, elem)
-                             + sizeof(VirtQueueElement);
+    const size_t zero_skip = offsetof(VirtIOSCSIReq, vring);
 
     req = g_malloc(sizeof(*req) + vs->cdb_size);
     req->vq = vq;
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index ae11a63..403ab86 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -60,9 +60,9 @@ typedef struct VirtIOBlock {
 } VirtIOBlock;
 
 typedef struct VirtIOBlockReq {
+    VirtQueueElement elem;
     int64_t sector_num;
     VirtIOBlock *dev;
-    VirtQueueElement elem;
     struct virtio_blk_inhdr *in;
     struct virtio_blk_outhdr out;
     QEMUIOVector qiov;
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 088fe9f..63f5b51 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -102,18 +102,17 @@ typedef struct VirtIOSCSI {
 } VirtIOSCSI;
 
 typedef struct VirtIOSCSIReq {
+    /* Note:
+     * - fields up to resp_iov are initialized by virtio_scsi_init_req;
+     * - fields starting at vring are zeroed by virtio_scsi_init_req.
+     * */
+    VirtQueueElement elem;
+
     VirtIOSCSI *dev;
     VirtQueue *vq;
     QEMUSGList qsgl;
     QEMUIOVector resp_iov;
 
-    /* Note:
-     * - fields before elem are initialized by virtio_scsi_init_req;
-     * - elem is uninitialized at the time of allocation.
-     * - fields after elem are zeroed by virtio_scsi_init_req.
-     * */
-
-    VirtQueueElement elem;
     /* Set by dataplane code. */
     VirtIOSCSIVring *vring;
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH 02/10] virtio: move allocation to virtqueue_pop/vring_pop
  2016-01-15 12:41 [Qemu-devel] [PATCH 00/10] virtio/vring: optimization patches Paolo Bonzini
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 01/10] virtio: move VirtQueueElement at the beginning of the structs Paolo Bonzini
@ 2016-01-15 12:41 ` Paolo Bonzini
  2016-01-19 12:22   ` Cornelia Huck
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 03/10] virtio: introduce qemu_get/put_virtqueue_element Paolo Bonzini
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2016-01-15 12:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

The return code of virtqueue_pop/vring_pop is unused except to check for
errors or 0.  We can thus easily move allocation inside the functions
and just return a pointer to the VirtQueueElement.

The advantage is that we will be able to allocate only the space that
is needed for the actual size of the s/g list instead of the full
VIRTQUEUE_MAX_SIZE items.  Currently VirtQueueElement takes about 48K
of memory, and this kind of allocation puts a lot of stress on malloc.
By cutting the size by two or three orders of magnitude, malloc can
use much more efficient algorithms.

The patch is pretty large, but changes to each device are testable
more or less independently.  Splitting it would mostly add churn.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/9pfs/9p.c                        |  2 +-
 hw/9pfs/virtio-9p-device.c          | 16 ++++----
 hw/9pfs/virtio-9p.h                 |  2 +-
 hw/block/dataplane/virtio-blk.c     | 11 +++--
 hw/block/virtio-blk.c               | 15 +++----
 hw/char/virtio-serial-bus.c         | 80 +++++++++++++++++++++++--------------
 hw/display/virtio-gpu.c             | 25 +++++++-----
 hw/input/virtio-input.c             | 24 +++++++----
 hw/net/virtio-net.c                 | 69 ++++++++++++++++++++------------
 hw/scsi/virtio-scsi-dataplane.c     | 15 +++----
 hw/scsi/virtio-scsi.c               | 18 ++++-----
 hw/virtio/dataplane/vring.c         | 17 ++++----
 hw/virtio/virtio-balloon.c          | 22 ++++++----
 hw/virtio/virtio-rng.c              | 10 +++--
 hw/virtio/virtio.c                  | 11 +++--
 include/hw/virtio/dataplane/vring.h |  2 +-
 include/hw/virtio/virtio-balloon.h  |  2 +-
 include/hw/virtio/virtio-blk.h      |  3 +-
 include/hw/virtio/virtio-net.h      |  2 +-
 include/hw/virtio/virtio-scsi.h     |  2 +-
 include/hw/virtio/virtio-serial.h   |  2 +-
 include/hw/virtio/virtio.h          |  2 +-
 22 files changed, 209 insertions(+), 143 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 3ff3106..ad1ae96 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -1586,7 +1586,7 @@ static int v9fs_xattr_read(V9fsState *s, V9fsPDU *pdu, V9fsFidState *fidp,
     int read_count;
     int64_t xattr_len;
     V9fsVirtioState *v = container_of(s, V9fsVirtioState, state);
-    VirtQueueElement *elem = &v->elems[pdu->idx];
+    VirtQueueElement *elem = v->elems[pdu->idx];
 
     xattr_len = fidp->fs.xattr.len;
     read_count = xattr_len - off;
diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index 5643fd5..e85ec96 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -25,10 +25,12 @@ void virtio_9p_push_and_notify(V9fsPDU *pdu)
 {
     V9fsState *s = pdu->s;
     V9fsVirtioState *v = container_of(s, V9fsVirtioState, state);
-    VirtQueueElement *elem = &v->elems[pdu->idx];
+    VirtQueueElement *elem = v->elems[pdu->idx];
 
     /* push onto queue and notify */
     virtqueue_push(v->vq, elem, pdu->size);
+    g_free(elem);
+    v->elems[pdu->idx] = NULL;
 
     /* FIXME: we should batch these completions */
     virtio_notify(VIRTIO_DEVICE(v), v->vq);
@@ -47,10 +48,10 @@ static void handle_9p_output(VirtIODevice *vdev, VirtQueue *vq)
             uint8_t id;
             uint16_t tag_le;
         } QEMU_PACKED out;
-        VirtQueueElement *elem = &v->elems[pdu->idx];
+        VirtQueueElement *elem;
 
-        len = virtqueue_pop(vq, elem);
-        if (!len) {
+        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+        if (!elem) {
             pdu_free(pdu);
             break;
         }
@@ -58,6 +59,7 @@ static void handle_9p_output(VirtIODevice *vdev, VirtQueue *vq)
         BUG_ON(elem->out_num == 0 || elem->in_num == 0);
         QEMU_BUILD_BUG_ON(sizeof out != 7);
 
+        v->elems[pdu->idx] = elem;
         len = iov_to_buf(elem->out_sg, elem->out_num, 0,
                          &out, sizeof out);
         BUG_ON(len != sizeof out);
@@ -140,7 +142,7 @@ ssize_t virtio_pdu_vmarshal(V9fsPDU *pdu, size_t offset,
 {
     V9fsState *s = pdu->s;
     V9fsVirtioState *v = container_of(s, V9fsVirtioState, state);
-    VirtQueueElement *elem = &v->elems[pdu->idx];
+    VirtQueueElement *elem = v->elems[pdu->idx];
 
     return v9fs_iov_vmarshal(elem->in_sg, elem->in_num, offset, 1, fmt, ap);
 }
@@ -150,7 +152,7 @@ ssize_t virtio_pdu_vunmarshal(V9fsPDU *pdu, size_t offset,
 {
     V9fsState *s = pdu->s;
     V9fsVirtioState *v = container_of(s, V9fsVirtioState, state);
-    VirtQueueElement *elem = &v->elems[pdu->idx];
+    VirtQueueElement *elem = v->elems[pdu->idx];
 
     return v9fs_iov_vunmarshal(elem->out_sg, elem->out_num, offset, 1, fmt, ap);
 }
@@ -160,7 +162,7 @@ void virtio_init_iov_from_pdu(V9fsPDU *pdu, struct iovec **piov,
 {
     V9fsState *s = pdu->s;
     V9fsVirtioState *v = container_of(s, V9fsVirtioState, state);
-    VirtQueueElement *elem = &v->elems[pdu->idx];
+    VirtQueueElement *elem = v->elems[pdu->idx];
 
     if (is_write) {
         *piov = elem->out_sg;
diff --git a/hw/9pfs/virtio-9p.h b/hw/9pfs/virtio-9p.h
index 1cdf0a2..7f6d885 100644
--- a/hw/9pfs/virtio-9p.h
+++ b/hw/9pfs/virtio-9p.h
@@ -11,7 +11,7 @@ typedef struct V9fsVirtioState
     VirtQueue *vq;
     size_t config_size;
     V9fsPDU pdus[MAX_REQ];
-    VirtQueueElement elems[MAX_REQ];
+    VirtQueueElement *elems[MAX_REQ];
     V9fsState state;
 } V9fsVirtioState;
 
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index b8ce6cd..5ea28ae 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -97,20 +97,19 @@ static void handle_notify(EventNotifier *e)
     blk_io_plug(s->conf->conf.blk);
     for (;;) {
         MultiReqBuffer mrb = {};
-        int ret;
 
         /* Disable guest->host notifies to avoid unnecessary vmexits */
         vring_disable_notification(s->vdev, &s->vring);
 
         for (;;) {
-            VirtIOBlockReq *req = virtio_blk_alloc_request(vblk);
+            VirtIOBlockReq *req = vring_pop(s->vdev, &s->vring,
+                                            sizeof(VirtIOBlockReq));
 
-            ret = vring_pop(s->vdev, &s->vring, &req->elem);
-            if (ret < 0) {
-                virtio_blk_free_request(req);
+            if (req == NULL) {
                 break; /* no more requests */
             }
 
+            virtio_blk_init_request(vblk, req);
             trace_virtio_blk_data_plane_process_request(s, req->elem.out_num,
                                                         req->elem.in_num,
                                                         req->elem.index);
@@ -122,7 +121,7 @@ static void handle_notify(EventNotifier *e)
             virtio_blk_submit_multireq(s->conf->conf.blk, &mrb);
         }
 
-        if (likely(ret == -EAGAIN)) { /* vring emptied */
+        if (likely(!vring_more_avail(s->vdev, &s->vring))) { /* vring emptied */
             /* Re-enable guest->host notifies and stop processing the vring.
              * But if the guest has snuck in more descriptors, keep processing.
              */
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 51f867b..a874cb7 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -28,15 +28,13 @@
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
 
-VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
+void virtio_blk_init_request(VirtIOBlock *s, VirtIOBlockReq *req)
 {
-    VirtIOBlockReq *req = g_new(VirtIOBlockReq, 1);
     req->dev = s;
     req->qiov.size = 0;
     req->in_len = 0;
     req->next = NULL;
     req->mr_next = NULL;
-    return req;
 }
 
 void virtio_blk_free_request(VirtIOBlockReq *req)
@@ -192,13 +190,11 @@ out:
 
 static VirtIOBlockReq *virtio_blk_get_request(VirtIOBlock *s)
 {
-    VirtIOBlockReq *req = virtio_blk_alloc_request(s);
+    VirtIOBlockReq *req = virtqueue_pop(s->vq, sizeof(VirtIOBlockReq));
 
-    if (!virtqueue_pop(s->vq, &req->elem)) {
-        virtio_blk_free_request(req);
-        return NULL;
+    if (req) {
+        virtio_blk_init_request(s, req);
     }
-
     return req;
 }
 
@@ -835,7 +831,8 @@ static int virtio_blk_load_device(VirtIODevice *vdev, QEMUFile *f,
     VirtIOBlock *s = VIRTIO_BLK(vdev);
 
     while (qemu_get_sbyte(f)) {
-        VirtIOBlockReq *req = virtio_blk_alloc_request(s);
+        VirtIOBlockReq *req = g_new(VirtIOBlockReq, 1);
+        virtio_blk_init_request(s, req);
         qemu_get_buffer(f, (unsigned char *)&req->elem,
                         sizeof(VirtQueueElement));
         req->next = s->rq;
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 2d2a659..c941d21 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -82,7 +82,7 @@ static bool use_multiport(VirtIOSerial *vser)
 static size_t write_to_port(VirtIOSerialPort *port,
                             const uint8_t *buf, size_t size)
 {
-    VirtQueueElement elem;
+    VirtQueueElement *elem;
     VirtQueue *vq;
     size_t offset;
 
@@ -95,15 +95,17 @@ static size_t write_to_port(VirtIOSerialPort *port,
     while (offset < size) {
         size_t len;
 
-        if (!virtqueue_pop(vq, &elem)) {
+        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+        if (!elem) {
             break;
         }
 
-        len = iov_from_buf(elem.in_sg, elem.in_num, 0,
+        len = iov_from_buf(elem->in_sg, elem->in_num, 0,
                            buf + offset, size - offset);
         offset += len;
 
-        virtqueue_push(vq, &elem, len);
+        virtqueue_push(vq, elem, len);
+        g_free(elem);
     }
 
     virtio_notify(VIRTIO_DEVICE(port->vser), vq);
@@ -112,13 +114,18 @@ static size_t write_to_port(VirtIOSerialPort *port,
 
 static void discard_vq_data(VirtQueue *vq, VirtIODevice *vdev)
 {
-    VirtQueueElement elem;
+    VirtQueueElement *elem;
 
     if (!virtio_queue_ready(vq)) {
         return;
     }
-    while (virtqueue_pop(vq, &elem)) {
-        virtqueue_push(vq, &elem, 0);
+    for (;;) {
+        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+        if (!elem) {
+            break;
+        }
+        virtqueue_push(vq, elem, 0);
+        g_free(elem);
     }
     virtio_notify(vdev, vq);
 }
@@ -137,21 +144,22 @@ static void do_flush_queued_data(VirtIOSerialPort *port, VirtQueue *vq,
         unsigned int i;
 
         /* Pop an elem only if we haven't left off a previous one mid-way */
-        if (!port->elem.out_num) {
-            if (!virtqueue_pop(vq, &port->elem)) {
+        if (!port->elem) {
+            port->elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+            if (!port->elem) {
                 break;
             }
             port->iov_idx = 0;
             port->iov_offset = 0;
         }
 
-        for (i = port->iov_idx; i < port->elem.out_num; i++) {
+        for (i = port->iov_idx; i < port->elem->out_num; i++) {
             size_t buf_size;
             ssize_t ret;
 
-            buf_size = port->elem.out_sg[i].iov_len - port->iov_offset;
+            buf_size = port->elem->out_sg[i].iov_len - port->iov_offset;
             ret = vsc->have_data(port,
-                                  port->elem.out_sg[i].iov_base
+                                  port->elem->out_sg[i].iov_base
                                   + port->iov_offset,
                                   buf_size);
             if (port->throttled) {
@@ -166,8 +174,9 @@ static void do_flush_queued_data(VirtIOSerialPort *port, VirtQueue *vq,
         if (port->throttled) {
             break;
         }
-        virtqueue_push(vq, &port->elem, 0);
-        port->elem.out_num = 0;
+        virtqueue_push(vq, port->elem, 0);
+        g_free(port->elem);
+        port->elem = NULL;
     }
     virtio_notify(vdev, vq);
 }
@@ -184,22 +193,26 @@ static void flush_queued_data(VirtIOSerialPort *port)
 
 static size_t send_control_msg(VirtIOSerial *vser, void *buf, size_t len)
 {
-    VirtQueueElement elem;
+    VirtQueueElement *elem;
     VirtQueue *vq;
 
     vq = vser->c_ivq;
     if (!virtio_queue_ready(vq)) {
         return 0;
     }
-    if (!virtqueue_pop(vq, &elem)) {
+
+    elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+    if (!elem) {
         return 0;
     }
 
     /* TODO: detect a buffer that's too short, set NEEDS_RESET */
-    iov_from_buf(elem.in_sg, elem.in_num, 0, buf, len);
+    iov_from_buf(elem->in_sg, elem->in_num, 0, buf, len);
 
-    virtqueue_push(vq, &elem, len);
+    virtqueue_push(vq, elem, len);
     virtio_notify(VIRTIO_DEVICE(vser), vq);
+    g_free(elem);
+
     return len;
 }
 
@@ -413,7 +426,7 @@ static void control_in(VirtIODevice *vdev, VirtQueue *vq)
 
 static void control_out(VirtIODevice *vdev, VirtQueue *vq)
 {
-    VirtQueueElement elem;
+    VirtQueueElement *elem;
     VirtIOSerial *vser;
     uint8_t *buf;
     size_t len;
@@ -422,10 +435,15 @@ static void control_out(VirtIODevice *vdev, VirtQueue *vq)
 
     len = 0;
     buf = NULL;
-    while (virtqueue_pop(vq, &elem)) {
+    for (;;) {
         size_t cur_len;
 
-        cur_len = iov_size(elem.out_sg, elem.out_num);
+        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+        if (!elem) {
+            break;
+        }
+
+        cur_len = iov_size(elem->out_sg, elem->out_num);
         /*
          * Allocate a new buf only if we didn't have one previously or
          * if the size of the buf differs
@@ -436,10 +454,11 @@ static void control_out(VirtIODevice *vdev, VirtQueue *vq)
             buf = g_malloc(cur_len);
             len = cur_len;
         }
-        iov_to_buf(elem.out_sg, elem.out_num, 0, buf, cur_len);
+        iov_to_buf(elem->out_sg, elem->out_num, 0, buf, cur_len);
 
         handle_control_message(vser, buf, cur_len);
-        virtqueue_push(vq, &elem, 0);
+        virtqueue_push(vq, elem, 0);
+        g_free(elem);
     }
     g_free(buf);
     virtio_notify(vdev, vq);
@@ -619,7 +638,7 @@ static void virtio_serial_save_device(VirtIODevice *vdev, QEMUFile *f)
         qemu_put_byte(f, port->host_connected);
 
 	elem_popped = 0;
-        if (port->elem.out_num) {
+        if (port->elem) {
             elem_popped = 1;
         }
         qemu_put_be32s(f, &elem_popped);
@@ -627,8 +646,8 @@ static void virtio_serial_save_device(VirtIODevice *vdev, QEMUFile *f)
             qemu_put_be32s(f, &port->iov_idx);
             qemu_put_be64s(f, &port->iov_offset);
 
-            qemu_put_buffer(f, (unsigned char *)&port->elem,
-                            sizeof(port->elem));
+            qemu_put_buffer(f, (unsigned char *)port->elem,
+                            sizeof(VirtQueueElement));
         }
     }
 }
@@ -703,9 +722,10 @@ static int fetch_active_ports_list(QEMUFile *f, int version_id,
                 qemu_get_be32s(f, &port->iov_idx);
                 qemu_get_be64s(f, &port->iov_offset);
 
-                qemu_get_buffer(f, (unsigned char *)&port->elem,
-                                sizeof(port->elem));
-                virtqueue_map(&port->elem);
+                port->elem = g_new(VirtQueueElement, 1);
+                qemu_get_buffer(f, (unsigned char *)port->elem,
+                                sizeof(VirtQueueElement));
+                virtqueue_map(port->elem);
 
                 /*
                  *  Port was throttled on source machine.  Let's
@@ -927,7 +947,7 @@ static void virtser_port_device_realize(DeviceState *dev, Error **errp)
         return;
     }
 
-    port->elem.out_num = 0;
+    port->elem = NULL;
 }
 
 static void virtser_port_device_plug(HotplugHandler *hotplug_dev,
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 7e79a9c..ea10603 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -770,8 +770,8 @@ static void virtio_gpu_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
     }
 #endif
 
-    cmd = g_new(struct virtio_gpu_ctrl_command, 1);
-    while (virtqueue_pop(vq, &cmd->elem)) {
+    cmd = virtqueue_pop(vq, sizeof(struct virtio_gpu_ctrl_command));
+    while (cmd) {
         cmd->vq = vq;
         cmd->error = 0;
         cmd->finished = false;
@@ -781,7 +781,9 @@ static void virtio_gpu_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
 
         VIRGL(g, virtio_gpu_virgl_process_cmd, virtio_gpu_simple_process_cmd,
               g, cmd);
-        if (!cmd->finished) {
+        if (cmd->finished) {
+            g_free(cmd);
+        } else {
             QTAILQ_INSERT_TAIL(&g->fenceq, cmd, next);
             g->inflight++;
             if (virtio_gpu_stats_enabled(g->conf)) {
@@ -790,10 +792,9 @@ static void virtio_gpu_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
                 }
                 fprintf(stderr, "inflight: %3d (+)\r", g->inflight);
             }
-            cmd = g_new(struct virtio_gpu_ctrl_command, 1);
         }
+        cmd = virtqueue_pop(vq, sizeof(struct virtio_gpu_ctrl_command));
     }
-    g_free(cmd);
 
 #ifdef CONFIG_VIRGL
     if (g->use_virgl_renderer) {
@@ -811,15 +812,20 @@ static void virtio_gpu_ctrl_bh(void *opaque)
 static void virtio_gpu_handle_cursor(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIOGPU *g = VIRTIO_GPU(vdev);
-    VirtQueueElement elem;
+    VirtQueueElement *elem;
     size_t s;
     struct virtio_gpu_update_cursor cursor_info;
 
     if (!virtio_queue_ready(vq)) {
         return;
     }
-    while (virtqueue_pop(vq, &elem)) {
-        s = iov_to_buf(elem.out_sg, elem.out_num, 0,
+    for (;;) {
+        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+        if (!elem) {
+            break;
+        }
+
+        s = iov_to_buf(elem->out_sg, elem->out_num, 0,
                        &cursor_info, sizeof(cursor_info));
         if (s != sizeof(cursor_info)) {
             qemu_log_mask(LOG_GUEST_ERROR,
@@ -828,8 +834,9 @@ static void virtio_gpu_handle_cursor(VirtIODevice *vdev, VirtQueue *vq)
         } else {
             update_cursor(g, &cursor_info);
         }
-        virtqueue_push(vq, &elem, 0);
+        virtqueue_push(vq, elem, 0);
         virtio_notify(vdev, vq);
+        g_free(elem);
     }
 }
 
diff --git a/hw/input/virtio-input.c b/hw/input/virtio-input.c
index 1f5a40d..bbae884 100644
--- a/hw/input/virtio-input.c
+++ b/hw/input/virtio-input.c
@@ -16,7 +16,7 @@
 
 void virtio_input_send(VirtIOInput *vinput, virtio_input_event *event)
 {
-    VirtQueueElement elem;
+    VirtQueueElement *elem;
     unsigned have, need;
     int i, len;
 
@@ -49,14 +49,16 @@ void virtio_input_send(VirtIOInput *vinput, virtio_input_event *event)
 
     /* ... and finally pass them to the guest */
     for (i = 0; i < vinput->qindex; i++) {
-        if (!virtqueue_pop(vinput->evt, &elem)) {
+        elem = virtqueue_pop(vinput->evt, sizeof(VirtQueueElement));
+        if (!elem) {
             /* should not happen, we've checked for space beforehand */
             fprintf(stderr, "%s: Huh?  No vq elem available ...\n", __func__);
             return;
         }
-        len = iov_from_buf(elem.in_sg, elem.in_num,
+        len = iov_from_buf(elem->in_sg, elem->in_num,
                            0, vinput->queue+i, sizeof(virtio_input_event));
-        virtqueue_push(vinput->evt, &elem, len);
+        virtqueue_push(vinput->evt, elem, len);
+        g_free(elem);
     }
     virtio_notify(VIRTIO_DEVICE(vinput), vinput->evt);
     vinput->qindex = 0;
@@ -72,17 +74,23 @@ static void virtio_input_handle_sts(VirtIODevice *vdev, VirtQueue *vq)
     VirtIOInputClass *vic = VIRTIO_INPUT_GET_CLASS(vdev);
     VirtIOInput *vinput = VIRTIO_INPUT(vdev);
     virtio_input_event event;
-    VirtQueueElement elem;
+    VirtQueueElement *elem;
     int len;
 
-    while (virtqueue_pop(vinput->sts, &elem)) {
+    for (;;) {
+        elem = virtqueue_pop(vinput->sts, sizeof(VirtQueueElement));
+        if (!elem) {
+            break;
+        }
+
         memset(&event, 0, sizeof(event));
-        len = iov_to_buf(elem.out_sg, elem.out_num,
+        len = iov_to_buf(elem->out_sg, elem->out_num,
                          0, &event, sizeof(event));
         if (vic->handle_status) {
             vic->handle_status(vinput, &event);
         }
-        virtqueue_push(vinput->sts, &elem, len);
+        virtqueue_push(vinput->sts, elem, len);
+        g_free(elem);
     }
     virtio_notify(vdev, vinput->sts);
 }
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index a877614..c03b568 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -818,20 +818,24 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
     VirtIONet *n = VIRTIO_NET(vdev);
     struct virtio_net_ctrl_hdr ctrl;
     virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
-    VirtQueueElement elem;
+    VirtQueueElement *elem;
     size_t s;
     struct iovec *iov, *iov2;
     unsigned int iov_cnt;
 
-    while (virtqueue_pop(vq, &elem)) {
-        if (iov_size(elem.in_sg, elem.in_num) < sizeof(status) ||
-            iov_size(elem.out_sg, elem.out_num) < sizeof(ctrl)) {
+    for (;;) {
+        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+        if (!elem) {
+            break;
+        }
+        if (iov_size(elem->in_sg, elem->in_num) < sizeof(status) ||
+            iov_size(elem->out_sg, elem->out_num) < sizeof(ctrl)) {
             error_report("virtio-net ctrl missing headers");
             exit(1);
         }
 
-        iov_cnt = elem.out_num;
-        iov2 = iov = g_memdup(elem.out_sg, sizeof(struct iovec) * elem.out_num);
+        iov_cnt = elem->out_num;
+        iov2 = iov = g_memdup(elem->out_sg, sizeof(struct iovec) * elem->out_num);
         s = iov_to_buf(iov, iov_cnt, 0, &ctrl, sizeof(ctrl));
         iov_discard_front(&iov, &iov_cnt, sizeof(ctrl));
         if (s != sizeof(ctrl)) {
@@ -850,12 +854,13 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
             status = virtio_net_handle_offloads(n, ctrl.cmd, iov, iov_cnt);
         }
 
-        s = iov_from_buf(elem.in_sg, elem.in_num, 0, &status, sizeof(status));
+        s = iov_from_buf(elem->in_sg, elem->in_num, 0, &status, sizeof(status));
         assert(s == sizeof(status));
 
-        virtqueue_push(vq, &elem, sizeof(status));
+        virtqueue_push(vq, elem, sizeof(status));
         virtio_notify(vdev, vq);
         g_free(iov2);
+        g_free(elem);
     }
 }
 
@@ -1044,13 +1049,14 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
     offset = i = 0;
 
     while (offset < size) {
-        VirtQueueElement elem;
+        VirtQueueElement *elem;
         int len, total;
-        const struct iovec *sg = elem.in_sg;
+        const struct iovec *sg;
 
         total = 0;
 
-        if (virtqueue_pop(q->rx_vq, &elem) == 0) {
+        elem = virtqueue_pop(q->rx_vq, sizeof(VirtQueueElement));
+        if (!elem) {
             if (i == 0)
                 return -1;
             error_report("virtio-net unexpected empty queue: "
@@ -1063,21 +1069,22 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
             exit(1);
         }
 
-        if (elem.in_num < 1) {
+        if (elem->in_num < 1) {
             error_report("virtio-net receive queue contains no in buffers");
             exit(1);
         }
 
+        sg = elem->in_sg;
         if (i == 0) {
             assert(offset == 0);
             if (n->mergeable_rx_bufs) {
                 mhdr_cnt = iov_copy(mhdr_sg, ARRAY_SIZE(mhdr_sg),
-                                    sg, elem.in_num,
+                                    sg, elem->in_num,
                                     offsetof(typeof(mhdr), num_buffers),
                                     sizeof(mhdr.num_buffers));
             }
 
-            receive_header(n, sg, elem.in_num, buf, size);
+            receive_header(n, sg, elem->in_num, buf, size);
             offset = n->host_hdr_len;
             total += n->guest_hdr_len;
             guest_offset = n->guest_hdr_len;
@@ -1086,7 +1093,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
         }
 
         /* copy in packet.  ugh */
-        len = iov_from_buf(sg, elem.in_num, guest_offset,
+        len = iov_from_buf(sg, elem->in_num, guest_offset,
                            buf + offset, size - offset);
         total += len;
         offset += len;
@@ -1094,12 +1101,14 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
          * must have consumed the complete packet.
          * Otherwise, drop it. */
         if (!n->mergeable_rx_bufs && offset < size) {
-            virtqueue_discard(q->rx_vq, &elem, total);
+            virtqueue_discard(q->rx_vq, elem, total);
+            g_free(elem);
             return size;
         }
 
         /* signal other side */
-        virtqueue_fill(q->rx_vq, &elem, total, i++);
+        virtqueue_fill(q->rx_vq, elem, total, i++);
+        g_free(elem);
     }
 
     if (mhdr_cnt) {
@@ -1123,10 +1132,11 @@ static void virtio_net_tx_complete(NetClientState *nc, ssize_t len)
     VirtIONetQueue *q = virtio_net_get_subqueue(nc);
     VirtIODevice *vdev = VIRTIO_DEVICE(n);
 
-    virtqueue_push(q->tx_vq, &q->async_tx.elem, 0);
+    virtqueue_push(q->tx_vq, q->async_tx.elem, 0);
     virtio_notify(vdev, q->tx_vq);
 
-    q->async_tx.elem.out_num = 0;
+    g_free(q->async_tx.elem);
+    q->async_tx.elem = NULL;
 
     virtio_queue_set_notification(q->tx_vq, 1);
     virtio_net_flush_tx(q);
@@ -1137,25 +1147,31 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 {
     VirtIONet *n = q->n;
     VirtIODevice *vdev = VIRTIO_DEVICE(n);
-    VirtQueueElement elem;
+    VirtQueueElement *elem;
     int32_t num_packets = 0;
     int queue_index = vq2q(virtio_get_queue_index(q->tx_vq));
     if (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
         return num_packets;
     }
 
-    if (q->async_tx.elem.out_num) {
+    if (q->async_tx.elem) {
         virtio_queue_set_notification(q->tx_vq, 0);
         return num_packets;
     }
 
-    while (virtqueue_pop(q->tx_vq, &elem)) {
+    for (;;) {
         ssize_t ret;
-        unsigned int out_num = elem.out_num;
-        struct iovec *out_sg = &elem.out_sg[0];
-        struct iovec sg[VIRTQUEUE_MAX_SIZE], sg2[VIRTQUEUE_MAX_SIZE + 1];
+        unsigned int out_num;
+        struct iovec sg[VIRTQUEUE_MAX_SIZE], sg2[VIRTQUEUE_MAX_SIZE + 1], *out_sg;
         struct virtio_net_hdr_mrg_rxbuf mhdr;
 
+        elem = virtqueue_pop(q->tx_vq, sizeof(VirtQueueElement));
+        if (!elem) {
+            break;
+        }
+
+        out_num = elem->out_num;
+        out_sg = elem->out_sg;
         if (out_num < 1) {
             error_report("virtio-net header not in first element");
             exit(1);
@@ -1207,8 +1223,9 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
         }
 
 drop:
-        virtqueue_push(q->tx_vq, &elem, 0);
+        virtqueue_push(q->tx_vq, elem, 0);
         virtio_notify(vdev, q->tx_vq);
+        g_free(elem);
 
         if (++num_packets >= n->tx_burst) {
             break;
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 0d8d71e..45ae2d2 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -80,15 +80,16 @@ fail_vring:
 VirtIOSCSIReq *virtio_scsi_pop_req_vring(VirtIOSCSI *s,
                                          VirtIOSCSIVring *vring)
 {
-    VirtIOSCSIReq *req = virtio_scsi_init_req(s, NULL);
-    int r;
+    VirtIOSCSICommon *vs = (VirtIOSCSICommon *)s;
+    VirtIOSCSIReq *req;
 
-    req->vring = vring;
-    r = vring_pop((VirtIODevice *)s, &vring->vring, &req->elem);
-    if (r < 0) {
-        virtio_scsi_free_req(req);
-        req = NULL;
+    req = vring_pop((VirtIODevice *)s, &vring->vring,
+                    sizeof(VirtIOSCSIReq) + vs->cdb_size);
+    if (!req) {
+        return NULL;
     }
+    virtio_scsi_init_req(s, NULL, req);
+    req->vring = vring;
     return req;
 }
 
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 1567da8..42a6b27 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -40,19 +40,15 @@ static inline SCSIDevice *virtio_scsi_device_find(VirtIOSCSI *s, uint8_t *lun)
     return scsi_device_find(&s->bus, 0, lun[1], virtio_scsi_get_lun(lun));
 }
 
-VirtIOSCSIReq *virtio_scsi_init_req(VirtIOSCSI *s, VirtQueue *vq)
+void virtio_scsi_init_req(VirtIOSCSI *s, VirtQueue *vq, VirtIOSCSIReq *req)
 {
-    VirtIOSCSIReq *req;
-    VirtIOSCSICommon *vs = (VirtIOSCSICommon *)s;
     const size_t zero_skip = offsetof(VirtIOSCSIReq, vring);
 
-    req = g_malloc(sizeof(*req) + vs->cdb_size);
     req->vq = vq;
     req->dev = s;
     qemu_sglist_init(&req->qsgl, DEVICE(s), 8, &address_space_memory);
     qemu_iovec_init(&req->resp_iov, 1);
     memset((uint8_t *)req + zero_skip, 0, sizeof(*req) - zero_skip);
-    return req;
 }
 
 void virtio_scsi_free_req(VirtIOSCSIReq *req)
@@ -173,11 +169,14 @@ static int virtio_scsi_parse_req(VirtIOSCSIReq *req,
 
 static VirtIOSCSIReq *virtio_scsi_pop_req(VirtIOSCSI *s, VirtQueue *vq)
 {
-    VirtIOSCSIReq *req = virtio_scsi_init_req(s, vq);
-    if (!virtqueue_pop(vq, &req->elem)) {
-        virtio_scsi_free_req(req);
+    VirtIOSCSICommon *vs = (VirtIOSCSICommon *)s;
+    VirtIOSCSIReq *req;
+
+    req = virtqueue_pop(vq, sizeof(VirtIOSCSIReq) + vs->cdb_size);
+    if (!req) {
         return NULL;
     }
+    virtio_scsi_init_req(s, vq, req);
     return req;
 }
 
@@ -202,8 +201,9 @@ static void *virtio_scsi_load_request(QEMUFile *f, SCSIRequest *sreq)
 
     qemu_get_be32s(f, &n);
     assert(n < vs->conf.num_queues);
-    req = virtio_scsi_init_req(s, vs->cmd_vqs[n]);
+    req = g_malloc(sizeof(VirtIOSCSIReq) + vs->cdb_size);
     qemu_get_buffer(f, (unsigned char *)&req->elem, sizeof(req->elem));
+    virtio_scsi_init_req(s, vs->cmd_vqs[n], req);
 
     virtqueue_map(&req->elem);
 
diff --git a/hw/virtio/dataplane/vring.c b/hw/virtio/dataplane/vring.c
index 23f667e..1b900fc 100644
--- a/hw/virtio/dataplane/vring.c
+++ b/hw/virtio/dataplane/vring.c
@@ -388,23 +388,25 @@ static void vring_unmap_element(VirtQueueElement *elem)
  *
  * Stolen from linux/drivers/vhost/vhost.c.
  */
-int vring_pop(VirtIODevice *vdev, Vring *vring,
-              VirtQueueElement *elem)
+void *vring_pop(VirtIODevice *vdev, Vring *vring, size_t sz)
 {
     struct vring_desc desc;
     unsigned int i, head, found = 0, num = vring->vr.num;
     uint16_t avail_idx, last_avail_idx;
+    VirtQueueElement *elem = NULL;
     int ret;
 
-    /* Initialize elem so it can be safely unmapped */
-    elem->in_num = elem->out_num = 0;
-
     /* If there was a fatal error then refuse operation */
     if (vring->broken) {
         ret = -EFAULT;
         goto out;
     }
 
+    elem = g_malloc(sz);
+
+    /* Initialize elem so it can be safely unmapped */
+    elem->in_num = elem->out_num = 0;
+
     /* Check it isn't doing very strange things with descriptor numbers. */
     last_avail_idx = vring->last_avail_idx;
     avail_idx = vring_get_avail_idx(vdev, vring);
@@ -480,7 +482,7 @@ int vring_pop(VirtIODevice *vdev, Vring *vring,
             virtio_tswap16(vdev, vring->last_avail_idx);
     }
 
-    return head;
+    return elem;
 
 out:
     assert(ret < 0);
@@ -488,7 +490,8 @@ out:
         vring->broken = true;
     }
     vring_unmap_element(elem);
-    return ret;
+    g_free(elem);
+    return NULL;
 }
 
 /* After we've used one of their buffers, we tell them about it.
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 9671635..2704680 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -106,8 +106,10 @@ static void balloon_stats_poll_cb(void *opaque)
         return;
     }
 
-    virtqueue_push(s->svq, &s->stats_vq_elem, s->stats_vq_offset);
+    virtqueue_push(s->svq, s->stats_vq_elem, s->stats_vq_offset);
     virtio_notify(vdev, s->svq);
+    g_free(s->stats_vq_elem);
+    s->stats_vq_elem = NULL;
 }
 
 static void balloon_stats_get_all(Object *obj, struct Visitor *v,
@@ -205,14 +207,18 @@ static void balloon_stats_set_poll_interval(Object *obj, struct Visitor *v,
 static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
-    VirtQueueElement elem;
+    VirtQueueElement *elem;
     MemoryRegionSection section;
 
-    while (virtqueue_pop(vq, &elem)) {
+    for (;;) {
         size_t offset = 0;
         uint32_t pfn;
+        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+        if (!elem) {
+            return;
+        }
 
-        while (iov_to_buf(elem.out_sg, elem.out_num, offset, &pfn, 4) == 4) {
+        while (iov_to_buf(elem->out_sg, elem->out_num, offset, &pfn, 4) == 4) {
             ram_addr_t pa;
             ram_addr_t addr;
             int p = virtio_ldl_p(vdev, &pfn);
@@ -235,20 +241,22 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
             memory_region_unref(section.mr);
         }
 
-        virtqueue_push(vq, &elem, offset);
+        virtqueue_push(vq, elem, offset);
         virtio_notify(vdev, vq);
+        g_free(elem);
     }
 }
 
 static void virtio_balloon_receive_stats(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
-    VirtQueueElement *elem = &s->stats_vq_elem;
+    VirtQueueElement *elem;
     VirtIOBalloonStat stat;
     size_t offset = 0;
     qemu_timeval tv;
 
-    if (!virtqueue_pop(vq, elem)) {
+    s->stats_vq_elem = elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+    if (!elem) {
         goto out;
     }
 
diff --git a/hw/virtio/virtio-rng.c b/hw/virtio/virtio-rng.c
index 97d1541..5f92047 100644
--- a/hw/virtio/virtio-rng.c
+++ b/hw/virtio/virtio-rng.c
@@ -43,7 +43,7 @@ static void chr_read(void *opaque, const void *buf, size_t size)
 {
     VirtIORNG *vrng = opaque;
     VirtIODevice *vdev = VIRTIO_DEVICE(vrng);
-    VirtQueueElement elem;
+    VirtQueueElement *elem;
     size_t len;
     int offset;
 
@@ -55,15 +55,17 @@ static void chr_read(void *opaque, const void *buf, size_t size)
 
     offset = 0;
     while (offset < size) {
-        if (!virtqueue_pop(vrng->vq, &elem)) {
+        elem = virtqueue_pop(vrng->vq, sizeof(VirtQueueElement));
+        if (!elem) {
             break;
         }
-        len = iov_from_buf(elem.in_sg, elem.in_num,
+        len = iov_from_buf(elem->in_sg, elem->in_num,
                            0, buf + offset, size - offset);
         offset += len;
 
-        virtqueue_push(vrng->vq, &elem, len);
+        virtqueue_push(vrng->vq, elem, len);
         trace_virtio_rng_pushed(vrng, len);
+        g_free(elem);
     }
     virtio_notify(vdev, vrng->vq);
 }
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index bd6b4df..6c4c8bc 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -501,16 +501,19 @@ void virtqueue_map(VirtQueueElement *elem)
                         0);
 }
 
-int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem)
+void *virtqueue_pop(VirtQueue *vq, size_t sz)
 {
     unsigned int i, head, max;
     hwaddr desc_pa = vq->vring.desc;
     VirtIODevice *vdev = vq->vdev;
+    VirtQueueElement *elem;
 
-    if (!virtqueue_num_heads(vq, vq->last_avail_idx))
-        return 0;
+    if (!virtqueue_num_heads(vq, vq->last_avail_idx)) {
+        return NULL;
+    }
 
     /* When we start there are none of either input nor output. */
+    elem = g_malloc(sz);
     elem->out_num = elem->in_num = 0;
 
     max = vq->vring.num;
@@ -569,7 +572,7 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem)
     vq->inuse++;
 
     trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
-    return elem->in_num + elem->out_num;
+    return elem;
 }
 
 /* virtio device */
diff --git a/include/hw/virtio/dataplane/vring.h b/include/hw/virtio/dataplane/vring.h
index a596e4c..e80985e 100644
--- a/include/hw/virtio/dataplane/vring.h
+++ b/include/hw/virtio/dataplane/vring.h
@@ -44,7 +44,7 @@ void vring_teardown(Vring *vring, VirtIODevice *vdev, int n);
 void vring_disable_notification(VirtIODevice *vdev, Vring *vring);
 bool vring_enable_notification(VirtIODevice *vdev, Vring *vring);
 bool vring_should_notify(VirtIODevice *vdev, Vring *vring);
-int vring_pop(VirtIODevice *vdev, Vring *vring, VirtQueueElement *elem);
+void *vring_pop(VirtIODevice *vdev, Vring *vring, size_t sz);
 void vring_push(VirtIODevice *vdev, Vring *vring, VirtQueueElement *elem,
                 int len);
 
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index 09c2ce4..35f62ac 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -37,7 +37,7 @@ typedef struct VirtIOBalloon {
     uint32_t num_pages;
     uint32_t actual;
     uint64_t stats[VIRTIO_BALLOON_S_NR];
-    VirtQueueElement stats_vq_elem;
+    VirtQueueElement *stats_vq_elem;
     size_t stats_vq_offset;
     QEMUTimer *stats_timer;
     int64_t stats_last_update;
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 403ab86..199bb0e 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -80,8 +80,7 @@ typedef struct MultiReqBuffer {
     bool is_write;
 } MultiReqBuffer;
 
-VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s);
-
+void virtio_blk_init_request(VirtIOBlock *s, VirtIOBlockReq *req);
 void virtio_blk_free_request(VirtIOBlockReq *req);
 
 void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb);
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index f3cc25f..2ce3b03 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -47,7 +47,7 @@ typedef struct VirtIONetQueue {
     QEMUBH *tx_bh;
     int tx_waiting;
     struct {
-        VirtQueueElement elem;
+        VirtQueueElement *elem;
     } async_tx;
     struct VirtIONet *n;
 } VirtIONetQueue;
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 63f5b51..d9ae76c 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -150,7 +150,7 @@ void virtio_scsi_common_unrealize(DeviceState *dev, Error **errp);
 void virtio_scsi_handle_ctrl_req(VirtIOSCSI *s, VirtIOSCSIReq *req);
 bool virtio_scsi_handle_cmd_req_prepare(VirtIOSCSI *s, VirtIOSCSIReq *req);
 void virtio_scsi_handle_cmd_req_submit(VirtIOSCSI *s, VirtIOSCSIReq *req);
-VirtIOSCSIReq *virtio_scsi_init_req(VirtIOSCSI *s, VirtQueue *vq);
+void virtio_scsi_init_req(VirtIOSCSI *s, VirtQueue *vq, VirtIOSCSIReq *req);
 void virtio_scsi_free_req(VirtIOSCSIReq *req);
 void virtio_scsi_push_event(VirtIOSCSI *s, SCSIDevice *dev,
                             uint32_t event, uint32_t reason);
diff --git a/include/hw/virtio/virtio-serial.h b/include/hw/virtio/virtio-serial.h
index 527d0bf..12a55a1 100644
--- a/include/hw/virtio/virtio-serial.h
+++ b/include/hw/virtio/virtio-serial.h
@@ -122,7 +122,7 @@ struct VirtIOSerialPort {
      * element popped and continue consuming it once the backend
      * becomes writable again.
      */
-    VirtQueueElement elem;
+    VirtQueueElement *elem;
 
     /*
      * The index and the offset into the iov buffer that was popped in
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 205fadf..21fda17 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -152,7 +152,7 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
                     unsigned int len, unsigned int idx);
 
 void virtqueue_map(VirtQueueElement *elem);
-int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem);
+void *virtqueue_pop(VirtQueue *vq, size_t sz);
 int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
                           unsigned int out_bytes);
 void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH 03/10] virtio: introduce qemu_get/put_virtqueue_element
  2016-01-15 12:41 [Qemu-devel] [PATCH 00/10] virtio/vring: optimization patches Paolo Bonzini
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 01/10] virtio: move VirtQueueElement at the beginning of the structs Paolo Bonzini
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 02/10] virtio: move allocation to virtqueue_pop/vring_pop Paolo Bonzini
@ 2016-01-15 12:41 ` Paolo Bonzini
  2016-01-19 12:30   ` Cornelia Huck
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 04/10] virtio: introduce virtqueue_alloc_element Paolo Bonzini
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2016-01-15 12:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Move allocation to virtio functions also when loading/saving a
VirtQueueElement.  This will also let the load/save functions
keep backwards compatibility when the VirtQueueElement layout
is changed.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/block/virtio-blk.c       | 10 +++-------
 hw/char/virtio-serial-bus.c | 10 +++-------
 hw/scsi/virtio-scsi.c       |  7 ++-----
 hw/virtio/virtio.c          | 13 +++++++++++++
 include/hw/virtio/virtio.h  |  2 ++
 5 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index a874cb7..75b2bfc 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -807,8 +807,7 @@ static void virtio_blk_save_device(VirtIODevice *vdev, QEMUFile *f)
 
     while (req) {
         qemu_put_sbyte(f, 1);
-        qemu_put_buffer(f, (unsigned char *)&req->elem,
-                        sizeof(VirtQueueElement));
+        qemu_put_virtqueue_element(f, &req->elem);
         req = req->next;
     }
     qemu_put_sbyte(f, 0);
@@ -831,14 +830,11 @@ static int virtio_blk_load_device(VirtIODevice *vdev, QEMUFile *f,
     VirtIOBlock *s = VIRTIO_BLK(vdev);
 
     while (qemu_get_sbyte(f)) {
-        VirtIOBlockReq *req = g_new(VirtIOBlockReq, 1);
+        VirtIOBlockReq *req;
+        req = qemu_get_virtqueue_element(f, sizeof(VirtIOBlockReq));
         virtio_blk_init_request(s, req);
-        qemu_get_buffer(f, (unsigned char *)&req->elem,
-                        sizeof(VirtQueueElement));
         req->next = s->rq;
         s->rq = req;
-
-        virtqueue_map(&req->elem);
     }
 
     return 0;
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index c941d21..956ec75 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -645,9 +645,7 @@ static void virtio_serial_save_device(VirtIODevice *vdev, QEMUFile *f)
         if (elem_popped) {
             qemu_put_be32s(f, &port->iov_idx);
             qemu_put_be64s(f, &port->iov_offset);
-
-            qemu_put_buffer(f, (unsigned char *)port->elem,
-                            sizeof(VirtQueueElement));
+            qemu_put_virtqueue_element(f, port->elem);
         }
     }
 }
@@ -722,10 +720,8 @@ static int fetch_active_ports_list(QEMUFile *f, int version_id,
                 qemu_get_be32s(f, &port->iov_idx);
                 qemu_get_be64s(f, &port->iov_offset);
 
-                port->elem = g_new(VirtQueueElement, 1);
-                qemu_get_buffer(f, (unsigned char *)port->elem,
-                                sizeof(VirtQueueElement));
-                virtqueue_map(port->elem);
+                port->elem =
+                    qemu_get_virtqueue_element(f, sizeof(VirtQueueElement));
 
                 /*
                  *  Port was throttled on source machine.  Let's
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 42a6b27..6921420 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -188,7 +188,7 @@ static void virtio_scsi_save_request(QEMUFile *f, SCSIRequest *sreq)
 
     assert(n < vs->conf.num_queues);
     qemu_put_be32s(f, &n);
-    qemu_put_buffer(f, (unsigned char *)&req->elem, sizeof(req->elem));
+    qemu_put_virtqueue_element(f, &req->elem);
 }
 
 static void *virtio_scsi_load_request(QEMUFile *f, SCSIRequest *sreq)
@@ -201,12 +201,9 @@ static void *virtio_scsi_load_request(QEMUFile *f, SCSIRequest *sreq)
 
     qemu_get_be32s(f, &n);
     assert(n < vs->conf.num_queues);
-    req = g_malloc(sizeof(VirtIOSCSIReq) + vs->cdb_size);
-    qemu_get_buffer(f, (unsigned char *)&req->elem, sizeof(req->elem));
+    req = qemu_get_virtqueue_element(f, sizeof(VirtIOSCSIReq) + vs->cdb_size);
     virtio_scsi_init_req(s, vs->cmd_vqs[n], req);
 
-    virtqueue_map(&req->elem);
-
     if (virtio_scsi_parse_req(req, sizeof(VirtIOSCSICmdReq) + vs->cdb_size,
                               sizeof(VirtIOSCSICmdResp) + vs->sense_size) < 0) {
         error_report("invalid SCSI request migration data");
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 6c4c8bc..ae3252f 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -575,6 +575,19 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     return elem;
 }
 
+void *qemu_get_virtqueue_element(QEMUFile *f, size_t sz)
+{
+    VirtQueueElement *elem = g_malloc(sz);
+    qemu_get_buffer(f, (uint8_t *)elem, sizeof(VirtQueueElement));
+    virtqueue_map(elem);
+    return elem;
+}
+
+void qemu_put_virtqueue_element(QEMUFile *f, VirtQueueElement *elem)
+{
+    qemu_put_buffer(f, (uint8_t *)elem, sizeof(VirtQueueElement));
+}
+
 /* virtio device */
 static void virtio_notify_vector(VirtIODevice *vdev, uint16_t vector)
 {
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 21fda17..44da9a8 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -153,6 +153,8 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
 
 void virtqueue_map(VirtQueueElement *elem);
 void *virtqueue_pop(VirtQueue *vq, size_t sz);
+void *qemu_get_virtqueue_element(QEMUFile *f, size_t sz);
+void qemu_put_virtqueue_element(QEMUFile *f, VirtQueueElement *elem);
 int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
                           unsigned int out_bytes);
 void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH 04/10] virtio: introduce virtqueue_alloc_element
  2016-01-15 12:41 [Qemu-devel] [PATCH 00/10] virtio/vring: optimization patches Paolo Bonzini
                   ` (2 preceding siblings ...)
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 03/10] virtio: introduce qemu_get/put_virtqueue_element Paolo Bonzini
@ 2016-01-15 12:41 ` Paolo Bonzini
  2016-01-19 12:40   ` Cornelia Huck
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 05/10] virtio: slim down allocation of VirtQueueElements Paolo Bonzini
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2016-01-15 12:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Allocate the arrays for in_addr/out_addr/in_sg/out_sg outside the
VirtQueueElement.  For now, virtqueue_pop and vring_pop keep
allocating a very large VirtQueueElement.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/virtio/dataplane/vring.c |   2 +-
 hw/virtio/virtio.c          | 109 ++++++++++++++++++++++++++++++++++++++++----
 include/hw/virtio/virtio.h  |   9 ++--
 3 files changed, 105 insertions(+), 15 deletions(-)

diff --git a/hw/virtio/dataplane/vring.c b/hw/virtio/dataplane/vring.c
index 1b900fc..c950caa 100644
--- a/hw/virtio/dataplane/vring.c
+++ b/hw/virtio/dataplane/vring.c
@@ -402,7 +402,7 @@ void *vring_pop(VirtIODevice *vdev, Vring *vring, size_t sz)
         goto out;
     }
 
-    elem = g_malloc(sz);
+    elem = virtqueue_alloc_element(sz, VIRTQUEUE_MAX_SIZE, VIRTQUEUE_MAX_SIZE);
 
     /* Initialize elem so it can be safely unmapped */
     elem->in_num = elem->out_num = 0;
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index ae3252f..a292ac9 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -494,11 +494,29 @@ static void virtqueue_map_iovec(struct iovec *sg, hwaddr *addr,
 void virtqueue_map(VirtQueueElement *elem)
 {
     virtqueue_map_iovec(elem->in_sg, elem->in_addr, &elem->in_num,
-                        MIN(ARRAY_SIZE(elem->in_sg), ARRAY_SIZE(elem->in_addr)),
-                        1);
+                        VIRTQUEUE_MAX_SIZE, 1);
     virtqueue_map_iovec(elem->out_sg, elem->out_addr, &elem->out_num,
-                        MIN(ARRAY_SIZE(elem->out_sg), ARRAY_SIZE(elem->out_addr)),
-                        0);
+                        VIRTQUEUE_MAX_SIZE, 0);
+}
+
+void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_num)
+{
+    VirtQueueElement *elem;
+    size_t in_addr_ofs = QEMU_ALIGN_UP(sz, __alignof__(elem->in_addr[0]));
+    size_t out_addr_ofs = in_addr_ofs + in_num * sizeof(elem->in_addr[0]);
+    size_t out_addr_end = out_addr_ofs + out_num * sizeof(elem->out_addr[0]);
+    size_t in_sg_ofs = QEMU_ALIGN_UP(out_addr_end, __alignof__(elem->in_sg[0]));
+    size_t out_sg_ofs = in_sg_ofs + in_num * sizeof(elem->in_sg[0]);
+    size_t out_sg_end = out_sg_ofs + out_num * sizeof(elem->out_sg[0]);
+
+    elem = g_malloc(out_sg_end);
+    elem->out_num = out_num;
+    elem->in_num = in_num;
+    elem->in_addr = (void *)elem + in_addr_ofs;
+    elem->out_addr = (void *)elem + out_addr_ofs;
+    elem->in_sg = (void *)elem + in_sg_ofs;
+    elem->out_sg = (void *)elem + out_sg_ofs;
+    return elem;
 }
 
 void *virtqueue_pop(VirtQueue *vq, size_t sz)
@@ -513,7 +531,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     }
 
     /* When we start there are none of either input nor output. */
-    elem = g_malloc(sz);
+    elem = virtqueue_alloc_element(sz, VIRTQUEUE_MAX_SIZE, VIRTQUEUE_MAX_SIZE);
     elem->out_num = elem->in_num = 0;
 
     max = vq->vring.num;
@@ -540,14 +558,14 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
         struct iovec *sg;
 
         if (vring_desc_flags(vdev, desc_pa, i) & VRING_DESC_F_WRITE) {
-            if (elem->in_num >= ARRAY_SIZE(elem->in_sg)) {
+            if (elem->in_num >= VIRTQUEUE_MAX_SIZE) {
                 error_report("Too many write descriptors in indirect table");
                 exit(1);
             }
             elem->in_addr[elem->in_num] = vring_desc_addr(vdev, desc_pa, i);
             sg = &elem->in_sg[elem->in_num++];
         } else {
-            if (elem->out_num >= ARRAY_SIZE(elem->out_sg)) {
+            if (elem->out_num >= VIRTQUEUE_MAX_SIZE) {
                 error_report("Too many read descriptors in indirect table");
                 exit(1);
             }
@@ -575,17 +593,87 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     return elem;
 }
 
+/* Reading and writing a structure directly to QEMUFile is *awful*, but
+ * it is what QEMU has always done by mistake.  We can change it sooner
+ * or later by bumping the version number of the affected vm states.
+ * In the meanwhile, since the in-memory layout of VirtQueueElement
+ * has changed, we need to marshal to and from the layout that was
+ * used before the change.
+ */
+typedef struct VirtQueueElementOld {
+    unsigned int index;
+    unsigned int out_num;
+    unsigned int in_num;
+    hwaddr in_addr[VIRTQUEUE_MAX_SIZE];
+    hwaddr out_addr[VIRTQUEUE_MAX_SIZE];
+    struct iovec in_sg[VIRTQUEUE_MAX_SIZE];
+    struct iovec out_sg[VIRTQUEUE_MAX_SIZE];
+} VirtQueueElementOld;
+
 void *qemu_get_virtqueue_element(QEMUFile *f, size_t sz)
 {
-    VirtQueueElement *elem = g_malloc(sz);
-    qemu_get_buffer(f, (uint8_t *)elem, sizeof(VirtQueueElement));
+    VirtQueueElement *elem;
+    VirtQueueElementOld data;
+    int i;
+
+    qemu_get_buffer(f, (uint8_t *)&data, sizeof(VirtQueueElementOld));
+
+    elem = virtqueue_alloc_element(sz, data.out_num, data.in_num);
+    elem->index = data.index;
+
+    for (i = 0; i < elem->in_num; i++) {
+        elem->in_addr[i] = data.in_addr[i];
+    }
+
+    for (i = 0; i < elem->out_num; i++) {
+        elem->out_addr[i] = data.out_addr[i];
+    }
+
+    for (i = 0; i < elem->in_num; i++) {
+        /* Base is overwritten by virtqueue_map.  */
+        elem->in_sg[i].iov_base = 0;
+        elem->in_sg[i].iov_len = data.in_sg[i].iov_len;
+    }
+
+    for (i = 0; i < elem->out_num; i++) {
+        /* Base is overwritten by virtqueue_map.  */
+        elem->out_sg[i].iov_base = 0;
+        elem->out_sg[i].iov_len = data.out_sg[i].iov_len;
+    }
+
     virtqueue_map(elem);
     return elem;
 }
 
 void qemu_put_virtqueue_element(QEMUFile *f, VirtQueueElement *elem)
 {
-    qemu_put_buffer(f, (uint8_t *)elem, sizeof(VirtQueueElement));
+    VirtQueueElementOld data;
+    int i;
+
+    memset(&data, 0, sizeof(data));
+    data.index = elem->index;
+    data.in_num = elem->in_num;
+    data.out_num = elem->out_num;
+
+    for (i = 0; i < elem->in_num; i++) {
+        data.in_addr[i] = elem->in_addr[i];
+    }
+
+    for (i = 0; i < elem->out_num; i++) {
+        data.out_addr[i] = elem->out_addr[i];
+    }
+
+    for (i = 0; i < elem->in_num; i++) {
+        /* Base is overwritten by virtqueue_map when loading.  Do not
+         * save it, as it would leak the QEMU address space layout.  */
+        data.in_sg[i].iov_len = elem->in_sg[i].iov_len;
+    }
+
+    for (i = 0; i < elem->out_num; i++) {
+        /* Do not save iov_base as above.  */
+        data.out_sg[i].iov_len = elem->out_sg[i].iov_len;
+    }
+    qemu_put_buffer(f, (uint8_t *)&data, sizeof(VirtQueueElementOld));
 }
 
 /* virtio device */
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 44da9a8..108cdb0 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -46,10 +46,10 @@ typedef struct VirtQueueElement
     unsigned int index;
     unsigned int out_num;
     unsigned int in_num;
-    hwaddr in_addr[VIRTQUEUE_MAX_SIZE];
-    hwaddr out_addr[VIRTQUEUE_MAX_SIZE];
-    struct iovec in_sg[VIRTQUEUE_MAX_SIZE];
-    struct iovec out_sg[VIRTQUEUE_MAX_SIZE];
+    hwaddr *in_addr;
+    hwaddr *out_addr;
+    struct iovec *in_sg;
+    struct iovec *out_sg;
 } VirtQueueElement;
 
 #define VIRTIO_QUEUE_MAX 1024
@@ -143,6 +143,7 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
 
 void virtio_del_queue(VirtIODevice *vdev, int n);
 
+void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_num);
 void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
                     unsigned int len);
 void virtqueue_flush(VirtQueue *vq, unsigned int count);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH 05/10] virtio: slim down allocation of VirtQueueElements
  2016-01-15 12:41 [Qemu-devel] [PATCH 00/10] virtio/vring: optimization patches Paolo Bonzini
                   ` (3 preceding siblings ...)
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 04/10] virtio: introduce virtqueue_alloc_element Paolo Bonzini
@ 2016-01-15 12:41 ` Paolo Bonzini
  2016-01-19 15:54   ` Cornelia Huck
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 06/10] vring: " Paolo Bonzini
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2016-01-15 12:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Build the addresses and s/g lists on the stack, and then copy them
to a VirtQueueElement that is just as big as required to contain this
particular s/g list.  The cost of the copy is minimal compared to that
of a large malloc.

When virtqueue_map is used on the destination side of migration or on
loadvm, the iovecs have already been split at memory region boundary,
so we can just reuse the out_num/in_num we find in the file.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/virtio/virtio.c | 82 +++++++++++++++++++++++++++++++++---------------------
 1 file changed, 51 insertions(+), 31 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index a292ac9..6e5e6aa 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -448,6 +448,32 @@ int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
     return in_bytes <= in_total && out_bytes <= out_total;
 }
 
+static void virtqueue_map_desc(unsigned int *p_num_sg, hwaddr *addr, struct iovec *iov,
+                               unsigned int max_num_sg, bool is_write,
+                               hwaddr pa, size_t sz)
+{
+    unsigned num_sg = *p_num_sg;
+    assert(num_sg <= max_num_sg);
+
+    while (sz) {
+        hwaddr len = sz;
+
+        if (num_sg == max_num_sg) {
+            error_report("virtio: too many write descriptors in indirect table");
+            exit(1);
+        }
+
+        iov[num_sg].iov_base = cpu_physical_memory_map(pa, &len, is_write);
+        iov[num_sg].iov_len = len;
+        addr[num_sg] = pa;
+
+        sz -= len;
+        pa += len;
+        num_sg++;
+    }
+    *p_num_sg = num_sg;
+}
+
 static void virtqueue_map_iovec(struct iovec *sg, hwaddr *addr,
                                 unsigned int *num_sg, unsigned int max_size,
                                 int is_write)
@@ -474,20 +500,10 @@ static void virtqueue_map_iovec(struct iovec *sg, hwaddr *addr,
             error_report("virtio: error trying to map MMIO memory");
             exit(1);
         }
-        if (len == sg[i].iov_len) {
-            continue;
-        }
-        if (*num_sg >= max_size) {
-            error_report("virtio: memory split makes iovec too large");
+        if (len != sg[i].iov_len) {
+            error_report("virtio: unexpected memory split");
             exit(1);
         }
-        memmove(sg + i + 1, sg + i, sizeof(*sg) * (*num_sg - i));
-        memmove(addr + i + 1, addr + i, sizeof(*addr) * (*num_sg - i));
-        assert(len < sg[i + 1].iov_len);
-        sg[i].iov_len = len;
-        addr[i + 1] += len;
-        sg[i + 1].iov_len -= len;
-        ++*num_sg;
     }
 }
 
@@ -525,14 +541,16 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     hwaddr desc_pa = vq->vring.desc;
     VirtIODevice *vdev = vq->vdev;
     VirtQueueElement *elem;
+    unsigned out_num, in_num;
+    hwaddr addr[VIRTQUEUE_MAX_SIZE];
+    struct iovec iov[VIRTQUEUE_MAX_SIZE];
 
     if (!virtqueue_num_heads(vq, vq->last_avail_idx)) {
         return NULL;
     }
 
     /* When we start there are none of either input nor output. */
-    elem = virtqueue_alloc_element(sz, VIRTQUEUE_MAX_SIZE, VIRTQUEUE_MAX_SIZE);
-    elem->out_num = elem->in_num = 0;
+    out_num = in_num = 0;
 
     max = vq->vring.num;
 
@@ -555,37 +573,39 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
 
     /* Collect all the descriptors */
     do {
-        struct iovec *sg;
+        hwaddr pa = vring_desc_addr(vdev, desc_pa, i);
+        size_t len = vring_desc_len(vdev, desc_pa, i);
 
         if (vring_desc_flags(vdev, desc_pa, i) & VRING_DESC_F_WRITE) {
-            if (elem->in_num >= VIRTQUEUE_MAX_SIZE) {
-                error_report("Too many write descriptors in indirect table");
-                exit(1);
-            }
-            elem->in_addr[elem->in_num] = vring_desc_addr(vdev, desc_pa, i);
-            sg = &elem->in_sg[elem->in_num++];
+            virtqueue_map_desc(&in_num, addr + out_num, iov + out_num,
+                               VIRTQUEUE_MAX_SIZE - out_num, 1, pa, len);
         } else {
-            if (elem->out_num >= VIRTQUEUE_MAX_SIZE) {
-                error_report("Too many read descriptors in indirect table");
+            if (in_num) {
+                error_report("Incorrect order for descriptors");
                 exit(1);
             }
-            elem->out_addr[elem->out_num] = vring_desc_addr(vdev, desc_pa, i);
-            sg = &elem->out_sg[elem->out_num++];
+            virtqueue_map_desc(&out_num, addr, iov,
+                               VIRTQUEUE_MAX_SIZE, 0, pa, len);
         }
 
-        sg->iov_len = vring_desc_len(vdev, desc_pa, i);
-
         /* If we've got too many, that implies a descriptor loop. */
-        if ((elem->in_num + elem->out_num) > max) {
+        if ((in_num + out_num) > max) {
             error_report("Looped descriptor");
             exit(1);
         }
     } while ((i = virtqueue_next_desc(vdev, desc_pa, i, max)) != max);
 
-    /* Now map what we have collected */
-    virtqueue_map(elem);
-
+    /* Now copy what we have collected and mapped */
+    elem = virtqueue_alloc_element(sz, out_num, in_num);
     elem->index = head;
+    for (i = 0; i < out_num; i++) {
+        elem->out_addr[i] = addr[i];
+        elem->out_sg[i] = iov[i];
+    }
+    for (i = 0; i < in_num; i++) {
+        elem->in_addr[i] = addr[out_num + i];
+        elem->in_sg[i] = iov[out_num + i];
+    }
 
     vq->inuse++;
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH 06/10] vring: slim down allocation of VirtQueueElements
  2016-01-15 12:41 [Qemu-devel] [PATCH 00/10] virtio/vring: optimization patches Paolo Bonzini
                   ` (4 preceding siblings ...)
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 05/10] virtio: slim down allocation of VirtQueueElements Paolo Bonzini
@ 2016-01-15 12:41 ` Paolo Bonzini
  2016-01-19 15:58   ` Cornelia Huck
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 07/10] virtio: combine the read of a descriptor Paolo Bonzini
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2016-01-15 12:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Build the addresses and s/g lists on the stack, and then copy them
to a VirtQueueElement that is just as big as required to contain this
particular s/g list.  The cost of the copy is minimal compared to that
of a large malloc.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/virtio/dataplane/vring.c | 53 ++++++++++++++++++++++++++++++---------------
 1 file changed, 36 insertions(+), 17 deletions(-)

diff --git a/hw/virtio/dataplane/vring.c b/hw/virtio/dataplane/vring.c
index c950caa..d6b8ba9 100644
--- a/hw/virtio/dataplane/vring.c
+++ b/hw/virtio/dataplane/vring.c
@@ -217,8 +217,14 @@ bool vring_should_notify(VirtIODevice *vdev, Vring *vring)
                             new, old);
 }
 
-
-static int get_desc(Vring *vring, VirtQueueElement *elem,
+typedef struct VirtQueueCurrentElement {
+    unsigned in_num;
+    unsigned out_num;
+    hwaddr addr[VIRTQUEUE_MAX_SIZE];
+    struct iovec iov[VIRTQUEUE_MAX_SIZE];
+} VirtQueueCurrentElement;
+
+static int get_desc(Vring *vring, VirtQueueCurrentElement *elem,
                     struct vring_desc *desc)
 {
     unsigned *num;
@@ -229,12 +235,12 @@ static int get_desc(Vring *vring, VirtQueueElement *elem,
 
     if (desc->flags & VRING_DESC_F_WRITE) {
         num = &elem->in_num;
-        iov = &elem->in_sg[*num];
-        addr = &elem->in_addr[*num];
+        iov = &elem->iov[elem->out_num + *num];
+        addr = &elem->addr[elem->out_num + *num];
     } else {
         num = &elem->out_num;
-        iov = &elem->out_sg[*num];
-        addr = &elem->out_addr[*num];
+        iov = &elem->iov[*num];
+        addr = &elem->addr[*num];
 
         /* If it's an output descriptor, they're all supposed
          * to come before any input descriptors. */
@@ -298,7 +304,8 @@ static bool read_vring_desc(VirtIODevice *vdev,
 
 /* This is stolen from linux/drivers/vhost/vhost.c. */
 static int get_indirect(VirtIODevice *vdev, Vring *vring,
-                        VirtQueueElement *elem, struct vring_desc *indirect)
+                        VirtQueueCurrentElement *cur_elem,
+                        struct vring_desc *indirect)
 {
     struct vring_desc desc;
     unsigned int i = 0, count, found = 0;
@@ -350,7 +357,7 @@ static int get_indirect(VirtIODevice *vdev, Vring *vring,
             return -EFAULT;
         }
 
-        ret = get_desc(vring, elem, &desc);
+        ret = get_desc(vring, cur_elem, &desc);
         if (ret < 0) {
             vring->broken |= (ret == -EFAULT);
             return ret;
@@ -393,6 +400,7 @@ void *vring_pop(VirtIODevice *vdev, Vring *vring, size_t sz)
     struct vring_desc desc;
     unsigned int i, head, found = 0, num = vring->vr.num;
     uint16_t avail_idx, last_avail_idx;
+    VirtQueueCurrentElement cur_elem;
     VirtQueueElement *elem = NULL;
     int ret;
 
@@ -402,10 +410,7 @@ void *vring_pop(VirtIODevice *vdev, Vring *vring, size_t sz)
         goto out;
     }
 
-    elem = virtqueue_alloc_element(sz, VIRTQUEUE_MAX_SIZE, VIRTQUEUE_MAX_SIZE);
-
-    /* Initialize elem so it can be safely unmapped */
-    elem->in_num = elem->out_num = 0;
+    cur_elem.in_num = cur_elem.out_num = 0;
 
     /* Check it isn't doing very strange things with descriptor numbers. */
     last_avail_idx = vring->last_avail_idx;
@@ -432,8 +437,6 @@ void *vring_pop(VirtIODevice *vdev, Vring *vring, size_t sz)
      * the index we've seen. */
     head = vring_get_avail_ring(vdev, vring, last_avail_idx % num);
 
-    elem->index = head;
-
     /* If their number is silly, that's an error. */
     if (unlikely(head >= num)) {
         error_report("Guest says index %u > %u is available", head, num);
@@ -460,14 +463,14 @@ void *vring_pop(VirtIODevice *vdev, Vring *vring, size_t sz)
         barrier();
 
         if (desc.flags & VRING_DESC_F_INDIRECT) {
-            ret = get_indirect(vdev, vring, elem, &desc);
+            ret = get_indirect(vdev, vring, &cur_elem, &desc);
             if (ret < 0) {
                 goto out;
             }
             continue;
         }
 
-        ret = get_desc(vring, elem, &desc);
+        ret = get_desc(vring, &cur_elem, &desc);
         if (ret < 0) {
             goto out;
         }
@@ -482,6 +485,18 @@ void *vring_pop(VirtIODevice *vdev, Vring *vring, size_t sz)
             virtio_tswap16(vdev, vring->last_avail_idx);
     }
 
+    /* Now copy what we have collected and mapped */
+    elem = virtqueue_alloc_element(sz, cur_elem.out_num, cur_elem.in_num);
+    elem->index = head;
+    for (i = 0; i < cur_elem.out_num; i++) {
+        elem->out_addr[i] = cur_elem.addr[i];
+        elem->out_sg[i] = cur_elem.iov[i];
+    }
+    for (i = 0; i < cur_elem.in_num; i++) {
+        elem->in_addr[i] = cur_elem.addr[cur_elem.out_num + i];
+        elem->in_sg[i] = cur_elem.iov[cur_elem.out_num + i];
+    }
+
     return elem;
 
 out:
@@ -489,7 +504,11 @@ out:
     if (ret == -EFAULT) {
         vring->broken = true;
     }
-    vring_unmap_element(elem);
+
+    for (i = 0; i < cur_elem.out_num + cur_elem.in_num; i++) {
+        vring_unmap(cur_elem.iov[i].iov_base, false);
+    }
+
     g_free(elem);
     return NULL;
 }
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH 07/10] virtio: combine the read of a descriptor
  2016-01-15 12:41 [Qemu-devel] [PATCH 00/10] virtio/vring: optimization patches Paolo Bonzini
                   ` (5 preceding siblings ...)
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 06/10] vring: " Paolo Bonzini
@ 2016-01-15 12:41 ` Paolo Bonzini
  2016-01-19 16:07   ` Cornelia Huck
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 08/10] virtio: cache used_idx in a VirtQueue field Paolo Bonzini
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2016-01-15 12:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Compared to vring, virtio has a performance penalty of 10%.  Fix it
by combining all the reads for a descriptor in a single address_space_read
call.  This also simplifies the code nicely.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/virtio/virtio.c | 86 ++++++++++++++++++++++--------------------------------
 1 file changed, 35 insertions(+), 51 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 6e5e6aa..6ae7bdd 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -107,35 +107,15 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n)
                               vring->align);
 }
 
-static inline uint64_t vring_desc_addr(VirtIODevice *vdev, hwaddr desc_pa,
-                                       int i)
+static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
+                            hwaddr desc_pa, int i)
 {
-    hwaddr pa;
-    pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, addr);
-    return virtio_ldq_phys(vdev, pa);
-}
-
-static inline uint32_t vring_desc_len(VirtIODevice *vdev, hwaddr desc_pa, int i)
-{
-    hwaddr pa;
-    pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, len);
-    return virtio_ldl_phys(vdev, pa);
-}
-
-static inline uint16_t vring_desc_flags(VirtIODevice *vdev, hwaddr desc_pa,
-                                        int i)
-{
-    hwaddr pa;
-    pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, flags);
-    return virtio_lduw_phys(vdev, pa);
-}
-
-static inline uint16_t vring_desc_next(VirtIODevice *vdev, hwaddr desc_pa,
-                                       int i)
-{
-    hwaddr pa;
-    pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, next);
-    return virtio_lduw_phys(vdev, pa);
+    address_space_read(&address_space_memory, desc_pa + i * sizeof(VRingDesc),
+                       MEMTXATTRS_UNSPECIFIED, (void *)desc, sizeof(VRingDesc));
+    virtio_tswap64s(vdev, &desc->addr);
+    virtio_tswap32s(vdev, &desc->len);
+    virtio_tswap16s(vdev, &desc->flags);
+    virtio_tswap16s(vdev, &desc->next);
 }
 
 static inline uint16_t vring_avail_flags(VirtQueue *vq)
@@ -345,18 +325,18 @@ static unsigned int virtqueue_get_head(VirtQueue *vq, unsigned int idx)
     return head;
 }
 
-static unsigned virtqueue_next_desc(VirtIODevice *vdev, hwaddr desc_pa,
-                                    unsigned int i, unsigned int max)
+static unsigned virtqueue_read_next_desc(VirtIODevice *vdev, VRingDesc *desc,
+                                         hwaddr desc_pa, unsigned int max)
 {
     unsigned int next;
 
     /* If this descriptor says it doesn't chain, we're done. */
-    if (!(vring_desc_flags(vdev, desc_pa, i) & VRING_DESC_F_NEXT)) {
+    if (!(desc->flags & VRING_DESC_F_NEXT)) {
         return max;
     }
 
     /* Check they're not leading us off end of descriptors. */
-    next = vring_desc_next(vdev, desc_pa, i);
+    next = desc->next;
     /* Make sure compiler knows to grab that: we don't want it changing! */
     smp_wmb();
 
@@ -365,6 +345,7 @@ static unsigned virtqueue_next_desc(VirtIODevice *vdev, hwaddr desc_pa,
         exit(1);
     }
 
+    vring_desc_read(vdev, desc, desc_pa, next);
     return next;
 }
 
@@ -381,6 +362,7 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
     while (virtqueue_num_heads(vq, idx)) {
         VirtIODevice *vdev = vq->vdev;
         unsigned int max, num_bufs, indirect = 0;
+        VRingDesc desc;
         hwaddr desc_pa;
         int i;
 
@@ -388,9 +370,10 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
         num_bufs = total_bufs;
         i = virtqueue_get_head(vq, idx++);
         desc_pa = vq->vring.desc;
+        vring_desc_read(vdev, &desc, desc_pa, i);
 
-        if (vring_desc_flags(vdev, desc_pa, i) & VRING_DESC_F_INDIRECT) {
-            if (vring_desc_len(vdev, desc_pa, i) % sizeof(VRingDesc)) {
+        if (desc.flags & VRING_DESC_F_INDIRECT) {
+            if (desc.len % sizeof(VRingDesc)) {
                 error_report("Invalid size for indirect buffer table");
                 exit(1);
             }
@@ -403,9 +386,10 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
 
             /* loop over the indirect descriptor table */
             indirect = 1;
-            max = vring_desc_len(vdev, desc_pa, i) / sizeof(VRingDesc);
-            desc_pa = vring_desc_addr(vdev, desc_pa, i);
+            max = desc.len / sizeof(VRingDesc);
+            desc_pa = desc.addr;
             num_bufs = i = 0;
+            vring_desc_read(vdev, &desc, desc_pa, i);
         }
 
         do {
@@ -415,15 +399,15 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
                 exit(1);
             }
 
-            if (vring_desc_flags(vdev, desc_pa, i) & VRING_DESC_F_WRITE) {
-                in_total += vring_desc_len(vdev, desc_pa, i);
+            if (desc.flags & VRING_DESC_F_WRITE) {
+                in_total += desc.len;
             } else {
-                out_total += vring_desc_len(vdev, desc_pa, i);
+                out_total += desc.len;
             }
             if (in_total >= max_in_bytes && out_total >= max_out_bytes) {
                 goto done;
             }
-        } while ((i = virtqueue_next_desc(vdev, desc_pa, i, max)) != max);
+        } while ((i = virtqueue_read_next_desc(vdev, &desc, desc_pa, max)) != max);
 
         if (!indirect)
             total_bufs = num_bufs;
@@ -544,6 +528,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     unsigned out_num, in_num;
     hwaddr addr[VIRTQUEUE_MAX_SIZE];
     struct iovec iov[VIRTQUEUE_MAX_SIZE];
+    VRingDesc desc;
 
     if (!virtqueue_num_heads(vq, vq->last_avail_idx)) {
         return NULL;
@@ -559,33 +544,32 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
         vring_set_avail_event(vq, vq->last_avail_idx);
     }
 
-    if (vring_desc_flags(vdev, desc_pa, i) & VRING_DESC_F_INDIRECT) {
-        if (vring_desc_len(vdev, desc_pa, i) % sizeof(VRingDesc)) {
+    vring_desc_read(vdev, &desc, desc_pa, i);
+    if (desc.flags & VRING_DESC_F_INDIRECT) {
+        if (desc.len % sizeof(VRingDesc)) {
             error_report("Invalid size for indirect buffer table");
             exit(1);
         }
 
         /* loop over the indirect descriptor table */
-        max = vring_desc_len(vdev, desc_pa, i) / sizeof(VRingDesc);
-        desc_pa = vring_desc_addr(vdev, desc_pa, i);
+        max = desc.len / sizeof(VRingDesc);
+        desc_pa = desc.addr;
         i = 0;
+        vring_desc_read(vdev, &desc, desc_pa, i);
     }
 
     /* Collect all the descriptors */
     do {
-        hwaddr pa = vring_desc_addr(vdev, desc_pa, i);
-        size_t len = vring_desc_len(vdev, desc_pa, i);
-
-        if (vring_desc_flags(vdev, desc_pa, i) & VRING_DESC_F_WRITE) {
+        if (desc.flags & VRING_DESC_F_WRITE) {
             virtqueue_map_desc(&in_num, addr + out_num, iov + out_num,
-                               VIRTQUEUE_MAX_SIZE - out_num, 1, pa, len);
+                               VIRTQUEUE_MAX_SIZE - out_num, 1, desc.addr, desc.len);
         } else {
             if (in_num) {
                 error_report("Incorrect order for descriptors");
                 exit(1);
             }
             virtqueue_map_desc(&out_num, addr, iov,
-                               VIRTQUEUE_MAX_SIZE, 0, pa, len);
+                               VIRTQUEUE_MAX_SIZE, 0, desc.addr, desc.len);
         }
 
         /* If we've got too many, that implies a descriptor loop. */
@@ -593,7 +577,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
             error_report("Looped descriptor");
             exit(1);
         }
-    } while ((i = virtqueue_next_desc(vdev, desc_pa, i, max)) != max);
+    } while ((i = virtqueue_read_next_desc(vdev, &desc, desc_pa, max)) != max);
 
     /* Now copy what we have collected and mapped */
     elem = virtqueue_alloc_element(sz, out_num, in_num);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH 08/10] virtio: cache used_idx in a VirtQueue field
  2016-01-15 12:41 [Qemu-devel] [PATCH 00/10] virtio/vring: optimization patches Paolo Bonzini
                   ` (6 preceding siblings ...)
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 07/10] virtio: combine the read of a descriptor Paolo Bonzini
@ 2016-01-15 12:41 ` Paolo Bonzini
  2016-01-19 16:11   ` Cornelia Huck
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 09/10] virtio: read avail_idx from VQ only when necessary Paolo Bonzini
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 10/10] virtio: combine write of an entry into used ring Paolo Bonzini
  9 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2016-01-15 12:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vincenzo Maffione, mst

From: Vincenzo Maffione <v.maffione@gmail.com>

Accessing used_idx in the VQ requires an expensive access to
guest physical memory. Before this patch, 3 accesses are normally
done for each pop/push/notify call. However, since the used_idx is
only written by us, we can track it in our internal data structure.

Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
Message-Id: <3d062ec54e9a7bf9fb325c1fd693564951f2b319.1450218353.git.v.maffione@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/virtio/virtio.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 6ae7bdd..3e7c6bf 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -71,6 +71,9 @@ struct VirtQueue
 {
     VRing vring;
     uint16_t last_avail_idx;
+
+    uint16_t used_idx;
+
     /* Last used index value we have signalled on */
     uint16_t signalled_used;
 
@@ -170,6 +173,7 @@ static inline void vring_used_idx_set(VirtQueue *vq, uint16_t val)
     hwaddr pa;
     pa = vq->vring.used + offsetof(VRingUsed, idx);
     virtio_stw_phys(vq->vdev, pa, val);
+    vq->used_idx = val;
 }
 
 static inline void vring_used_flags_set_bit(VirtQueue *vq, int mask)
@@ -261,7 +265,7 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
 
     virtqueue_unmap_sg(vq, elem, len);
 
-    idx = (idx + vring_used_idx(vq)) % vq->vring.num;
+    idx = (idx + vq->used_idx) % vq->vring.num;
 
     /* Get a pointer to the next entry in the used ring. */
     vring_used_ring_id(vq, idx, elem->index);
@@ -274,7 +278,7 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
     /* Make sure buffer is written before we update index. */
     smp_wmb();
     trace_virtqueue_flush(vq, count);
-    old = vring_used_idx(vq);
+    old = vq->used_idx;
     new = old + count;
     vring_used_idx_set(vq, new);
     vq->inuse -= count;
@@ -782,6 +786,7 @@ void virtio_reset(void *opaque)
         vdev->vq[i].vring.avail = 0;
         vdev->vq[i].vring.used = 0;
         vdev->vq[i].last_avail_idx = 0;
+        vdev->vq[i].used_idx = 0;
         virtio_queue_set_vector(vdev, i, VIRTIO_NO_VECTOR);
         vdev->vq[i].signalled_used = 0;
         vdev->vq[i].signalled_used_valid = false;
@@ -1161,7 +1166,7 @@ static bool vring_notify(VirtIODevice *vdev, VirtQueue *vq)
     v = vq->signalled_used_valid;
     vq->signalled_used_valid = true;
     old = vq->signalled_used;
-    new = vq->signalled_used = vring_used_idx(vq);
+    new = vq->signalled_used = vq->used_idx;
     return !v || vring_need_event(vring_get_used_event(vq), new, old);
 }
 
@@ -1573,6 +1578,7 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
                              vdev->vq[i].last_avail_idx, nheads);
                 return -1;
             }
+            vdev->vq[i].used_idx = vring_used_idx(&vdev->vq[i]);
         }
     }
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH 09/10] virtio: read avail_idx from VQ only when necessary
  2016-01-15 12:41 [Qemu-devel] [PATCH 00/10] virtio/vring: optimization patches Paolo Bonzini
                   ` (7 preceding siblings ...)
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 08/10] virtio: cache used_idx in a VirtQueue field Paolo Bonzini
@ 2016-01-15 12:41 ` Paolo Bonzini
  2016-01-19 16:20   ` Cornelia Huck
  2016-01-19 16:54   ` Michael S. Tsirkin
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 10/10] virtio: combine write of an entry into used ring Paolo Bonzini
  9 siblings, 2 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-01-15 12:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vincenzo Maffione, mst

From: Vincenzo Maffione <v.maffione@gmail.com>

The virtqueue_pop() implementation needs to check if the avail ring
contains some pending buffers. To perform this check, it is not
always necessary to fetch the avail_idx in the VQ memory, which is
expensive. This patch introduces a shadow variable tracking avail_idx
and modifies virtio_queue_empty() to access avail_idx in physical
memory only when necessary.

Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
Message-Id: <b617d6459902773d9f4ab843bfaca764f5af8eda.1450218353.git.v.maffione@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/virtio/virtio.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 3e7c6bf..01142da 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -70,8 +70,13 @@ typedef struct VRing
 struct VirtQueue
 {
     VRing vring;
+
+    /* Next head to pop */
     uint16_t last_avail_idx;
 
+    /* Last avail_idx read from VQ. */
+    uint16_t shadow_avail_idx;
+
     uint16_t used_idx;
 
     /* Last used index value we have signalled on */
@@ -132,7 +137,8 @@ static inline uint16_t vring_avail_idx(VirtQueue *vq)
 {
     hwaddr pa;
     pa = vq->vring.avail + offsetof(VRingAvail, idx);
-    return virtio_lduw_phys(vq->vdev, pa);
+    vq->shadow_avail_idx = virtio_lduw_phys(vq->vdev, pa);
+    return vq->shadow_avail_idx;
 }
 
 static inline uint16_t vring_avail_ring(VirtQueue *vq, int i)
@@ -223,8 +229,14 @@ int virtio_queue_ready(VirtQueue *vq)
     return vq->vring.avail != 0;
 }
 
+/* Fetch avail_idx from VQ memory only when we really need to know if
+ * guest has added some buffers. */
 int virtio_queue_empty(VirtQueue *vq)
 {
+    if (vq->shadow_avail_idx != vq->last_avail_idx) {
+        return 0;
+    }
+
     return vring_avail_idx(vq) == vq->last_avail_idx;
 }
 
@@ -300,7 +312,7 @@ static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
     /* Check it isn't doing very strange things with descriptor numbers. */
     if (num_heads > vq->vring.num) {
         error_report("Guest moved used index from %u to %u",
-                     idx, vring_avail_idx(vq));
+                     idx, vq->shadow_avail_idx);
         exit(1);
     }
     /* On success, callers read a descriptor at vq->last_avail_idx.
@@ -534,9 +546,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     struct iovec iov[VIRTQUEUE_MAX_SIZE];
     VRingDesc desc;
 
-    if (!virtqueue_num_heads(vq, vq->last_avail_idx)) {
+    if (virtio_queue_empty(vq)) {
         return NULL;
     }
+    /* Needed after virtio_queue_empty(), see comment in
+     * virtqueue_num_heads(). */
+    smp_rmb();
 
     /* When we start there are none of either input nor output. */
     out_num = in_num = 0;
@@ -786,6 +801,7 @@ void virtio_reset(void *opaque)
         vdev->vq[i].vring.avail = 0;
         vdev->vq[i].vring.used = 0;
         vdev->vq[i].last_avail_idx = 0;
+        vdev->vq[i].shadow_avail_idx = 0;
         vdev->vq[i].used_idx = 0;
         virtio_queue_set_vector(vdev, i, VIRTIO_NO_VECTOR);
         vdev->vq[i].signalled_used = 0;
@@ -1155,7 +1171,7 @@ static bool vring_notify(VirtIODevice *vdev, VirtQueue *vq)
     smp_mb();
     /* Always notify when queue is empty (when feature acknowledge) */
     if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFY_ON_EMPTY) &&
-        !vq->inuse && vring_avail_idx(vq) == vq->last_avail_idx) {
+        !vq->inuse && virtio_queue_empty(vq)) {
         return true;
     }
 
@@ -1579,6 +1595,7 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
                 return -1;
             }
             vdev->vq[i].used_idx = vring_used_idx(&vdev->vq[i]);
+            vdev->vq[i].shadow_avail_idx = vring_avail_idx(&vdev->vq[i]);
         }
     }
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH 10/10] virtio: combine write of an entry into used ring
  2016-01-15 12:41 [Qemu-devel] [PATCH 00/10] virtio/vring: optimization patches Paolo Bonzini
                   ` (8 preceding siblings ...)
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 09/10] virtio: read avail_idx from VQ only when necessary Paolo Bonzini
@ 2016-01-15 12:41 ` Paolo Bonzini
  2016-01-19 16:26   ` Cornelia Huck
  9 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2016-01-15 12:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vincenzo Maffione, mst

From: Vincenzo Maffione <v.maffione@gmail.com>

Fill in an element of the used ring with a single combined access to the
guest physical memory, rather than using two separated accesses.
This reduces the overhead due to expensive address translation.

Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
Message-Id: <e4a89a767a4a92cbb6bcc551e151487eb36e1722.1450218353.git.v.maffione@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/virtio/virtio.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 01142da..96bd420 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -153,18 +153,15 @@ static inline uint16_t vring_get_used_event(VirtQueue *vq)
     return vring_avail_ring(vq, vq->vring.num);
 }
 
-static inline void vring_used_ring_id(VirtQueue *vq, int i, uint32_t val)
+static inline void vring_used_write(VirtQueue *vq, VRingUsedElem *uelem,
+                                    int i)
 {
     hwaddr pa;
-    pa = vq->vring.used + offsetof(VRingUsed, ring[i].id);
-    virtio_stl_phys(vq->vdev, pa, val);
-}
-
-static inline void vring_used_ring_len(VirtQueue *vq, int i, uint32_t val)
-{
-    hwaddr pa;
-    pa = vq->vring.used + offsetof(VRingUsed, ring[i].len);
-    virtio_stl_phys(vq->vdev, pa, val);
+    virtio_tswap32s(vq->vdev, &uelem->id);
+    virtio_tswap32s(vq->vdev, &uelem->len);
+    pa = vq->vring.used + offsetof(VRingUsed, ring[i]);
+    address_space_write(&address_space_memory, pa, MEMTXATTRS_UNSPECIFIED,
+                       (void *)uelem, sizeof(VRingUsedElem));
 }
 
 static uint16_t vring_used_idx(VirtQueue *vq)
@@ -273,15 +270,17 @@ void virtqueue_discard(VirtQueue *vq, const VirtQueueElement *elem,
 void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
                     unsigned int len, unsigned int idx)
 {
+    VRingUsedElem uelem;
+
     trace_virtqueue_fill(vq, elem, len, idx);
 
     virtqueue_unmap_sg(vq, elem, len);
 
     idx = (idx + vq->used_idx) % vq->vring.num;
 
-    /* Get a pointer to the next entry in the used ring. */
-    vring_used_ring_id(vq, idx, elem->index);
-    vring_used_ring_len(vq, idx, len);
+    uelem.id = elem->index;
+    uelem.len = len;
+    vring_used_write(vq, &uelem, idx);
 }
 
 void virtqueue_flush(VirtQueue *vq, unsigned int count)
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] virtio: move VirtQueueElement at the beginning of the structs
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 01/10] virtio: move VirtQueueElement at the beginning of the structs Paolo Bonzini
@ 2016-01-19 12:09   ` Cornelia Huck
  2016-01-19 13:22     ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Cornelia Huck @ 2016-01-19 12:09 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, mst

On Fri, 15 Jan 2016 13:41:49 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> The next patch will make virtqueue_pop/vring_pop allocate memory for a

s/will make/will make it possible for/

?

I had to spend some time grepping through the code to find that blk and
scsi (and gpu, which already had elem at the beginning of its
structure) are the only ones that work like this and that other devices
do not need any change.

> "subclass" of VirtQueueElement.  For this to work, VirtQueueElement
> must be the first field in the containing struct.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/scsi/virtio-scsi.c           |  3 +--
>  include/hw/virtio/virtio-blk.h  |  2 +-
>  include/hw/virtio/virtio-scsi.h | 13 ++++++-------
>  3 files changed, 8 insertions(+), 10 deletions(-)

Otherwise,

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 02/10] virtio: move allocation to virtqueue_pop/vring_pop
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 02/10] virtio: move allocation to virtqueue_pop/vring_pop Paolo Bonzini
@ 2016-01-19 12:22   ` Cornelia Huck
  2016-01-19 13:16     ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Cornelia Huck @ 2016-01-19 12:22 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, mst

On Fri, 15 Jan 2016 13:41:50 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> The return code of virtqueue_pop/vring_pop is unused except to check for
> errors or 0.  We can thus easily move allocation inside the functions
> and just return a pointer to the VirtQueueElement.

Like this change.

> 
> The advantage is that we will be able to allocate only the space that
> is needed for the actual size of the s/g list instead of the full
> VIRTQUEUE_MAX_SIZE items.  Currently VirtQueueElement takes about 48K
> of memory, and this kind of allocation puts a lot of stress on malloc.
> By cutting the size by two or three orders of magnitude, malloc can
> use much more efficient algorithms.
> 
> The patch is pretty large, but changes to each device are testable
> more or less independently.  Splitting it would mostly add churn.

Would it help to add a no-frills virtqueue_pop() version that simply
allocates the base VirtQueueElement and use a _size() variant for those
callers that want an extended structure?

> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/9pfs/9p.c                        |  2 +-
>  hw/9pfs/virtio-9p-device.c          | 16 ++++----
>  hw/9pfs/virtio-9p.h                 |  2 +-
>  hw/block/dataplane/virtio-blk.c     | 11 +++--
>  hw/block/virtio-blk.c               | 15 +++----
>  hw/char/virtio-serial-bus.c         | 80 +++++++++++++++++++++++--------------
>  hw/display/virtio-gpu.c             | 25 +++++++-----
>  hw/input/virtio-input.c             | 24 +++++++----
>  hw/net/virtio-net.c                 | 69 ++++++++++++++++++++------------
>  hw/scsi/virtio-scsi-dataplane.c     | 15 +++----
>  hw/scsi/virtio-scsi.c               | 18 ++++-----
>  hw/virtio/dataplane/vring.c         | 17 ++++----
>  hw/virtio/virtio-balloon.c          | 22 ++++++----
>  hw/virtio/virtio-rng.c              | 10 +++--
>  hw/virtio/virtio.c                  | 11 +++--
>  include/hw/virtio/dataplane/vring.h |  2 +-
>  include/hw/virtio/virtio-balloon.h  |  2 +-
>  include/hw/virtio/virtio-blk.h      |  3 +-
>  include/hw/virtio/virtio-net.h      |  2 +-
>  include/hw/virtio/virtio-scsi.h     |  2 +-
>  include/hw/virtio/virtio-serial.h   |  2 +-
>  include/hw/virtio/virtio.h          |  2 +-
>  22 files changed, 209 insertions(+), 143 deletions(-)

(...)

> diff --git a/hw/virtio/dataplane/vring.c b/hw/virtio/dataplane/vring.c
> index 23f667e..1b900fc 100644
> --- a/hw/virtio/dataplane/vring.c
> +++ b/hw/virtio/dataplane/vring.c
> @@ -388,23 +388,25 @@ static void vring_unmap_element(VirtQueueElement *elem)
>   *
>   * Stolen from linux/drivers/vhost/vhost.c.
>   */
> -int vring_pop(VirtIODevice *vdev, Vring *vring,
> -              VirtQueueElement *elem)
> +void *vring_pop(VirtIODevice *vdev, Vring *vring, size_t sz)
>  {
>      struct vring_desc desc;
>      unsigned int i, head, found = 0, num = vring->vr.num;
>      uint16_t avail_idx, last_avail_idx;
> +    VirtQueueElement *elem = NULL;
>      int ret;

assert(sz >= sizeof(VirtQueueElement));

?

> 
> -    /* Initialize elem so it can be safely unmapped */
> -    elem->in_num = elem->out_num = 0;
> -
>      /* If there was a fatal error then refuse operation */
>      if (vring->broken) {
>          ret = -EFAULT;
>          goto out;
>      }
> 
> +    elem = g_malloc(sz);
> +
> +    /* Initialize elem so it can be safely unmapped */
> +    elem->in_num = elem->out_num = 0;
> +
>      /* Check it isn't doing very strange things with descriptor numbers. */
>      last_avail_idx = vring->last_avail_idx;
>      avail_idx = vring_get_avail_idx(vdev, vring);

(...)

> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index bd6b4df..6c4c8bc 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -501,16 +501,19 @@ void virtqueue_map(VirtQueueElement *elem)
>                          0);
>  }
> 
> -int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem)
> +void *virtqueue_pop(VirtQueue *vq, size_t sz)
>  {
>      unsigned int i, head, max;
>      hwaddr desc_pa = vq->vring.desc;
>      VirtIODevice *vdev = vq->vdev;
> +    VirtQueueElement *elem;

assert(sz >= sizeof(VirtQueueElement));

here as well?

> 
> -    if (!virtqueue_num_heads(vq, vq->last_avail_idx))
> -        return 0;
> +    if (!virtqueue_num_heads(vq, vq->last_avail_idx)) {
> +        return NULL;
> +    }
> 
>      /* When we start there are none of either input nor output. */
> +    elem = g_malloc(sz);
>      elem->out_num = elem->in_num = 0;
> 
>      max = vq->vring.num;

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 03/10] virtio: introduce qemu_get/put_virtqueue_element
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 03/10] virtio: introduce qemu_get/put_virtqueue_element Paolo Bonzini
@ 2016-01-19 12:30   ` Cornelia Huck
  0 siblings, 0 replies; 29+ messages in thread
From: Cornelia Huck @ 2016-01-19 12:30 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, mst

On Fri, 15 Jan 2016 13:41:51 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Move allocation to virtio functions also when loading/saving a
> VirtQueueElement.  This will also let the load/save functions
> keep backwards compatibility when the VirtQueueElement layout
> is changed.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/block/virtio-blk.c       | 10 +++-------
>  hw/char/virtio-serial-bus.c | 10 +++-------
>  hw/scsi/virtio-scsi.c       |  7 ++-----
>  hw/virtio/virtio.c          | 13 +++++++++++++
>  include/hw/virtio/virtio.h  |  2 ++
>  5 files changed, 23 insertions(+), 19 deletions(-)

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 04/10] virtio: introduce virtqueue_alloc_element
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 04/10] virtio: introduce virtqueue_alloc_element Paolo Bonzini
@ 2016-01-19 12:40   ` Cornelia Huck
  0 siblings, 0 replies; 29+ messages in thread
From: Cornelia Huck @ 2016-01-19 12:40 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, mst

On Fri, 15 Jan 2016 13:41:52 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Allocate the arrays for in_addr/out_addr/in_sg/out_sg outside the
> VirtQueueElement.  For now, virtqueue_pop and vring_pop keep
> allocating a very large VirtQueueElement.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/virtio/dataplane/vring.c |   2 +-
>  hw/virtio/virtio.c          | 109 ++++++++++++++++++++++++++++++++++++++++----
>  include/hw/virtio/virtio.h  |   9 ++--
>  3 files changed, 105 insertions(+), 15 deletions(-)

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 02/10] virtio: move allocation to virtqueue_pop/vring_pop
  2016-01-19 12:22   ` Cornelia Huck
@ 2016-01-19 13:16     ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-01-19 13:16 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: qemu-devel, mst



On 19/01/2016 13:22, Cornelia Huck wrote:
> > The patch is pretty large, but changes to each device are testable
> > more or less independently.  Splitting it would mostly add churn.
> 
> Would it help to add a no-frills virtqueue_pop() version that simply
> allocates the base VirtQueueElement and use a _size() variant for those
> callers that want an extended structure?

Not too much, the churn mostly comes from the VQE* being returned from
virtqueue_pop, rather than passed to it.

> assert(sz >= sizeof(VirtQueueElement));

Good idea.

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] virtio: move VirtQueueElement at the beginning of the structs
  2016-01-19 12:09   ` Cornelia Huck
@ 2016-01-19 13:22     ` Paolo Bonzini
  2016-01-19 14:01       ` Cornelia Huck
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2016-01-19 13:22 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: qemu-devel, mst



On 19/01/2016 13:09, Cornelia Huck wrote:
>> > The next patch will make virtqueue_pop/vring_pop allocate memory for a
> s/will make/will make it possible for/
> 
> ?

This patch will actually do that.  This patch makes it possible.

> I had to spend some time grepping through the code to find that blk and
> scsi (and gpu, which already had elem at the beginning of its
> structure) are the only ones that work like this and that other devices
> do not need any change.
> 
>> > "subclass" of VirtQueueElement.  For this to work, VirtQueueElement
>> > must be the first field in the containing struct.

So...

The next patch will make virtqueue_pop/vring_pop allocate memory for the
VirtQueueElement.  In some cases (blk, scsi, gpu) the device wants to
extend VirtQueueElement with device-specific fields and, until now, the
place of the VirtQueueElement within the containing struct didn't
matter.  When allocating the entire block in virtqueue_pop/vring_pop,
however, the containing struct must basically be a "subclass" of
VirtQueueElement, with the VirtQueueElement as the first field.  Make
that the case for blk and scsi; gpu is already doing it.

Paolo

>> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> > ---
>> >  hw/scsi/virtio-scsi.c           |  3 +--
>> >  include/hw/virtio/virtio-blk.h  |  2 +-
>> >  include/hw/virtio/virtio-scsi.h | 13 ++++++-------
>> >  3 files changed, 8 insertions(+), 10 deletions(-)
> Otherwise,
> 
> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> 
> 
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] virtio: move VirtQueueElement at the beginning of the structs
  2016-01-19 13:22     ` Paolo Bonzini
@ 2016-01-19 14:01       ` Cornelia Huck
  0 siblings, 0 replies; 29+ messages in thread
From: Cornelia Huck @ 2016-01-19 14:01 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, mst

On Tue, 19 Jan 2016 14:22:37 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> The next patch will make virtqueue_pop/vring_pop allocate memory for the
> VirtQueueElement.  In some cases (blk, scsi, gpu) the device wants to
> extend VirtQueueElement with device-specific fields and, until now, the
> place of the VirtQueueElement within the containing struct didn't
> matter.  When allocating the entire block in virtqueue_pop/vring_pop,
> however, the containing struct must basically be a "subclass" of
> VirtQueueElement, with the VirtQueueElement as the first field.  Make
> that the case for blk and scsi; gpu is already doing it.

Sounds good!

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 05/10] virtio: slim down allocation of VirtQueueElements
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 05/10] virtio: slim down allocation of VirtQueueElements Paolo Bonzini
@ 2016-01-19 15:54   ` Cornelia Huck
  0 siblings, 0 replies; 29+ messages in thread
From: Cornelia Huck @ 2016-01-19 15:54 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, mst

On Fri, 15 Jan 2016 13:41:53 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Build the addresses and s/g lists on the stack, and then copy them
> to a VirtQueueElement that is just as big as required to contain this
> particular s/g list.  The cost of the copy is minimal compared to that
> of a large malloc.
> 
> When virtqueue_map is used on the destination side of migration or on
> loadvm, the iovecs have already been split at memory region boundary,
> so we can just reuse the out_num/in_num we find in the file.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/virtio/virtio.c | 82 +++++++++++++++++++++++++++++++++---------------------
>  1 file changed, 51 insertions(+), 31 deletions(-)
> 

>      /* Collect all the descriptors */
>      do {
> -        struct iovec *sg;
> +        hwaddr pa = vring_desc_addr(vdev, desc_pa, i);
> +        size_t len = vring_desc_len(vdev, desc_pa, i);
> 
>          if (vring_desc_flags(vdev, desc_pa, i) & VRING_DESC_F_WRITE) {
> -            if (elem->in_num >= VIRTQUEUE_MAX_SIZE) {
> -                error_report("Too many write descriptors in indirect table");
> -                exit(1);
> -            }
> -            elem->in_addr[elem->in_num] = vring_desc_addr(vdev, desc_pa, i);
> -            sg = &elem->in_sg[elem->in_num++];
> +            virtqueue_map_desc(&in_num, addr + out_num, iov + out_num,
> +                               VIRTQUEUE_MAX_SIZE - out_num, 1, pa, len);

Minor nit: I'd probably use 'true' here...

>          } else {
> -            if (elem->out_num >= VIRTQUEUE_MAX_SIZE) {
> -                error_report("Too many read descriptors in indirect table");
> +            if (in_num) {
> +                error_report("Incorrect order for descriptors");
>                  exit(1);
>              }
> -            elem->out_addr[elem->out_num] = vring_desc_addr(vdev, desc_pa, i);
> -            sg = &elem->out_sg[elem->out_num++];
> +            virtqueue_map_desc(&out_num, addr, iov,
> +                               VIRTQUEUE_MAX_SIZE, 0, pa, len);

...and 'false' here.

>          }
> 
> -        sg->iov_len = vring_desc_len(vdev, desc_pa, i);
> -
>          /* If we've got too many, that implies a descriptor loop. */
> -        if ((elem->in_num + elem->out_num) > max) {
> +        if ((in_num + out_num) > max) {
>              error_report("Looped descriptor");
>              exit(1);
>          }
>      } while ((i = virtqueue_next_desc(vdev, desc_pa, i, max)) != max);

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 06/10] vring: slim down allocation of VirtQueueElements
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 06/10] vring: " Paolo Bonzini
@ 2016-01-19 15:58   ` Cornelia Huck
  0 siblings, 0 replies; 29+ messages in thread
From: Cornelia Huck @ 2016-01-19 15:58 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, mst

On Fri, 15 Jan 2016 13:41:54 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Build the addresses and s/g lists on the stack, and then copy them
> to a VirtQueueElement that is just as big as required to contain this
> particular s/g list.  The cost of the copy is minimal compared to that
> of a large malloc.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/virtio/dataplane/vring.c | 53 ++++++++++++++++++++++++++++++---------------
>  1 file changed, 36 insertions(+), 17 deletions(-)

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 07/10] virtio: combine the read of a descriptor
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 07/10] virtio: combine the read of a descriptor Paolo Bonzini
@ 2016-01-19 16:07   ` Cornelia Huck
  0 siblings, 0 replies; 29+ messages in thread
From: Cornelia Huck @ 2016-01-19 16:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, mst

On Fri, 15 Jan 2016 13:41:55 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Compared to vring, virtio has a performance penalty of 10%.  Fix it
> by combining all the reads for a descriptor in a single address_space_read
> call.  This also simplifies the code nicely.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/virtio/virtio.c | 86 ++++++++++++++++++++++--------------------------------
>  1 file changed, 35 insertions(+), 51 deletions(-)

Nice :)

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 08/10] virtio: cache used_idx in a VirtQueue field
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 08/10] virtio: cache used_idx in a VirtQueue field Paolo Bonzini
@ 2016-01-19 16:11   ` Cornelia Huck
  0 siblings, 0 replies; 29+ messages in thread
From: Cornelia Huck @ 2016-01-19 16:11 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Vincenzo Maffione, mst

On Fri, 15 Jan 2016 13:41:56 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> From: Vincenzo Maffione <v.maffione@gmail.com>
> 
> Accessing used_idx in the VQ requires an expensive access to
> guest physical memory. Before this patch, 3 accesses are normally
> done for each pop/push/notify call. However, since the used_idx is
> only written by us, we can track it in our internal data structure.
> 
> Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
> Message-Id: <3d062ec54e9a7bf9fb325c1fd693564951f2b319.1450218353.git.v.maffione@gmail.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/virtio/virtio.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 09/10] virtio: read avail_idx from VQ only when necessary
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 09/10] virtio: read avail_idx from VQ only when necessary Paolo Bonzini
@ 2016-01-19 16:20   ` Cornelia Huck
  2016-01-19 16:54   ` Michael S. Tsirkin
  1 sibling, 0 replies; 29+ messages in thread
From: Cornelia Huck @ 2016-01-19 16:20 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Vincenzo Maffione, mst

On Fri, 15 Jan 2016 13:41:57 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> From: Vincenzo Maffione <v.maffione@gmail.com>
> 
> The virtqueue_pop() implementation needs to check if the avail ring
> contains some pending buffers. To perform this check, it is not
> always necessary to fetch the avail_idx in the VQ memory, which is
> expensive. This patch introduces a shadow variable tracking avail_idx
> and modifies virtio_queue_empty() to access avail_idx in physical
> memory only when necessary.
> 
> Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
> Message-Id: <b617d6459902773d9f4ab843bfaca764f5af8eda.1450218353.git.v.maffione@gmail.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/virtio/virtio.c | 25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 10/10] virtio: combine write of an entry into used ring
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 10/10] virtio: combine write of an entry into used ring Paolo Bonzini
@ 2016-01-19 16:26   ` Cornelia Huck
  0 siblings, 0 replies; 29+ messages in thread
From: Cornelia Huck @ 2016-01-19 16:26 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Vincenzo Maffione, mst

On Fri, 15 Jan 2016 13:41:58 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> From: Vincenzo Maffione <v.maffione@gmail.com>
> 
> Fill in an element of the used ring with a single combined access to the
> guest physical memory, rather than using two separated accesses.
> This reduces the overhead due to expensive address translation.
> 
> Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
> Message-Id: <e4a89a767a4a92cbb6bcc551e151487eb36e1722.1450218353.git.v.maffione@gmail.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/virtio/virtio.c | 25 ++++++++++++-------------
>  1 file changed, 12 insertions(+), 13 deletions(-)

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 09/10] virtio: read avail_idx from VQ only when necessary
  2016-01-15 12:41 ` [Qemu-devel] [PATCH 09/10] virtio: read avail_idx from VQ only when necessary Paolo Bonzini
  2016-01-19 16:20   ` Cornelia Huck
@ 2016-01-19 16:54   ` Michael S. Tsirkin
  2016-01-19 18:48     ` Paolo Bonzini
  1 sibling, 1 reply; 29+ messages in thread
From: Michael S. Tsirkin @ 2016-01-19 16:54 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Vincenzo Maffione

On Fri, Jan 15, 2016 at 01:41:57PM +0100, Paolo Bonzini wrote:
> From: Vincenzo Maffione <v.maffione@gmail.com>
> 
> The virtqueue_pop() implementation needs to check if the avail ring
> contains some pending buffers. To perform this check, it is not
> always necessary to fetch the avail_idx in the VQ memory, which is
> expensive. This patch introduces a shadow variable tracking avail_idx
> and modifies virtio_queue_empty() to access avail_idx in physical
> memory only when necessary.
> 
> Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
> Message-Id: <b617d6459902773d9f4ab843bfaca764f5af8eda.1450218353.git.v.maffione@gmail.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Is the cost due to the page walk?

> ---
>  hw/virtio/virtio.c | 25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 3e7c6bf..01142da 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -70,8 +70,13 @@ typedef struct VRing
>  struct VirtQueue
>  {
>      VRing vring;
> +
> +    /* Next head to pop */
>      uint16_t last_avail_idx;
>  
> +    /* Last avail_idx read from VQ. */
> +    uint16_t shadow_avail_idx;
> +
>      uint16_t used_idx;
>  
>      /* Last used index value we have signalled on */
> @@ -132,7 +137,8 @@ static inline uint16_t vring_avail_idx(VirtQueue *vq)
>  {
>      hwaddr pa;
>      pa = vq->vring.avail + offsetof(VRingAvail, idx);
> -    return virtio_lduw_phys(vq->vdev, pa);
> +    vq->shadow_avail_idx = virtio_lduw_phys(vq->vdev, pa);
> +    return vq->shadow_avail_idx;
>  }
>  
>  static inline uint16_t vring_avail_ring(VirtQueue *vq, int i)
> @@ -223,8 +229,14 @@ int virtio_queue_ready(VirtQueue *vq)
>      return vq->vring.avail != 0;
>  }
>  
> +/* Fetch avail_idx from VQ memory only when we really need to know if
> + * guest has added some buffers. */
>  int virtio_queue_empty(VirtQueue *vq)
>  {
> +    if (vq->shadow_avail_idx != vq->last_avail_idx) {
> +        return 0;
> +    }
> +
>      return vring_avail_idx(vq) == vq->last_avail_idx;
>  }
>  
> @@ -300,7 +312,7 @@ static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
>      /* Check it isn't doing very strange things with descriptor numbers. */
>      if (num_heads > vq->vring.num) {
>          error_report("Guest moved used index from %u to %u",
> -                     idx, vring_avail_idx(vq));
> +                     idx, vq->shadow_avail_idx);
>          exit(1);
>      }
>      /* On success, callers read a descriptor at vq->last_avail_idx.
> @@ -534,9 +546,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
>      struct iovec iov[VIRTQUEUE_MAX_SIZE];
>      VRingDesc desc;
>  
> -    if (!virtqueue_num_heads(vq, vq->last_avail_idx)) {
> +    if (virtio_queue_empty(vq)) {
>          return NULL;
>      }
> +    /* Needed after virtio_queue_empty(), see comment in
> +     * virtqueue_num_heads(). */
> +    smp_rmb();
>  
>      /* When we start there are none of either input nor output. */
>      out_num = in_num = 0;
> @@ -786,6 +801,7 @@ void virtio_reset(void *opaque)
>          vdev->vq[i].vring.avail = 0;
>          vdev->vq[i].vring.used = 0;
>          vdev->vq[i].last_avail_idx = 0;
> +        vdev->vq[i].shadow_avail_idx = 0;
>          vdev->vq[i].used_idx = 0;
>          virtio_queue_set_vector(vdev, i, VIRTIO_NO_VECTOR);
>          vdev->vq[i].signalled_used = 0;
> @@ -1155,7 +1171,7 @@ static bool vring_notify(VirtIODevice *vdev, VirtQueue *vq)
>      smp_mb();
>      /* Always notify when queue is empty (when feature acknowledge) */
>      if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFY_ON_EMPTY) &&
> -        !vq->inuse && vring_avail_idx(vq) == vq->last_avail_idx) {
> +        !vq->inuse && virtio_queue_empty(vq)) {
>          return true;
>      }
>  
> @@ -1579,6 +1595,7 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
>                  return -1;
>              }
>              vdev->vq[i].used_idx = vring_used_idx(&vdev->vq[i]);
> +            vdev->vq[i].shadow_avail_idx = vring_avail_idx(&vdev->vq[i]);
>          }
>      }


shadow_avail_idx also should be updated on vhost stop,


>  
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 09/10] virtio: read avail_idx from VQ only when necessary
  2016-01-19 16:54   ` Michael S. Tsirkin
@ 2016-01-19 18:48     ` Paolo Bonzini
  2016-01-20 17:32       ` Cornelia Huck
  2016-01-21 21:40       ` Vincenzo Maffione
  0 siblings, 2 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-01-19 18:48 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel, Vincenzo Maffione



On 19/01/2016 17:54, Michael S. Tsirkin wrote:
> On Fri, Jan 15, 2016 at 01:41:57PM +0100, Paolo Bonzini wrote:
>> From: Vincenzo Maffione <v.maffione@gmail.com>
>>
>> The virtqueue_pop() implementation needs to check if the avail ring
>> contains some pending buffers. To perform this check, it is not
>> always necessary to fetch the avail_idx in the VQ memory, which is
>> expensive. This patch introduces a shadow variable tracking avail_idx
>> and modifies virtio_queue_empty() to access avail_idx in physical
>> memory only when necessary.
>>
>> Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
>> Message-Id: <b617d6459902773d9f4ab843bfaca764f5af8eda.1450218353.git.v.maffione@gmail.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> Is the cost due to the page walk?

Yes, as with all the other patches.  But unlike patches 7 and 10 where
we just reduce the number of walks, for patch 8 and 9 it's difficult to
beat a local cache. :)

>> @@ -1579,6 +1595,7 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
>>                  return -1;
>>              }
>>              vdev->vq[i].used_idx = vring_used_idx(&vdev->vq[i]);
>> +            vdev->vq[i].shadow_avail_idx = vring_avail_idx(&vdev->vq[i]);
>>          }
>>      }
> 
> 
> shadow_avail_idx also should be updated on vhost stop,

That's virtio_queue_set_last_avail_idx, right?

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 09/10] virtio: read avail_idx from VQ only when necessary
  2016-01-19 18:48     ` Paolo Bonzini
@ 2016-01-20 17:32       ` Cornelia Huck
  2016-01-21 21:40       ` Vincenzo Maffione
  1 sibling, 0 replies; 29+ messages in thread
From: Cornelia Huck @ 2016-01-20 17:32 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Vincenzo Maffione, Michael S. Tsirkin

On Tue, 19 Jan 2016 19:48:42 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> >> @@ -1579,6 +1595,7 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
> >>                  return -1;
> >>              }
> >>              vdev->vq[i].used_idx = vring_used_idx(&vdev->vq[i]);
> >> +            vdev->vq[i].shadow_avail_idx = vring_avail_idx(&vdev->vq[i]);
> >>          }
> >>      }
> > 
> > 
> > shadow_avail_idx also should be updated on vhost stop,
> 
> That's virtio_queue_set_last_avail_idx, right?

That should also take care of vring teardown.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 09/10] virtio: read avail_idx from VQ only when necessary
  2016-01-19 18:48     ` Paolo Bonzini
  2016-01-20 17:32       ` Cornelia Huck
@ 2016-01-21 21:40       ` Vincenzo Maffione
  1 sibling, 0 replies; 29+ messages in thread
From: Vincenzo Maffione @ 2016-01-21 21:40 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Michael S. Tsirkin

2016-01-19 19:48 GMT+01:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 19/01/2016 17:54, Michael S. Tsirkin wrote:
>> On Fri, Jan 15, 2016 at 01:41:57PM +0100, Paolo Bonzini wrote:
>>> From: Vincenzo Maffione <v.maffione@gmail.com>
>>>
>>> The virtqueue_pop() implementation needs to check if the avail ring
>>> contains some pending buffers. To perform this check, it is not
>>> always necessary to fetch the avail_idx in the VQ memory, which is
>>> expensive. This patch introduces a shadow variable tracking avail_idx
>>> and modifies virtio_queue_empty() to access avail_idx in physical
>>> memory only when necessary.
>>>
>>> Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
>>> Message-Id: <b617d6459902773d9f4ab843bfaca764f5af8eda.1450218353.git.v.maffione@gmail.com>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>
>> Is the cost due to the page walk?
>
> Yes, as with all the other patches.  But unlike patches 7 and 10 where
> we just reduce the number of walks, for patch 8 and 9 it's difficult to
> beat a local cache. :)
>
>>> @@ -1579,6 +1595,7 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
>>>                  return -1;
>>>              }
>>>              vdev->vq[i].used_idx = vring_used_idx(&vdev->vq[i]);
>>> +            vdev->vq[i].shadow_avail_idx = vring_avail_idx(&vdev->vq[i]);
>>>          }
>>>      }
>>
>>
>> shadow_avail_idx also should be updated on vhost stop,
>
> That's virtio_queue_set_last_avail_idx, right?
>
> Paolo

Right. Sorry, I missed that.

Vincenzo



-- 
Vincenzo Maffione

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH 06/10] vring: slim down allocation of VirtQueueElements
  2016-01-31 10:28 [Qemu-devel] [PATCH v2 00/10] virtio/vring: optimization patches Paolo Bonzini
@ 2016-01-31 10:29 ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-01-31 10:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: cornelia.huck, mst

Build the addresses and s/g lists on the stack, and then copy them
to a VirtQueueElement that is just as big as required to contain this
particular s/g list.  The cost of the copy is minimal compared to that
of a large malloc.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/virtio/dataplane/vring.c | 53 ++++++++++++++++++++++++++++++---------------
 1 file changed, 36 insertions(+), 17 deletions(-)

diff --git a/hw/virtio/dataplane/vring.c b/hw/virtio/dataplane/vring.c
index c950caa..d6b8ba9 100644
--- a/hw/virtio/dataplane/vring.c
+++ b/hw/virtio/dataplane/vring.c
@@ -217,8 +217,14 @@ bool vring_should_notify(VirtIODevice *vdev, Vring *vring)
                             new, old);
 }
 
-
-static int get_desc(Vring *vring, VirtQueueElement *elem,
+typedef struct VirtQueueCurrentElement {
+    unsigned in_num;
+    unsigned out_num;
+    hwaddr addr[VIRTQUEUE_MAX_SIZE];
+    struct iovec iov[VIRTQUEUE_MAX_SIZE];
+} VirtQueueCurrentElement;
+
+static int get_desc(Vring *vring, VirtQueueCurrentElement *elem,
                     struct vring_desc *desc)
 {
     unsigned *num;
@@ -229,12 +235,12 @@ static int get_desc(Vring *vring, VirtQueueElement *elem,
 
     if (desc->flags & VRING_DESC_F_WRITE) {
         num = &elem->in_num;
-        iov = &elem->in_sg[*num];
-        addr = &elem->in_addr[*num];
+        iov = &elem->iov[elem->out_num + *num];
+        addr = &elem->addr[elem->out_num + *num];
     } else {
         num = &elem->out_num;
-        iov = &elem->out_sg[*num];
-        addr = &elem->out_addr[*num];
+        iov = &elem->iov[*num];
+        addr = &elem->addr[*num];
 
         /* If it's an output descriptor, they're all supposed
          * to come before any input descriptors. */
@@ -298,7 +304,8 @@ static bool read_vring_desc(VirtIODevice *vdev,
 
 /* This is stolen from linux/drivers/vhost/vhost.c. */
 static int get_indirect(VirtIODevice *vdev, Vring *vring,
-                        VirtQueueElement *elem, struct vring_desc *indirect)
+                        VirtQueueCurrentElement *cur_elem,
+                        struct vring_desc *indirect)
 {
     struct vring_desc desc;
     unsigned int i = 0, count, found = 0;
@@ -350,7 +357,7 @@ static int get_indirect(VirtIODevice *vdev, Vring *vring,
             return -EFAULT;
         }
 
-        ret = get_desc(vring, elem, &desc);
+        ret = get_desc(vring, cur_elem, &desc);
         if (ret < 0) {
             vring->broken |= (ret == -EFAULT);
             return ret;
@@ -393,6 +400,7 @@ void *vring_pop(VirtIODevice *vdev, Vring *vring, size_t sz)
     struct vring_desc desc;
     unsigned int i, head, found = 0, num = vring->vr.num;
     uint16_t avail_idx, last_avail_idx;
+    VirtQueueCurrentElement cur_elem;
     VirtQueueElement *elem = NULL;
     int ret;
 
@@ -402,10 +410,7 @@ void *vring_pop(VirtIODevice *vdev, Vring *vring, size_t sz)
         goto out;
     }
 
-    elem = virtqueue_alloc_element(sz, VIRTQUEUE_MAX_SIZE, VIRTQUEUE_MAX_SIZE);
-
-    /* Initialize elem so it can be safely unmapped */
-    elem->in_num = elem->out_num = 0;
+    cur_elem.in_num = cur_elem.out_num = 0;
 
     /* Check it isn't doing very strange things with descriptor numbers. */
     last_avail_idx = vring->last_avail_idx;
@@ -432,8 +437,6 @@ void *vring_pop(VirtIODevice *vdev, Vring *vring, size_t sz)
      * the index we've seen. */
     head = vring_get_avail_ring(vdev, vring, last_avail_idx % num);
 
-    elem->index = head;
-
     /* If their number is silly, that's an error. */
     if (unlikely(head >= num)) {
         error_report("Guest says index %u > %u is available", head, num);
@@ -460,14 +463,14 @@ void *vring_pop(VirtIODevice *vdev, Vring *vring, size_t sz)
         barrier();
 
         if (desc.flags & VRING_DESC_F_INDIRECT) {
-            ret = get_indirect(vdev, vring, elem, &desc);
+            ret = get_indirect(vdev, vring, &cur_elem, &desc);
             if (ret < 0) {
                 goto out;
             }
             continue;
         }
 
-        ret = get_desc(vring, elem, &desc);
+        ret = get_desc(vring, &cur_elem, &desc);
         if (ret < 0) {
             goto out;
         }
@@ -482,6 +485,18 @@ void *vring_pop(VirtIODevice *vdev, Vring *vring, size_t sz)
             virtio_tswap16(vdev, vring->last_avail_idx);
     }
 
+    /* Now copy what we have collected and mapped */
+    elem = virtqueue_alloc_element(sz, cur_elem.out_num, cur_elem.in_num);
+    elem->index = head;
+    for (i = 0; i < cur_elem.out_num; i++) {
+        elem->out_addr[i] = cur_elem.addr[i];
+        elem->out_sg[i] = cur_elem.iov[i];
+    }
+    for (i = 0; i < cur_elem.in_num; i++) {
+        elem->in_addr[i] = cur_elem.addr[cur_elem.out_num + i];
+        elem->in_sg[i] = cur_elem.iov[cur_elem.out_num + i];
+    }
+
     return elem;
 
 out:
@@ -489,7 +504,11 @@ out:
     if (ret == -EFAULT) {
         vring->broken = true;
     }
-    vring_unmap_element(elem);
+
+    for (i = 0; i < cur_elem.out_num + cur_elem.in_num; i++) {
+        vring_unmap(cur_elem.iov[i].iov_base, false);
+    }
+
     g_free(elem);
     return NULL;
 }
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-01-31 10:29 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-15 12:41 [Qemu-devel] [PATCH 00/10] virtio/vring: optimization patches Paolo Bonzini
2016-01-15 12:41 ` [Qemu-devel] [PATCH 01/10] virtio: move VirtQueueElement at the beginning of the structs Paolo Bonzini
2016-01-19 12:09   ` Cornelia Huck
2016-01-19 13:22     ` Paolo Bonzini
2016-01-19 14:01       ` Cornelia Huck
2016-01-15 12:41 ` [Qemu-devel] [PATCH 02/10] virtio: move allocation to virtqueue_pop/vring_pop Paolo Bonzini
2016-01-19 12:22   ` Cornelia Huck
2016-01-19 13:16     ` Paolo Bonzini
2016-01-15 12:41 ` [Qemu-devel] [PATCH 03/10] virtio: introduce qemu_get/put_virtqueue_element Paolo Bonzini
2016-01-19 12:30   ` Cornelia Huck
2016-01-15 12:41 ` [Qemu-devel] [PATCH 04/10] virtio: introduce virtqueue_alloc_element Paolo Bonzini
2016-01-19 12:40   ` Cornelia Huck
2016-01-15 12:41 ` [Qemu-devel] [PATCH 05/10] virtio: slim down allocation of VirtQueueElements Paolo Bonzini
2016-01-19 15:54   ` Cornelia Huck
2016-01-15 12:41 ` [Qemu-devel] [PATCH 06/10] vring: " Paolo Bonzini
2016-01-19 15:58   ` Cornelia Huck
2016-01-15 12:41 ` [Qemu-devel] [PATCH 07/10] virtio: combine the read of a descriptor Paolo Bonzini
2016-01-19 16:07   ` Cornelia Huck
2016-01-15 12:41 ` [Qemu-devel] [PATCH 08/10] virtio: cache used_idx in a VirtQueue field Paolo Bonzini
2016-01-19 16:11   ` Cornelia Huck
2016-01-15 12:41 ` [Qemu-devel] [PATCH 09/10] virtio: read avail_idx from VQ only when necessary Paolo Bonzini
2016-01-19 16:20   ` Cornelia Huck
2016-01-19 16:54   ` Michael S. Tsirkin
2016-01-19 18:48     ` Paolo Bonzini
2016-01-20 17:32       ` Cornelia Huck
2016-01-21 21:40       ` Vincenzo Maffione
2016-01-15 12:41 ` [Qemu-devel] [PATCH 10/10] virtio: combine write of an entry into used ring Paolo Bonzini
2016-01-19 16:26   ` Cornelia Huck
2016-01-31 10:28 [Qemu-devel] [PATCH v2 00/10] virtio/vring: optimization patches Paolo Bonzini
2016-01-31 10:29 ` [Qemu-devel] [PATCH 06/10] vring: slim down allocation of VirtQueueElements Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.