* [Qemu-devel] [RFC PATCH 00/11] speedup vring processing with MemoryRegionCaches
@ 2016-12-12 11:18 Paolo Bonzini
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 01/11] exec: optimize remaining address_space_* cases Paolo Bonzini
                   ` (12 more replies)
  0 siblings, 13 replies; 27+ messages in thread
From: Paolo Bonzini @ 2016-12-12 11:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, famz, mst, borntraeger

It is known that virtio's usage of ld_*_phys and st_*_phys functions
wastes time in address_space_translate, visiting the
AddressSpaceDispatch's radix tree.

This series introduces a small cache that reduces these functions
to a simple range check plus a memory access.  The effect is a bit
underwhelming, because the improvement is only 1-2 Kiops.
Nevertheless, I'm sending the patches out so that, for example, they
can be tested on s390.
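
The idea, in a deliberately simplified sketch (the struct and function
names below are illustrative only, not the API added by this series), is
to remember the result of one translation and let the hot path get away
with a bounds check plus a host memory access:

    /* Illustrative sketch only; no endianness handling, not QEMU's API. */
    #include <stdint.h>
    #include <string.h>

    typedef uint64_t hwaddr;

    /* Remembered translation: host pointer plus the guest range it covers. */
    typedef struct VRingCache {
        uint8_t *host_ptr;   /* host address of the cached range */
        hwaddr   base;       /* guest physical base of the range */
        hwaddr   len;        /* length of the range              */
    } VRingCache;

    static inline uint16_t cache_lduw(VRingCache *c, hwaddr addr)
    {
        uint16_t val;
        if (addr < c->base || addr + sizeof(val) > c->base + c->len) {
            /* Slow path omitted: fall back to address_space_lduw(). */
            return 0;
        }
        memcpy(&val, c->host_ptr + (addr - c->base), sizeof(val));
        return val;
    }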

Things to fix: handle address_space_cache_init failures
in virtio_init_region_cache.  Also, once virtio breaks free of
address_space_memory, we'll need to handle invalidation in IOMMU regions.
For the latter, maybe it's worth introducing a new abstraction that
is higher-level than MemoryListener and covers both regular and IOMMU
memory regions.

Paolo

Paolo Bonzini (11):
  exec: optimize remaining address_space_* cases
  exec: introduce memory_ldst.inc.c
  exec: introduce address_space_extend_translation
  exec: introduce MemoryRegionCache
  virtio: make virtio_should_notify static
  virtio: add virtio_*_phys_cached
  virtio: use address_space_map/unmap to access descriptors
  virtio: use MemoryRegionCache to access descriptors
  virtio: add MemoryListener to cache ring translations
  virtio: use VRingMemoryRegionCaches for descriptor ring
  virtio: use VRingMemoryRegionCaches for avail and used rings

 exec.c                            | 687 +++++-------------------------------
 hw/net/virtio-net.c               |  14 +-
 hw/virtio/virtio.c                | 322 +++++++++++++----
 include/exec/cpu-all.h            |  23 ++
 include/exec/cpu-common.h         |  15 -
 include/exec/memory.h             | 166 +++++++++
 include/hw/virtio/virtio-access.h |  52 +++
 include/hw/virtio/virtio.h        |   2 +-
 include/qemu/typedefs.h           |   1 +
 memory_ldst.inc.c                 | 709 ++++++++++++++++++++++++++++++++++++++
 10 files changed, 1316 insertions(+), 675 deletions(-)
 create mode 100644 memory_ldst.inc.c

-- 
1.8.3.1


* [Qemu-devel] [PATCH 01/11] exec: optimize remaining address_space_* cases
  2016-12-12 11:18 [Qemu-devel] [RFC PATCH 00/11] speedup vring processing with MemoryRegionCaches Paolo Bonzini
@ 2016-12-12 11:18 ` Paolo Bonzini
  2016-12-12 13:27   ` Stefan Hajnoczi
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 02/11] exec: introduce memory_ldst.inc.c Paolo Bonzini
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2016-12-12 11:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, famz, mst, borntraeger

Optimize the remaining cases properly, right before the next patch
generalizes them into a multi-included file.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 exec.c | 126 +++++++++++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 103 insertions(+), 23 deletions(-)

diff --git a/exec.c b/exec.c
index 08c558e..4db0ce5 100644
--- a/exec.c
+++ b/exec.c
@@ -3243,17 +3243,37 @@ uint64_t ldq_be_phys(AddressSpace *as, hwaddr addr)
     return address_space_ldq_be(as, addr, MEMTXATTRS_UNSPECIFIED, NULL);
 }
 
-/* XXX: optimize */
 uint32_t address_space_ldub(AddressSpace *as, hwaddr addr,
                             MemTxAttrs attrs, MemTxResult *result)
 {
-    uint8_t val;
+    uint8_t *ptr;
+    uint64_t val;
+    MemoryRegion *mr;
+    hwaddr l = 1;
+    hwaddr addr1;
     MemTxResult r;
+    bool release_lock = false;
 
-    r = address_space_rw(as, addr, attrs, &val, 1, 0);
+    rcu_read_lock();
+    mr = address_space_translate(as, addr, &addr1, &l, false);
+    if (!memory_access_is_direct(mr, false)) {
+        release_lock |= prepare_mmio_access(mr);
+
+        /* I/O case */
+        r = memory_region_dispatch_read(mr, addr1, &val, 1, attrs);
+    } else {
+        /* RAM case */
+        ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
+        val = ldub_p(ptr);
+        r = MEMTX_OK;
+    }
     if (result) {
         *result = r;
     }
+    if (release_lock) {
+        qemu_mutex_unlock_iothread();
+    }
+    rcu_read_unlock();
     return val;
 }
 
@@ -3493,17 +3513,35 @@ void stl_be_phys(AddressSpace *as, hwaddr addr, uint32_t val)
     address_space_stl_be(as, addr, val, MEMTXATTRS_UNSPECIFIED, NULL);
 }
 
-/* XXX: optimize */
 void address_space_stb(AddressSpace *as, hwaddr addr, uint32_t val,
                        MemTxAttrs attrs, MemTxResult *result)
 {
-    uint8_t v = val;
+    uint8_t *ptr;
+    MemoryRegion *mr;
+    hwaddr l = 1;
+    hwaddr addr1;
     MemTxResult r;
+    bool release_lock = false;
 
-    r = address_space_rw(as, addr, attrs, &v, 1, 1);
+    rcu_read_lock();
+    mr = address_space_translate(as, addr, &addr1, &l, true);
+    if (!memory_access_is_direct(mr, true)) {
+        release_lock |= prepare_mmio_access(mr);
+        r = memory_region_dispatch_write(mr, addr1, val, 1, attrs);
+    } else {
+        /* RAM case */
+        ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
+        stb_p(ptr, val);
+        invalidate_and_set_dirty(mr, addr1, 1);
+        r = MEMTX_OK;
+    }
     if (result) {
         *result = r;
     }
+    if (release_lock) {
+        qemu_mutex_unlock_iothread();
+    }
+    rcu_read_unlock();
 }
 
 void stb_phys(AddressSpace *as, hwaddr addr, uint32_t val)
@@ -3602,37 +3640,79 @@ void stw_be_phys(AddressSpace *as, hwaddr addr, uint32_t val)
     address_space_stw_be(as, addr, val, MEMTXATTRS_UNSPECIFIED, NULL);
 }
 
-/* XXX: optimize */
-void address_space_stq(AddressSpace *as, hwaddr addr, uint64_t val,
-                       MemTxAttrs attrs, MemTxResult *result)
+static inline void address_space_stq_internal(AddressSpace *as,
+                                              hwaddr addr, uint64_t val,
+                                              MemTxAttrs attrs,
+                                              MemTxResult *result,
+                                              enum device_endian endian)
 {
+    uint8_t *ptr;
+    MemoryRegion *mr;
+    hwaddr l = 8;
+    hwaddr addr1;
     MemTxResult r;
-    val = tswap64(val);
-    r = address_space_rw(as, addr, attrs, (void *) &val, 8, 1);
+    bool release_lock = false;
+
+    rcu_read_lock();
+    mr = address_space_translate(as, addr, &addr1, &l, true);
+    if (l < 8 || !memory_access_is_direct(mr, true)) {
+        release_lock |= prepare_mmio_access(mr);
+
+#if defined(TARGET_WORDS_BIGENDIAN)
+        if (endian == DEVICE_LITTLE_ENDIAN) {
+            val = bswap64(val);
+        }
+#else
+        if (endian == DEVICE_BIG_ENDIAN) {
+            val = bswap64(val);
+        }
+#endif
+        r = memory_region_dispatch_write(mr, addr1, val, 4, attrs);
+    } else {
+        /* RAM case */
+        ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
+        switch (endian) {
+        case DEVICE_LITTLE_ENDIAN:
+            stq_le_p(ptr, val);
+            break;
+        case DEVICE_BIG_ENDIAN:
+            stq_be_p(ptr, val);
+            break;
+        default:
+            stq_p(ptr, val);
+            break;
+        }
+        invalidate_and_set_dirty(mr, addr1, 8);
+        r = MEMTX_OK;
+    }
     if (result) {
         *result = r;
     }
+    if (release_lock) {
+        qemu_mutex_unlock_iothread();
+    }
+    rcu_read_unlock();
+}
+
+void address_space_stq(AddressSpace *as, hwaddr addr, uint64_t val,
+                       MemTxAttrs attrs, MemTxResult *result)
+{
+    address_space_stq_internal(as, addr, val, attrs, result,
+                               DEVICE_NATIVE_ENDIAN);
 }
 
 void address_space_stq_le(AddressSpace *as, hwaddr addr, uint64_t val,
                        MemTxAttrs attrs, MemTxResult *result)
 {
-    MemTxResult r;
-    val = cpu_to_le64(val);
-    r = address_space_rw(as, addr, attrs, (void *) &val, 8, 1);
-    if (result) {
-        *result = r;
-    }
+    address_space_stq_internal(as, addr, val, attrs, result,
+                               DEVICE_LITTLE_ENDIAN);
 }
+
 void address_space_stq_be(AddressSpace *as, hwaddr addr, uint64_t val,
                        MemTxAttrs attrs, MemTxResult *result)
 {
-    MemTxResult r;
-    val = cpu_to_be64(val);
-    r = address_space_rw(as, addr, attrs, (void *) &val, 8, 1);
-    if (result) {
-        *result = r;
-    }
+    address_space_stq_internal(as, addr, val, attrs, result,
+                               DEVICE_BIG_ENDIAN);
 }
 
 void stq_phys(AddressSpace *as, hwaddr addr, uint64_t val)
-- 
1.8.3.1


* [Qemu-devel] [PATCH 02/11] exec: introduce memory_ldst.inc.c
  2016-12-12 11:18 [Qemu-devel] [RFC PATCH 00/11] speedup vring processing with MemoryRegionCaches Paolo Bonzini
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 01/11] exec: optimize remaining address_space_* cases Paolo Bonzini
@ 2016-12-12 11:18 ` Paolo Bonzini
  2016-12-12 13:44   ` Stefan Hajnoczi
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 03/11] exec: introduce address_space_extend_translation Paolo Bonzini
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2016-12-12 11:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, famz, mst, borntraeger

Templatize the address_space_* and *_phys functions, so that we can add
similar functions in the next patch that work with a lightweight version
of address_space_map/unmap.
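
For illustration, a second user of the template only has to redefine the
macros; something along these lines is how the MemoryRegionCache patches
later in the series are expected to instantiate it (hypothetical sketch,
the exact definitions may differ):

    /* Hypothetical second instantiation; the exact macro bodies may differ
     * from what the later patches in this series actually use. */
    #define ARG1_DECL                MemoryRegionCache *cache
    #define ARG1                     cache
    #define SUFFIX                   _cached
    #define TRANSLATE(addr, ...)     \
        address_space_translate(cache->as, cache->xlat + (addr), __VA_ARGS__)
    #define IS_DIRECT(mr, is_write)  memory_access_is_direct(mr, is_write)
    #define MAP_RAM(mr, ofs)         qemu_map_ram_ptr((mr)->ram_block, ofs)
    #define INVALIDATE(mr, ofs, len) invalidate_and_set_dirty(mr, ofs, len)
    #define RCU_READ_LOCK(...)       rcu_read_lock()
    #define RCU_READ_UNLOCK(...)     rcu_read_unlock()
    #include "memory_ldst.inc.c"

Each inclusion then expands to a full set of accessors with the chosen
suffix (address_space_ldl_cached, stq_le_phys_cached, and so on), all
sharing the single implementation in memory_ldst.inc.c.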

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 exec.c                    | 681 +-------------------------------------------
 include/exec/cpu-common.h |  15 -
 include/exec/memory.h     |  15 +
 memory_ldst.inc.c         | 709 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 734 insertions(+), 686 deletions(-)
 create mode 100644 memory_ldst.inc.c

diff --git a/exec.c b/exec.c
index 4db0ce5..8568a6f 100644
--- a/exec.c
+++ b/exec.c
@@ -3058,677 +3058,16 @@ void cpu_physical_memory_unmap(void *buffer, hwaddr len,
     return address_space_unmap(&address_space_memory, buffer, len, is_write, access_len);
 }
 
-/* warning: addr must be aligned */
-static inline uint32_t address_space_ldl_internal(AddressSpace *as, hwaddr addr,
-                                                  MemTxAttrs attrs,
-                                                  MemTxResult *result,
-                                                  enum device_endian endian)
-{
-    uint8_t *ptr;
-    uint64_t val;
-    MemoryRegion *mr;
-    hwaddr l = 4;
-    hwaddr addr1;
-    MemTxResult r;
-    bool release_lock = false;
-
-    rcu_read_lock();
-    mr = address_space_translate(as, addr, &addr1, &l, false);
-    if (l < 4 || !memory_access_is_direct(mr, false)) {
-        release_lock |= prepare_mmio_access(mr);
-
-        /* I/O case */
-        r = memory_region_dispatch_read(mr, addr1, &val, 4, attrs);
-#if defined(TARGET_WORDS_BIGENDIAN)
-        if (endian == DEVICE_LITTLE_ENDIAN) {
-            val = bswap32(val);
-        }
-#else
-        if (endian == DEVICE_BIG_ENDIAN) {
-            val = bswap32(val);
-        }
-#endif
-    } else {
-        /* RAM case */
-        ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
-        switch (endian) {
-        case DEVICE_LITTLE_ENDIAN:
-            val = ldl_le_p(ptr);
-            break;
-        case DEVICE_BIG_ENDIAN:
-            val = ldl_be_p(ptr);
-            break;
-        default:
-            val = ldl_p(ptr);
-            break;
-        }
-        r = MEMTX_OK;
-    }
-    if (result) {
-        *result = r;
-    }
-    if (release_lock) {
-        qemu_mutex_unlock_iothread();
-    }
-    rcu_read_unlock();
-    return val;
-}
-
-uint32_t address_space_ldl(AddressSpace *as, hwaddr addr,
-                           MemTxAttrs attrs, MemTxResult *result)
-{
-    return address_space_ldl_internal(as, addr, attrs, result,
-                                      DEVICE_NATIVE_ENDIAN);
-}
-
-uint32_t address_space_ldl_le(AddressSpace *as, hwaddr addr,
-                              MemTxAttrs attrs, MemTxResult *result)
-{
-    return address_space_ldl_internal(as, addr, attrs, result,
-                                      DEVICE_LITTLE_ENDIAN);
-}
-
-uint32_t address_space_ldl_be(AddressSpace *as, hwaddr addr,
-                              MemTxAttrs attrs, MemTxResult *result)
-{
-    return address_space_ldl_internal(as, addr, attrs, result,
-                                      DEVICE_BIG_ENDIAN);
-}
-
-uint32_t ldl_phys(AddressSpace *as, hwaddr addr)
-{
-    return address_space_ldl(as, addr, MEMTXATTRS_UNSPECIFIED, NULL);
-}
-
-uint32_t ldl_le_phys(AddressSpace *as, hwaddr addr)
-{
-    return address_space_ldl_le(as, addr, MEMTXATTRS_UNSPECIFIED, NULL);
-}
-
-uint32_t ldl_be_phys(AddressSpace *as, hwaddr addr)
-{
-    return address_space_ldl_be(as, addr, MEMTXATTRS_UNSPECIFIED, NULL);
-}
-
-/* warning: addr must be aligned */
-static inline uint64_t address_space_ldq_internal(AddressSpace *as, hwaddr addr,
-                                                  MemTxAttrs attrs,
-                                                  MemTxResult *result,
-                                                  enum device_endian endian)
-{
-    uint8_t *ptr;
-    uint64_t val;
-    MemoryRegion *mr;
-    hwaddr l = 8;
-    hwaddr addr1;
-    MemTxResult r;
-    bool release_lock = false;
-
-    rcu_read_lock();
-    mr = address_space_translate(as, addr, &addr1, &l,
-                                 false);
-    if (l < 8 || !memory_access_is_direct(mr, false)) {
-        release_lock |= prepare_mmio_access(mr);
-
-        /* I/O case */
-        r = memory_region_dispatch_read(mr, addr1, &val, 8, attrs);
-#if defined(TARGET_WORDS_BIGENDIAN)
-        if (endian == DEVICE_LITTLE_ENDIAN) {
-            val = bswap64(val);
-        }
-#else
-        if (endian == DEVICE_BIG_ENDIAN) {
-            val = bswap64(val);
-        }
-#endif
-    } else {
-        /* RAM case */
-        ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
-        switch (endian) {
-        case DEVICE_LITTLE_ENDIAN:
-            val = ldq_le_p(ptr);
-            break;
-        case DEVICE_BIG_ENDIAN:
-            val = ldq_be_p(ptr);
-            break;
-        default:
-            val = ldq_p(ptr);
-            break;
-        }
-        r = MEMTX_OK;
-    }
-    if (result) {
-        *result = r;
-    }
-    if (release_lock) {
-        qemu_mutex_unlock_iothread();
-    }
-    rcu_read_unlock();
-    return val;
-}
-
-uint64_t address_space_ldq(AddressSpace *as, hwaddr addr,
-                           MemTxAttrs attrs, MemTxResult *result)
-{
-    return address_space_ldq_internal(as, addr, attrs, result,
-                                      DEVICE_NATIVE_ENDIAN);
-}
-
-uint64_t address_space_ldq_le(AddressSpace *as, hwaddr addr,
-                           MemTxAttrs attrs, MemTxResult *result)
-{
-    return address_space_ldq_internal(as, addr, attrs, result,
-                                      DEVICE_LITTLE_ENDIAN);
-}
-
-uint64_t address_space_ldq_be(AddressSpace *as, hwaddr addr,
-                           MemTxAttrs attrs, MemTxResult *result)
-{
-    return address_space_ldq_internal(as, addr, attrs, result,
-                                      DEVICE_BIG_ENDIAN);
-}
-
-uint64_t ldq_phys(AddressSpace *as, hwaddr addr)
-{
-    return address_space_ldq(as, addr, MEMTXATTRS_UNSPECIFIED, NULL);
-}
-
-uint64_t ldq_le_phys(AddressSpace *as, hwaddr addr)
-{
-    return address_space_ldq_le(as, addr, MEMTXATTRS_UNSPECIFIED, NULL);
-}
-
-uint64_t ldq_be_phys(AddressSpace *as, hwaddr addr)
-{
-    return address_space_ldq_be(as, addr, MEMTXATTRS_UNSPECIFIED, NULL);
-}
-
-uint32_t address_space_ldub(AddressSpace *as, hwaddr addr,
-                            MemTxAttrs attrs, MemTxResult *result)
-{
-    uint8_t *ptr;
-    uint64_t val;
-    MemoryRegion *mr;
-    hwaddr l = 1;
-    hwaddr addr1;
-    MemTxResult r;
-    bool release_lock = false;
-
-    rcu_read_lock();
-    mr = address_space_translate(as, addr, &addr1, &l, false);
-    if (!memory_access_is_direct(mr, false)) {
-        release_lock |= prepare_mmio_access(mr);
-
-        /* I/O case */
-        r = memory_region_dispatch_read(mr, addr1, &val, 1, attrs);
-    } else {
-        /* RAM case */
-        ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
-        val = ldub_p(ptr);
-        r = MEMTX_OK;
-    }
-    if (result) {
-        *result = r;
-    }
-    if (release_lock) {
-        qemu_mutex_unlock_iothread();
-    }
-    rcu_read_unlock();
-    return val;
-}
-
-uint32_t ldub_phys(AddressSpace *as, hwaddr addr)
-{
-    return address_space_ldub(as, addr, MEMTXATTRS_UNSPECIFIED, NULL);
-}
-
-/* warning: addr must be aligned */
-static inline uint32_t address_space_lduw_internal(AddressSpace *as,
-                                                   hwaddr addr,
-                                                   MemTxAttrs attrs,
-                                                   MemTxResult *result,
-                                                   enum device_endian endian)
-{
-    uint8_t *ptr;
-    uint64_t val;
-    MemoryRegion *mr;
-    hwaddr l = 2;
-    hwaddr addr1;
-    MemTxResult r;
-    bool release_lock = false;
-
-    rcu_read_lock();
-    mr = address_space_translate(as, addr, &addr1, &l,
-                                 false);
-    if (l < 2 || !memory_access_is_direct(mr, false)) {
-        release_lock |= prepare_mmio_access(mr);
-
-        /* I/O case */
-        r = memory_region_dispatch_read(mr, addr1, &val, 2, attrs);
-#if defined(TARGET_WORDS_BIGENDIAN)
-        if (endian == DEVICE_LITTLE_ENDIAN) {
-            val = bswap16(val);
-        }
-#else
-        if (endian == DEVICE_BIG_ENDIAN) {
-            val = bswap16(val);
-        }
-#endif
-    } else {
-        /* RAM case */
-        ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
-        switch (endian) {
-        case DEVICE_LITTLE_ENDIAN:
-            val = lduw_le_p(ptr);
-            break;
-        case DEVICE_BIG_ENDIAN:
-            val = lduw_be_p(ptr);
-            break;
-        default:
-            val = lduw_p(ptr);
-            break;
-        }
-        r = MEMTX_OK;
-    }
-    if (result) {
-        *result = r;
-    }
-    if (release_lock) {
-        qemu_mutex_unlock_iothread();
-    }
-    rcu_read_unlock();
-    return val;
-}
-
-uint32_t address_space_lduw(AddressSpace *as, hwaddr addr,
-                           MemTxAttrs attrs, MemTxResult *result)
-{
-    return address_space_lduw_internal(as, addr, attrs, result,
-                                       DEVICE_NATIVE_ENDIAN);
-}
-
-uint32_t address_space_lduw_le(AddressSpace *as, hwaddr addr,
-                           MemTxAttrs attrs, MemTxResult *result)
-{
-    return address_space_lduw_internal(as, addr, attrs, result,
-                                       DEVICE_LITTLE_ENDIAN);
-}
-
-uint32_t address_space_lduw_be(AddressSpace *as, hwaddr addr,
-                           MemTxAttrs attrs, MemTxResult *result)
-{
-    return address_space_lduw_internal(as, addr, attrs, result,
-                                       DEVICE_BIG_ENDIAN);
-}
-
-uint32_t lduw_phys(AddressSpace *as, hwaddr addr)
-{
-    return address_space_lduw(as, addr, MEMTXATTRS_UNSPECIFIED, NULL);
-}
-
-uint32_t lduw_le_phys(AddressSpace *as, hwaddr addr)
-{
-    return address_space_lduw_le(as, addr, MEMTXATTRS_UNSPECIFIED, NULL);
-}
-
-uint32_t lduw_be_phys(AddressSpace *as, hwaddr addr)
-{
-    return address_space_lduw_be(as, addr, MEMTXATTRS_UNSPECIFIED, NULL);
-}
-
-/* warning: addr must be aligned. The ram page is not masked as dirty
-   and the code inside is not invalidated. It is useful if the dirty
-   bits are used to track modified PTEs */
-void address_space_stl_notdirty(AddressSpace *as, hwaddr addr, uint32_t val,
-                                MemTxAttrs attrs, MemTxResult *result)
-{
-    uint8_t *ptr;
-    MemoryRegion *mr;
-    hwaddr l = 4;
-    hwaddr addr1;
-    MemTxResult r;
-    uint8_t dirty_log_mask;
-    bool release_lock = false;
-
-    rcu_read_lock();
-    mr = address_space_translate(as, addr, &addr1, &l,
-                                 true);
-    if (l < 4 || !memory_access_is_direct(mr, true)) {
-        release_lock |= prepare_mmio_access(mr);
-
-        r = memory_region_dispatch_write(mr, addr1, val, 4, attrs);
-    } else {
-        ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
-        stl_p(ptr, val);
-
-        dirty_log_mask = memory_region_get_dirty_log_mask(mr);
-        dirty_log_mask &= ~(1 << DIRTY_MEMORY_CODE);
-        cpu_physical_memory_set_dirty_range(memory_region_get_ram_addr(mr) + addr,
-                                            4, dirty_log_mask);
-        r = MEMTX_OK;
-    }
-    if (result) {
-        *result = r;
-    }
-    if (release_lock) {
-        qemu_mutex_unlock_iothread();
-    }
-    rcu_read_unlock();
-}
-
-void stl_phys_notdirty(AddressSpace *as, hwaddr addr, uint32_t val)
-{
-    address_space_stl_notdirty(as, addr, val, MEMTXATTRS_UNSPECIFIED, NULL);
-}
-
-/* warning: addr must be aligned */
-static inline void address_space_stl_internal(AddressSpace *as,
-                                              hwaddr addr, uint32_t val,
-                                              MemTxAttrs attrs,
-                                              MemTxResult *result,
-                                              enum device_endian endian)
-{
-    uint8_t *ptr;
-    MemoryRegion *mr;
-    hwaddr l = 4;
-    hwaddr addr1;
-    MemTxResult r;
-    bool release_lock = false;
-
-    rcu_read_lock();
-    mr = address_space_translate(as, addr, &addr1, &l,
-                                 true);
-    if (l < 4 || !memory_access_is_direct(mr, true)) {
-        release_lock |= prepare_mmio_access(mr);
-
-#if defined(TARGET_WORDS_BIGENDIAN)
-        if (endian == DEVICE_LITTLE_ENDIAN) {
-            val = bswap32(val);
-        }
-#else
-        if (endian == DEVICE_BIG_ENDIAN) {
-            val = bswap32(val);
-        }
-#endif
-        r = memory_region_dispatch_write(mr, addr1, val, 4, attrs);
-    } else {
-        /* RAM case */
-        ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
-        switch (endian) {
-        case DEVICE_LITTLE_ENDIAN:
-            stl_le_p(ptr, val);
-            break;
-        case DEVICE_BIG_ENDIAN:
-            stl_be_p(ptr, val);
-            break;
-        default:
-            stl_p(ptr, val);
-            break;
-        }
-        invalidate_and_set_dirty(mr, addr1, 4);
-        r = MEMTX_OK;
-    }
-    if (result) {
-        *result = r;
-    }
-    if (release_lock) {
-        qemu_mutex_unlock_iothread();
-    }
-    rcu_read_unlock();
-}
-
-void address_space_stl(AddressSpace *as, hwaddr addr, uint32_t val,
-                       MemTxAttrs attrs, MemTxResult *result)
-{
-    address_space_stl_internal(as, addr, val, attrs, result,
-                               DEVICE_NATIVE_ENDIAN);
-}
-
-void address_space_stl_le(AddressSpace *as, hwaddr addr, uint32_t val,
-                       MemTxAttrs attrs, MemTxResult *result)
-{
-    address_space_stl_internal(as, addr, val, attrs, result,
-                               DEVICE_LITTLE_ENDIAN);
-}
-
-void address_space_stl_be(AddressSpace *as, hwaddr addr, uint32_t val,
-                       MemTxAttrs attrs, MemTxResult *result)
-{
-    address_space_stl_internal(as, addr, val, attrs, result,
-                               DEVICE_BIG_ENDIAN);
-}
-
-void stl_phys(AddressSpace *as, hwaddr addr, uint32_t val)
-{
-    address_space_stl(as, addr, val, MEMTXATTRS_UNSPECIFIED, NULL);
-}
-
-void stl_le_phys(AddressSpace *as, hwaddr addr, uint32_t val)
-{
-    address_space_stl_le(as, addr, val, MEMTXATTRS_UNSPECIFIED, NULL);
-}
-
-void stl_be_phys(AddressSpace *as, hwaddr addr, uint32_t val)
-{
-    address_space_stl_be(as, addr, val, MEMTXATTRS_UNSPECIFIED, NULL);
-}
-
-void address_space_stb(AddressSpace *as, hwaddr addr, uint32_t val,
-                       MemTxAttrs attrs, MemTxResult *result)
-{
-    uint8_t *ptr;
-    MemoryRegion *mr;
-    hwaddr l = 1;
-    hwaddr addr1;
-    MemTxResult r;
-    bool release_lock = false;
-
-    rcu_read_lock();
-    mr = address_space_translate(as, addr, &addr1, &l, true);
-    if (!memory_access_is_direct(mr, true)) {
-        release_lock |= prepare_mmio_access(mr);
-        r = memory_region_dispatch_write(mr, addr1, val, 1, attrs);
-    } else {
-        /* RAM case */
-        ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
-        stb_p(ptr, val);
-        invalidate_and_set_dirty(mr, addr1, 1);
-        r = MEMTX_OK;
-    }
-    if (result) {
-        *result = r;
-    }
-    if (release_lock) {
-        qemu_mutex_unlock_iothread();
-    }
-    rcu_read_unlock();
-}
-
-void stb_phys(AddressSpace *as, hwaddr addr, uint32_t val)
-{
-    address_space_stb(as, addr, val, MEMTXATTRS_UNSPECIFIED, NULL);
-}
-
-/* warning: addr must be aligned */
-static inline void address_space_stw_internal(AddressSpace *as,
-                                              hwaddr addr, uint32_t val,
-                                              MemTxAttrs attrs,
-                                              MemTxResult *result,
-                                              enum device_endian endian)
-{
-    uint8_t *ptr;
-    MemoryRegion *mr;
-    hwaddr l = 2;
-    hwaddr addr1;
-    MemTxResult r;
-    bool release_lock = false;
-
-    rcu_read_lock();
-    mr = address_space_translate(as, addr, &addr1, &l, true);
-    if (l < 2 || !memory_access_is_direct(mr, true)) {
-        release_lock |= prepare_mmio_access(mr);
-
-#if defined(TARGET_WORDS_BIGENDIAN)
-        if (endian == DEVICE_LITTLE_ENDIAN) {
-            val = bswap16(val);
-        }
-#else
-        if (endian == DEVICE_BIG_ENDIAN) {
-            val = bswap16(val);
-        }
-#endif
-        r = memory_region_dispatch_write(mr, addr1, val, 2, attrs);
-    } else {
-        /* RAM case */
-        ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
-        switch (endian) {
-        case DEVICE_LITTLE_ENDIAN:
-            stw_le_p(ptr, val);
-            break;
-        case DEVICE_BIG_ENDIAN:
-            stw_be_p(ptr, val);
-            break;
-        default:
-            stw_p(ptr, val);
-            break;
-        }
-        invalidate_and_set_dirty(mr, addr1, 2);
-        r = MEMTX_OK;
-    }
-    if (result) {
-        *result = r;
-    }
-    if (release_lock) {
-        qemu_mutex_unlock_iothread();
-    }
-    rcu_read_unlock();
-}
-
-void address_space_stw(AddressSpace *as, hwaddr addr, uint32_t val,
-                       MemTxAttrs attrs, MemTxResult *result)
-{
-    address_space_stw_internal(as, addr, val, attrs, result,
-                               DEVICE_NATIVE_ENDIAN);
-}
-
-void address_space_stw_le(AddressSpace *as, hwaddr addr, uint32_t val,
-                       MemTxAttrs attrs, MemTxResult *result)
-{
-    address_space_stw_internal(as, addr, val, attrs, result,
-                               DEVICE_LITTLE_ENDIAN);
-}
-
-void address_space_stw_be(AddressSpace *as, hwaddr addr, uint32_t val,
-                       MemTxAttrs attrs, MemTxResult *result)
-{
-    address_space_stw_internal(as, addr, val, attrs, result,
-                               DEVICE_BIG_ENDIAN);
-}
-
-void stw_phys(AddressSpace *as, hwaddr addr, uint32_t val)
-{
-    address_space_stw(as, addr, val, MEMTXATTRS_UNSPECIFIED, NULL);
-}
-
-void stw_le_phys(AddressSpace *as, hwaddr addr, uint32_t val)
-{
-    address_space_stw_le(as, addr, val, MEMTXATTRS_UNSPECIFIED, NULL);
-}
-
-void stw_be_phys(AddressSpace *as, hwaddr addr, uint32_t val)
-{
-    address_space_stw_be(as, addr, val, MEMTXATTRS_UNSPECIFIED, NULL);
-}
-
-static inline void address_space_stq_internal(AddressSpace *as,
-                                              hwaddr addr, uint64_t val,
-                                              MemTxAttrs attrs,
-                                              MemTxResult *result,
-                                              enum device_endian endian)
-{
-    uint8_t *ptr;
-    MemoryRegion *mr;
-    hwaddr l = 8;
-    hwaddr addr1;
-    MemTxResult r;
-    bool release_lock = false;
-
-    rcu_read_lock();
-    mr = address_space_translate(as, addr, &addr1, &l, true);
-    if (l < 8 || !memory_access_is_direct(mr, true)) {
-        release_lock |= prepare_mmio_access(mr);
-
-#if defined(TARGET_WORDS_BIGENDIAN)
-        if (endian == DEVICE_LITTLE_ENDIAN) {
-            val = bswap64(val);
-        }
-#else
-        if (endian == DEVICE_BIG_ENDIAN) {
-            val = bswap64(val);
-        }
-#endif
-        r = memory_region_dispatch_write(mr, addr1, val, 4, attrs);
-    } else {
-        /* RAM case */
-        ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
-        switch (endian) {
-        case DEVICE_LITTLE_ENDIAN:
-            stq_le_p(ptr, val);
-            break;
-        case DEVICE_BIG_ENDIAN:
-            stq_be_p(ptr, val);
-            break;
-        default:
-            stq_p(ptr, val);
-            break;
-        }
-        invalidate_and_set_dirty(mr, addr1, 8);
-        r = MEMTX_OK;
-    }
-    if (result) {
-        *result = r;
-    }
-    if (release_lock) {
-        qemu_mutex_unlock_iothread();
-    }
-    rcu_read_unlock();
-}
-
-void address_space_stq(AddressSpace *as, hwaddr addr, uint64_t val,
-                       MemTxAttrs attrs, MemTxResult *result)
-{
-    address_space_stq_internal(as, addr, val, attrs, result,
-                               DEVICE_NATIVE_ENDIAN);
-}
-
-void address_space_stq_le(AddressSpace *as, hwaddr addr, uint64_t val,
-                       MemTxAttrs attrs, MemTxResult *result)
-{
-    address_space_stq_internal(as, addr, val, attrs, result,
-                               DEVICE_LITTLE_ENDIAN);
-}
-
-void address_space_stq_be(AddressSpace *as, hwaddr addr, uint64_t val,
-                       MemTxAttrs attrs, MemTxResult *result)
-{
-    address_space_stq_internal(as, addr, val, attrs, result,
-                               DEVICE_BIG_ENDIAN);
-}
-
-void stq_phys(AddressSpace *as, hwaddr addr, uint64_t val)
-{
-    address_space_stq(as, addr, val, MEMTXATTRS_UNSPECIFIED, NULL);
-}
-
-void stq_le_phys(AddressSpace *as, hwaddr addr, uint64_t val)
-{
-    address_space_stq_le(as, addr, val, MEMTXATTRS_UNSPECIFIED, NULL);
-}
-
-void stq_be_phys(AddressSpace *as, hwaddr addr, uint64_t val)
-{
-    address_space_stq_be(as, addr, val, MEMTXATTRS_UNSPECIFIED, NULL);
-}
+#define ARG1_DECL                AddressSpace *as
+#define ARG1                     as
+#define SUFFIX
+#define TRANSLATE(...)           address_space_translate(as, __VA_ARGS__)
+#define IS_DIRECT(mr, is_write)  memory_access_is_direct(mr, is_write)
+#define MAP_RAM(mr, ofs)         qemu_map_ram_ptr((mr)->ram_block, ofs)
+#define INVALIDATE(mr, ofs, len) invalidate_and_set_dirty(mr, ofs, len)
+#define RCU_READ_LOCK(...)       rcu_read_lock()
+#define RCU_READ_UNLOCK(...)     rcu_read_unlock()
+#include "memory_ldst.inc.c"
 
 /* virtual memory access for debug (includes writing to ROM) */
 int cpu_memory_rw_debug(CPUState *cpu, target_ulong addr,
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index cffdc13..bd15853 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -94,21 +94,6 @@ bool cpu_physical_memory_is_io(hwaddr phys_addr);
  */
 void qemu_flush_coalesced_mmio_buffer(void);
 
-uint32_t ldub_phys(AddressSpace *as, hwaddr addr);
-uint32_t lduw_le_phys(AddressSpace *as, hwaddr addr);
-uint32_t lduw_be_phys(AddressSpace *as, hwaddr addr);
-uint32_t ldl_le_phys(AddressSpace *as, hwaddr addr);
-uint32_t ldl_be_phys(AddressSpace *as, hwaddr addr);
-uint64_t ldq_le_phys(AddressSpace *as, hwaddr addr);
-uint64_t ldq_be_phys(AddressSpace *as, hwaddr addr);
-void stb_phys(AddressSpace *as, hwaddr addr, uint32_t val);
-void stw_le_phys(AddressSpace *as, hwaddr addr, uint32_t val);
-void stw_be_phys(AddressSpace *as, hwaddr addr, uint32_t val);
-void stl_le_phys(AddressSpace *as, hwaddr addr, uint32_t val);
-void stl_be_phys(AddressSpace *as, hwaddr addr, uint32_t val);
-void stq_le_phys(AddressSpace *as, hwaddr addr, uint64_t val);
-void stq_be_phys(AddressSpace *as, hwaddr addr, uint64_t val);
-
 void cpu_physical_memory_write_rom(AddressSpace *as, hwaddr addr,
                                    const uint8_t *buf, int len);
 void cpu_flush_icache_range(hwaddr start, int len);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 9728a2f..f35b612 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1404,6 +1404,21 @@ void address_space_stq_le(AddressSpace *as, hwaddr addr, uint64_t val,
 void address_space_stq_be(AddressSpace *as, hwaddr addr, uint64_t val,
                             MemTxAttrs attrs, MemTxResult *result);
 
+uint32_t ldub_phys(AddressSpace *as, hwaddr addr);
+uint32_t lduw_le_phys(AddressSpace *as, hwaddr addr);
+uint32_t lduw_be_phys(AddressSpace *as, hwaddr addr);
+uint32_t ldl_le_phys(AddressSpace *as, hwaddr addr);
+uint32_t ldl_be_phys(AddressSpace *as, hwaddr addr);
+uint64_t ldq_le_phys(AddressSpace *as, hwaddr addr);
+uint64_t ldq_be_phys(AddressSpace *as, hwaddr addr);
+void stb_phys(AddressSpace *as, hwaddr addr, uint32_t val);
+void stw_le_phys(AddressSpace *as, hwaddr addr, uint32_t val);
+void stw_be_phys(AddressSpace *as, hwaddr addr, uint32_t val);
+void stl_le_phys(AddressSpace *as, hwaddr addr, uint32_t val);
+void stl_be_phys(AddressSpace *as, hwaddr addr, uint32_t val);
+void stq_le_phys(AddressSpace *as, hwaddr addr, uint64_t val);
+void stq_be_phys(AddressSpace *as, hwaddr addr, uint64_t val);
+
 /* address_space_translate: translate an address range into an address space
  * into a MemoryRegion and an address range into that section.  Should be
  * called from an RCU critical section, to avoid that the last reference
diff --git a/memory_ldst.inc.c b/memory_ldst.inc.c
new file mode 100644
index 0000000..e1abe5f
--- /dev/null
+++ b/memory_ldst.inc.c
@@ -0,0 +1,709 @@
+/*
+ *  Physical memory access templates
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *  Copyright (c) 2015 Linaro, Inc.
+ *  Copyright (c) 2016 Red Hat, Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/* warning: addr must be aligned */
+static inline uint32_t glue(address_space_ldl_internal, SUFFIX)(ARG1_DECL,
+    hwaddr addr, MemTxAttrs attrs, MemTxResult *result,
+    enum device_endian endian)
+{
+    uint8_t *ptr;
+    uint64_t val;
+    MemoryRegion *mr;
+    hwaddr l = 4;
+    hwaddr addr1;
+    MemTxResult r;
+    bool release_lock = false;
+
+    RCU_READ_LOCK();
+    mr = TRANSLATE(addr, &addr1, &l, false);
+    if (l < 4 || !IS_DIRECT(mr, false)) {
+        release_lock |= prepare_mmio_access(mr);
+
+        /* I/O case */
+        r = memory_region_dispatch_read(mr, addr1, &val, 4, attrs);
+#if defined(TARGET_WORDS_BIGENDIAN)
+        if (endian == DEVICE_LITTLE_ENDIAN) {
+            val = bswap32(val);
+        }
+#else
+        if (endian == DEVICE_BIG_ENDIAN) {
+            val = bswap32(val);
+        }
+#endif
+    } else {
+        /* RAM case */
+        ptr = MAP_RAM(mr, addr1);
+        switch (endian) {
+        case DEVICE_LITTLE_ENDIAN:
+            val = ldl_le_p(ptr);
+            break;
+        case DEVICE_BIG_ENDIAN:
+            val = ldl_be_p(ptr);
+            break;
+        default:
+            val = ldl_p(ptr);
+            break;
+        }
+        r = MEMTX_OK;
+    }
+    if (result) {
+        *result = r;
+    }
+    if (release_lock) {
+        qemu_mutex_unlock_iothread();
+    }
+    RCU_READ_UNLOCK();
+    return val;
+}
+
+uint32_t glue(address_space_ldl, SUFFIX)(ARG1_DECL,
+    hwaddr addr, MemTxAttrs attrs, MemTxResult *result)
+{
+    return glue(address_space_ldl_internal, SUFFIX)(ARG1, addr, attrs, result,
+                                                    DEVICE_NATIVE_ENDIAN);
+}
+
+uint32_t glue(address_space_ldl_le, SUFFIX)(ARG1_DECL,
+    hwaddr addr, MemTxAttrs attrs, MemTxResult *result)
+{
+    return glue(address_space_ldl_internal, SUFFIX)(ARG1, addr, attrs, result,
+                                                    DEVICE_LITTLE_ENDIAN);
+}
+
+uint32_t glue(address_space_ldl_be, SUFFIX)(ARG1_DECL,
+    hwaddr addr, MemTxAttrs attrs, MemTxResult *result)
+{
+    return glue(address_space_ldl_internal, SUFFIX)(ARG1, addr, attrs, result,
+                                                    DEVICE_BIG_ENDIAN);
+}
+
+uint32_t glue(ldl_phys, SUFFIX)(ARG1_DECL, hwaddr addr)
+{
+    return glue(address_space_ldl, SUFFIX)(ARG1, addr,
+                                           MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
+uint32_t glue(ldl_le_phys, SUFFIX)(ARG1_DECL, hwaddr addr)
+{
+    return glue(address_space_ldl_le, SUFFIX)(ARG1, addr,
+                                              MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
+uint32_t glue(ldl_be_phys, SUFFIX)(ARG1_DECL, hwaddr addr)
+{
+    return glue(address_space_ldl_be, SUFFIX)(ARG1, addr,
+                                              MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
+/* warning: addr must be aligned */
+static inline uint64_t glue(address_space_ldq_internal, SUFFIX)(ARG1_DECL,
+    hwaddr addr, MemTxAttrs attrs, MemTxResult *result,
+    enum device_endian endian)
+{
+    uint8_t *ptr;
+    uint64_t val;
+    MemoryRegion *mr;
+    hwaddr l = 8;
+    hwaddr addr1;
+    MemTxResult r;
+    bool release_lock = false;
+
+    RCU_READ_LOCK();
+    mr = TRANSLATE(addr, &addr1, &l, false);
+    if (l < 8 || !IS_DIRECT(mr, false)) {
+        release_lock |= prepare_mmio_access(mr);
+
+        /* I/O case */
+        r = memory_region_dispatch_read(mr, addr1, &val, 8, attrs);
+#if defined(TARGET_WORDS_BIGENDIAN)
+        if (endian == DEVICE_LITTLE_ENDIAN) {
+            val = bswap64(val);
+        }
+#else
+        if (endian == DEVICE_BIG_ENDIAN) {
+            val = bswap64(val);
+        }
+#endif
+    } else {
+        /* RAM case */
+        ptr = MAP_RAM(mr, addr1);
+        switch (endian) {
+        case DEVICE_LITTLE_ENDIAN:
+            val = ldq_le_p(ptr);
+            break;
+        case DEVICE_BIG_ENDIAN:
+            val = ldq_be_p(ptr);
+            break;
+        default:
+            val = ldq_p(ptr);
+            break;
+        }
+        r = MEMTX_OK;
+    }
+    if (result) {
+        *result = r;
+    }
+    if (release_lock) {
+        qemu_mutex_unlock_iothread();
+    }
+    RCU_READ_UNLOCK();
+    return val;
+}
+
+uint64_t glue(address_space_ldq, SUFFIX)(ARG1_DECL,
+    hwaddr addr, MemTxAttrs attrs, MemTxResult *result)
+{
+    return glue(address_space_ldq_internal, SUFFIX)(ARG1, addr, attrs, result,
+                                                    DEVICE_NATIVE_ENDIAN);
+}
+
+uint64_t glue(address_space_ldq_le, SUFFIX)(ARG1_DECL,
+    hwaddr addr, MemTxAttrs attrs, MemTxResult *result)
+{
+    return glue(address_space_ldq_internal, SUFFIX)(ARG1, addr, attrs, result,
+                                                    DEVICE_LITTLE_ENDIAN);
+}
+
+uint64_t glue(address_space_ldq_be, SUFFIX)(ARG1_DECL,
+    hwaddr addr, MemTxAttrs attrs, MemTxResult *result)
+{
+    return glue(address_space_ldq_internal, SUFFIX)(ARG1, addr, attrs, result,
+                                                    DEVICE_BIG_ENDIAN);
+}
+
+uint64_t glue(ldq_phys, SUFFIX)(ARG1_DECL, hwaddr addr)
+{
+    return glue(address_space_ldq, SUFFIX)(ARG1, addr,
+                                           MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
+uint64_t glue(ldq_le_phys, SUFFIX)(ARG1_DECL, hwaddr addr)
+{
+    return glue(address_space_ldq_le, SUFFIX)(ARG1, addr,
+                                              MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
+uint64_t glue(ldq_be_phys, SUFFIX)(ARG1_DECL, hwaddr addr)
+{
+    return glue(address_space_ldq_be, SUFFIX)(ARG1, addr,
+                                              MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
+uint32_t glue(address_space_ldub, SUFFIX)(ARG1_DECL,
+    hwaddr addr, MemTxAttrs attrs, MemTxResult *result)
+{
+    uint8_t *ptr;
+    uint64_t val;
+    MemoryRegion *mr;
+    hwaddr l = 1;
+    hwaddr addr1;
+    MemTxResult r;
+    bool release_lock = false;
+
+    RCU_READ_LOCK();
+    mr = TRANSLATE(addr, &addr1, &l, false);
+    if (!IS_DIRECT(mr, false)) {
+        release_lock |= prepare_mmio_access(mr);
+
+        /* I/O case */
+        r = memory_region_dispatch_read(mr, addr1, &val, 1, attrs);
+    } else {
+        /* RAM case */
+        ptr = MAP_RAM(mr, addr1);
+        val = ldub_p(ptr);
+        r = MEMTX_OK;
+    }
+    if (result) {
+        *result = r;
+    }
+    if (release_lock) {
+        qemu_mutex_unlock_iothread();
+    }
+    RCU_READ_UNLOCK();
+    return val;
+}
+
+uint32_t glue(ldub_phys, SUFFIX)(ARG1_DECL, hwaddr addr)
+{
+    return glue(address_space_ldub, SUFFIX)(ARG1, addr,
+                                            MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
+/* warning: addr must be aligned */
+static inline uint32_t glue(address_space_lduw_internal, SUFFIX)(ARG1_DECL,
+    hwaddr addr, MemTxAttrs attrs, MemTxResult *result,
+    enum device_endian endian)
+{
+    uint8_t *ptr;
+    uint64_t val;
+    MemoryRegion *mr;
+    hwaddr l = 2;
+    hwaddr addr1;
+    MemTxResult r;
+    bool release_lock = false;
+
+    RCU_READ_LOCK();
+    mr = TRANSLATE(addr, &addr1, &l, false);
+    if (l < 2 || !IS_DIRECT(mr, false)) {
+        release_lock |= prepare_mmio_access(mr);
+
+        /* I/O case */
+        r = memory_region_dispatch_read(mr, addr1, &val, 2, attrs);
+#if defined(TARGET_WORDS_BIGENDIAN)
+        if (endian == DEVICE_LITTLE_ENDIAN) {
+            val = bswap16(val);
+        }
+#else
+        if (endian == DEVICE_BIG_ENDIAN) {
+            val = bswap16(val);
+        }
+#endif
+    } else {
+        /* RAM case */
+        ptr = MAP_RAM(mr, addr1);
+        switch (endian) {
+        case DEVICE_LITTLE_ENDIAN:
+            val = lduw_le_p(ptr);
+            break;
+        case DEVICE_BIG_ENDIAN:
+            val = lduw_be_p(ptr);
+            break;
+        default:
+            val = lduw_p(ptr);
+            break;
+        }
+        r = MEMTX_OK;
+    }
+    if (result) {
+        *result = r;
+    }
+    if (release_lock) {
+        qemu_mutex_unlock_iothread();
+    }
+    RCU_READ_UNLOCK();
+    return val;
+}
+
+uint32_t glue(address_space_lduw, SUFFIX)(ARG1_DECL,
+    hwaddr addr, MemTxAttrs attrs, MemTxResult *result)
+{
+    return glue(address_space_lduw_internal, SUFFIX)(ARG1, addr, attrs, result,
+                                                     DEVICE_NATIVE_ENDIAN);
+}
+
+uint32_t glue(address_space_lduw_le, SUFFIX)(ARG1_DECL,
+    hwaddr addr, MemTxAttrs attrs, MemTxResult *result)
+{
+    return glue(address_space_lduw_internal, SUFFIX)(ARG1, addr, attrs, result,
+                                                     DEVICE_LITTLE_ENDIAN);
+}
+
+uint32_t glue(address_space_lduw_be, SUFFIX)(ARG1_DECL,
+    hwaddr addr, MemTxAttrs attrs, MemTxResult *result)
+{
+    return glue(address_space_lduw_internal, SUFFIX)(ARG1, addr, attrs, result,
+                                       DEVICE_BIG_ENDIAN);
+}
+
+uint32_t glue(lduw_phys, SUFFIX)(ARG1_DECL, hwaddr addr)
+{
+    return glue(address_space_lduw, SUFFIX)(ARG1, addr,
+                                            MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
+uint32_t glue(lduw_le_phys, SUFFIX)(ARG1_DECL, hwaddr addr)
+{
+    return glue(address_space_lduw_le, SUFFIX)(ARG1, addr,
+                                               MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
+uint32_t glue(lduw_be_phys, SUFFIX)(ARG1_DECL, hwaddr addr)
+{
+    return glue(address_space_lduw_be, SUFFIX)(ARG1, addr,
+                                               MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
+/* warning: addr must be aligned. The ram page is not masked as dirty
+   and the code inside is not invalidated. It is useful if the dirty
+   bits are used to track modified PTEs */
+void glue(address_space_stl_notdirty, SUFFIX)(ARG1_DECL,
+    hwaddr addr, uint32_t val, MemTxAttrs attrs, MemTxResult *result)
+{
+    uint8_t *ptr;
+    MemoryRegion *mr;
+    hwaddr l = 4;
+    hwaddr addr1;
+    MemTxResult r;
+    uint8_t dirty_log_mask;
+    bool release_lock = false;
+
+    RCU_READ_LOCK();
+    mr = TRANSLATE(addr, &addr1, &l, true);
+    if (l < 4 || !IS_DIRECT(mr, true)) {
+        release_lock |= prepare_mmio_access(mr);
+
+        r = memory_region_dispatch_write(mr, addr1, val, 4, attrs);
+    } else {
+        ptr = MAP_RAM(mr, addr1);
+        stl_p(ptr, val);
+
+        dirty_log_mask = memory_region_get_dirty_log_mask(mr);
+        dirty_log_mask &= ~(1 << DIRTY_MEMORY_CODE);
+        cpu_physical_memory_set_dirty_range(memory_region_get_ram_addr(mr) + addr,
+                                            4, dirty_log_mask);
+        r = MEMTX_OK;
+    }
+    if (result) {
+        *result = r;
+    }
+    if (release_lock) {
+        qemu_mutex_unlock_iothread();
+    }
+    RCU_READ_UNLOCK();
+}
+
+void glue(stl_phys_notdirty, SUFFIX)(ARG1_DECL, hwaddr addr, uint32_t val)
+{
+    glue(address_space_stl_notdirty, SUFFIX)(ARG1, addr, val,
+                                             MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
+/* warning: addr must be aligned */
+static inline void glue(address_space_stl_internal, SUFFIX)(ARG1_DECL,
+    hwaddr addr, uint32_t val, MemTxAttrs attrs,
+    MemTxResult *result, enum device_endian endian)
+{
+    uint8_t *ptr;
+    MemoryRegion *mr;
+    hwaddr l = 4;
+    hwaddr addr1;
+    MemTxResult r;
+    bool release_lock = false;
+
+    RCU_READ_LOCK();
+    mr = TRANSLATE(addr, &addr1, &l, true);
+    if (l < 4 || !IS_DIRECT(mr, true)) {
+        release_lock |= prepare_mmio_access(mr);
+
+#if defined(TARGET_WORDS_BIGENDIAN)
+        if (endian == DEVICE_LITTLE_ENDIAN) {
+            val = bswap32(val);
+        }
+#else
+        if (endian == DEVICE_BIG_ENDIAN) {
+            val = bswap32(val);
+        }
+#endif
+        r = memory_region_dispatch_write(mr, addr1, val, 4, attrs);
+    } else {
+        /* RAM case */
+        ptr = MAP_RAM(mr, addr1);
+        switch (endian) {
+        case DEVICE_LITTLE_ENDIAN:
+            stl_le_p(ptr, val);
+            break;
+        case DEVICE_BIG_ENDIAN:
+            stl_be_p(ptr, val);
+            break;
+        default:
+            stl_p(ptr, val);
+            break;
+        }
+        INVALIDATE(mr, addr1, 4);
+        r = MEMTX_OK;
+    }
+    if (result) {
+        *result = r;
+    }
+    if (release_lock) {
+        qemu_mutex_unlock_iothread();
+    }
+    RCU_READ_UNLOCK();
+}
+
+void glue(address_space_stl, SUFFIX)(ARG1_DECL,
+    hwaddr addr, uint32_t val, MemTxAttrs attrs, MemTxResult *result)
+{
+    glue(address_space_stl_internal, SUFFIX)(ARG1, addr, val, attrs,
+                                             result, DEVICE_NATIVE_ENDIAN);
+}
+
+void glue(address_space_stl_le, SUFFIX)(ARG1_DECL,
+    hwaddr addr, uint32_t val, MemTxAttrs attrs, MemTxResult *result)
+{
+    glue(address_space_stl_internal, SUFFIX)(ARG1, addr, val, attrs,
+                                             result, DEVICE_LITTLE_ENDIAN);
+}
+
+void glue(address_space_stl_be, SUFFIX)(ARG1_DECL,
+    hwaddr addr, uint32_t val, MemTxAttrs attrs, MemTxResult *result)
+{
+    glue(address_space_stl_internal, SUFFIX)(ARG1, addr, val, attrs,
+                                             result, DEVICE_BIG_ENDIAN);
+}
+
+void glue(stl_phys, SUFFIX)(ARG1_DECL, hwaddr addr, uint32_t val)
+{
+    glue(address_space_stl, SUFFIX)(ARG1, addr, val,
+                                    MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
+void glue(stl_le_phys, SUFFIX)(ARG1_DECL, hwaddr addr, uint32_t val)
+{
+    glue(address_space_stl_le, SUFFIX)(ARG1, addr, val,
+                                       MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
+void glue(stl_be_phys, SUFFIX)(ARG1_DECL, hwaddr addr, uint32_t val)
+{
+    glue(address_space_stl_be, SUFFIX)(ARG1, addr, val,
+                                       MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
+void glue(address_space_stb, SUFFIX)(ARG1_DECL,
+    hwaddr addr, uint32_t val, MemTxAttrs attrs, MemTxResult *result)
+{
+    uint8_t *ptr;
+    MemoryRegion *mr;
+    hwaddr l = 1;
+    hwaddr addr1;
+    MemTxResult r;
+    bool release_lock = false;
+
+    RCU_READ_LOCK();
+    mr = TRANSLATE(addr, &addr1, &l, true);
+    if (!IS_DIRECT(mr, true)) {
+        release_lock |= prepare_mmio_access(mr);
+        r = memory_region_dispatch_write(mr, addr1, val, 1, attrs);
+    } else {
+        /* RAM case */
+        ptr = MAP_RAM(mr, addr1);
+        stb_p(ptr, val);
+        INVALIDATE(mr, addr1, 1);
+        r = MEMTX_OK;
+    }
+    if (result) {
+        *result = r;
+    }
+    if (release_lock) {
+        qemu_mutex_unlock_iothread();
+    }
+    RCU_READ_UNLOCK();
+}
+
+void glue(stb_phys, SUFFIX)(ARG1_DECL, hwaddr addr, uint32_t val)
+{
+    glue(address_space_stb, SUFFIX)(ARG1, addr, val,
+                                    MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
+/* warning: addr must be aligned */
+static inline void glue(address_space_stw_internal, SUFFIX)(ARG1_DECL,
+    hwaddr addr, uint32_t val, MemTxAttrs attrs,
+    MemTxResult *result, enum device_endian endian)
+{
+    uint8_t *ptr;
+    MemoryRegion *mr;
+    hwaddr l = 2;
+    hwaddr addr1;
+    MemTxResult r;
+    bool release_lock = false;
+
+    RCU_READ_LOCK();
+    mr = TRANSLATE(addr, &addr1, &l, true);
+    if (l < 2 || !IS_DIRECT(mr, true)) {
+        release_lock |= prepare_mmio_access(mr);
+
+#if defined(TARGET_WORDS_BIGENDIAN)
+        if (endian == DEVICE_LITTLE_ENDIAN) {
+            val = bswap16(val);
+        }
+#else
+        if (endian == DEVICE_BIG_ENDIAN) {
+            val = bswap16(val);
+        }
+#endif
+        r = memory_region_dispatch_write(mr, addr1, val, 2, attrs);
+    } else {
+        /* RAM case */
+        ptr = MAP_RAM(mr, addr1);
+        switch (endian) {
+        case DEVICE_LITTLE_ENDIAN:
+            stw_le_p(ptr, val);
+            break;
+        case DEVICE_BIG_ENDIAN:
+            stw_be_p(ptr, val);
+            break;
+        default:
+            stw_p(ptr, val);
+            break;
+        }
+        INVALIDATE(mr, addr1, 2);
+        r = MEMTX_OK;
+    }
+    if (result) {
+        *result = r;
+    }
+    if (release_lock) {
+        qemu_mutex_unlock_iothread();
+    }
+    RCU_READ_UNLOCK();
+}
+
+void glue(address_space_stw, SUFFIX)(ARG1_DECL,
+    hwaddr addr, uint32_t val, MemTxAttrs attrs, MemTxResult *result)
+{
+    glue(address_space_stw_internal, SUFFIX)(ARG1, addr, val, attrs, result,
+                                             DEVICE_NATIVE_ENDIAN);
+}
+
+void glue(address_space_stw_le, SUFFIX)(ARG1_DECL,
+    hwaddr addr, uint32_t val, MemTxAttrs attrs, MemTxResult *result)
+{
+    glue(address_space_stw_internal, SUFFIX)(ARG1, addr, val, attrs, result,
+                                             DEVICE_LITTLE_ENDIAN);
+}
+
+void glue(address_space_stw_be, SUFFIX)(ARG1_DECL,
+    hwaddr addr, uint32_t val, MemTxAttrs attrs, MemTxResult *result)
+{
+    glue(address_space_stw_internal, SUFFIX)(ARG1, addr, val, attrs, result,
+                               DEVICE_BIG_ENDIAN);
+}
+
+void glue(stw_phys, SUFFIX)(ARG1_DECL, hwaddr addr, uint32_t val)
+{
+    glue(address_space_stw, SUFFIX)(ARG1, addr, val,
+                                    MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
+void glue(stw_le_phys, SUFFIX)(ARG1_DECL, hwaddr addr, uint32_t val)
+{
+    glue(address_space_stw_le, SUFFIX)(ARG1, addr, val,
+                                       MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
+void glue(stw_be_phys, SUFFIX)(ARG1_DECL, hwaddr addr, uint32_t val)
+{
+    glue(address_space_stw_be, SUFFIX)(ARG1, addr, val,
+                                       MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
+static void glue(address_space_stq_internal, SUFFIX)(ARG1_DECL,
+    hwaddr addr, uint64_t val, MemTxAttrs attrs,
+    MemTxResult *result, enum device_endian endian)
+{
+    uint8_t *ptr;
+    MemoryRegion *mr;
+    hwaddr l = 8;
+    hwaddr addr1;
+    MemTxResult r;
+    bool release_lock = false;
+
+    RCU_READ_LOCK();
+    mr = TRANSLATE(addr, &addr1, &l, true);
+    if (l < 8 || !IS_DIRECT(mr, true)) {
+        release_lock |= prepare_mmio_access(mr);
+
+#if defined(TARGET_WORDS_BIGENDIAN)
+        if (endian == DEVICE_LITTLE_ENDIAN) {
+            val = bswap64(val);
+        }
+#else
+        if (endian == DEVICE_BIG_ENDIAN) {
+            val = bswap64(val);
+        }
+#endif
+        r = memory_region_dispatch_write(mr, addr1, val, 4, attrs);
+    } else {
+        /* RAM case */
+        ptr = MAP_RAM(mr, addr1);
+        switch (endian) {
+        case DEVICE_LITTLE_ENDIAN:
+            stq_le_p(ptr, val);
+            break;
+        case DEVICE_BIG_ENDIAN:
+            stq_be_p(ptr, val);
+            break;
+        default:
+            stq_p(ptr, val);
+            break;
+        }
+        INVALIDATE(mr, addr1, 8);
+        r = MEMTX_OK;
+    }
+    if (result) {
+        *result = r;
+    }
+    if (release_lock) {
+        qemu_mutex_unlock_iothread();
+    }
+    RCU_READ_UNLOCK();
+}
+
+void glue(address_space_stq, SUFFIX)(ARG1_DECL,
+    hwaddr addr, uint64_t val, MemTxAttrs attrs, MemTxResult *result)
+{
+    glue(address_space_stq_internal, SUFFIX)(ARG1, addr, val, attrs, result,
+                                             DEVICE_NATIVE_ENDIAN);
+}
+
+void glue(address_space_stq_le, SUFFIX)(ARG1_DECL,
+    hwaddr addr, uint64_t val, MemTxAttrs attrs, MemTxResult *result)
+{
+    glue(address_space_stq_internal, SUFFIX)(ARG1, addr, val, attrs, result,
+                                             DEVICE_LITTLE_ENDIAN);
+}
+
+void glue(address_space_stq_be, SUFFIX)(ARG1_DECL,
+    hwaddr addr, uint64_t val, MemTxAttrs attrs, MemTxResult *result)
+{
+    glue(address_space_stq_internal, SUFFIX)(ARG1, addr, val, attrs, result,
+                                             DEVICE_BIG_ENDIAN);
+}
+
+void glue(stq_phys, SUFFIX)(ARG1_DECL, hwaddr addr, uint64_t val)
+{
+    glue(address_space_stq, SUFFIX)(ARG1, addr, val,
+                                    MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
+void glue(stq_le_phys, SUFFIX)(ARG1_DECL, hwaddr addr, uint64_t val)
+{
+    glue(address_space_stq_le, SUFFIX)(ARG1, addr, val,
+                                       MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
+void glue(stq_be_phys, SUFFIX)(ARG1_DECL, hwaddr addr, uint64_t val)
+{
+    glue(address_space_stq_be, SUFFIX)(ARG1, addr, val,
+                                       MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
+#undef ARG1_DECL
+#undef ARG1
+#undef SUFFIX
+#undef TRANSLATE
+#undef IS_DIRECT
+#undef MAP_RAM
+#undef INVALIDATE
+#undef RCU_READ_LOCK
+#undef RCU_READ_UNLOCK
-- 
1.8.3.1


* [Qemu-devel] [PATCH 03/11] exec: introduce address_space_extend_translation
  2016-12-12 11:18 [Qemu-devel] [RFC PATCH 00/11] speedup vring processing with MemoryRegionCaches Paolo Bonzini
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 01/11] exec: optimize remaining address_space_* cases Paolo Bonzini
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 02/11] exec: introduce memory_ldst.inc.c Paolo Bonzini
@ 2016-12-12 11:18 ` Paolo Bonzini
  2016-12-12 13:47   ` Stefan Hajnoczi
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 04/11] exec: introduce MemoryRegionCache Paolo Bonzini
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2016-12-12 11:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, famz, mst, borntraeger

This extracts the common part of address_space_map and
address_space_cache_init into a new function.
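
For context, the second caller is address_space_cache_init() from the
next patch; a hedged sketch of how it can reuse the helper (field names
and error handling here are assumptions, not the exact code):

    /* Hedged sketch of the second caller; fields and error handling are
     * assumptions, not the code actually added by the next patch. */
    int64_t address_space_cache_init(MemoryRegionCache *cache, AddressSpace *as,
                                     hwaddr addr, hwaddr len, bool is_write)
    {
        hwaddr l = len, xlat;
        MemoryRegion *mr;

        rcu_read_lock();
        mr = address_space_translate(as, addr, &xlat, &l, is_write);
        /* Same extension loop that address_space_map() now delegates to,
         * so the cache covers as much of [addr, addr + len) as possible. */
        len = address_space_extend_translation(as, addr, len, mr, xlat, l,
                                               is_write);
        memory_region_ref(mr);
        rcu_read_unlock();

        cache->as = as;
        cache->xlat = xlat;
        cache->len = len;
        return len;
    }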

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 exec.c | 50 +++++++++++++++++++++++++++++---------------------
 1 file changed, 29 insertions(+), 21 deletions(-)

diff --git a/exec.c b/exec.c
index 8568a6f..d4b3656 100644
--- a/exec.c
+++ b/exec.c
@@ -2938,6 +2938,31 @@ bool address_space_access_valid(AddressSpace *as, hwaddr addr, int len, bool is_
     return true;
 }
 
+static hwaddr
+address_space_extend_translation(AddressSpace *as, hwaddr addr, hwaddr target_len,
+                                 MemoryRegion *mr, hwaddr base, hwaddr len,
+                                 bool is_write)
+{
+    hwaddr done = 0;
+    hwaddr xlat;
+    MemoryRegion *this_mr;
+
+    for (;;) {
+        target_len -= len;
+        addr += len;
+        done += len;
+        if (target_len == 0) {
+            return done;
+        }
+
+        len = target_len;
+        this_mr = address_space_translate(as, addr, &xlat, &len, is_write);
+        if (this_mr != mr || xlat != base + done) {
+            return done;
+        }
+    }
+}
+
 /* Map a physical memory region into a host virtual address.
  * May map a subset of the requested range, given by and returned in *plen.
  * May return NULL if resources needed to perform the mapping are exhausted.
@@ -2951,9 +2976,8 @@ void *address_space_map(AddressSpace *as,
                         bool is_write)
 {
     hwaddr len = *plen;
-    hwaddr done = 0;
-    hwaddr l, xlat, base;
-    MemoryRegion *mr, *this_mr;
+    hwaddr l, xlat;
+    MemoryRegion *mr;
     void *ptr;
 
     if (len == 0) {
@@ -2987,26 +3011,10 @@ void *address_space_map(AddressSpace *as,
         return bounce.buffer;
     }
 
-    base = xlat;
-
-    for (;;) {
-        len -= l;
-        addr += l;
-        done += l;
-        if (len == 0) {
-            break;
-        }
-
-        l = len;
-        this_mr = address_space_translate(as, addr, &xlat, &l, is_write);
-        if (this_mr != mr || xlat != base + done) {
-            break;
-        }
-    }
 
     memory_region_ref(mr);
-    *plen = done;
-    ptr = qemu_ram_ptr_length(mr->ram_block, base, plen);
+    *plen = address_space_extend_translation(as, addr, len, mr, xlat, l, is_write);
+    ptr = qemu_ram_ptr_length(mr->ram_block, xlat, plen);
     rcu_read_unlock();
 
     return ptr;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH 04/11] exec: introduce MemoryRegionCache
  2016-12-12 11:18 [Qemu-devel] [RFC PATCH 00/11] speedup vring processing with MemoryRegionCaches Paolo Bonzini
                   ` (2 preceding siblings ...)
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 03/11] exec: introduce address_space_extend_translation Paolo Bonzini
@ 2016-12-12 11:18 ` Paolo Bonzini
  2016-12-12 14:06   ` Stefan Hajnoczi
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 05/11] virtio: make virtio_should_notify static Paolo Bonzini
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2016-12-12 11:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, famz, mst, borntraeger

Device models often have to perform multiple accesses to a single
memory region that is known in advance, but would like to use "DMA-style"
functions instead of address_space_map/unmap.  This can happen
for example when the data has to undergo endianness conversion.
Introduce a new data structure to cache the result of
address_space_translate without forcing usage of a host address
like address_space_map does.
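
For illustration only (not part of this patch), here is a minimal usage
sketch of the new API.  The function name, the AddressSpace argument, the
guest address and the 16-byte length are made-up values; error handling
and locking (the later virtio patches call this under rcu_read_lock) are
abbreviated:

    #include "qemu/osdep.h"
    #include "exec/memory.h"
    #include "exec/cpu-all.h"

    static void cached_access_example(AddressSpace *as, hwaddr base)
    {
        MemoryRegionCache cache;
        uint16_t flags;
        int64_t len;

        /* Translate and map the region once; only works for RAM. */
        len = address_space_cache_init(&cache, as, base, 16, true);
        if (len < 16) {
            return;  /* not RAM or only partially mapped: use the slow path */
        }

        /* Offsets passed to the *_cached accessors are relative to 'base'. */
        flags = lduw_le_phys_cached(&cache, 0);
        stw_le_phys_cached(&cache, 2, flags | 1);
        address_space_cache_invalidate(&cache, 2, sizeof(flags));

        address_space_cache_destroy(&cache);
    }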

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 exec.c                  |  76 ++++++++++++++++++++++++
 include/exec/cpu-all.h  |  23 ++++++++
 include/exec/memory.h   | 151 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/qemu/typedefs.h |   1 +
 4 files changed, 251 insertions(+)

diff --git a/exec.c b/exec.c
index d4b3656..8d4bb0e 100644
--- a/exec.c
+++ b/exec.c
@@ -3077,6 +3077,82 @@ void cpu_physical_memory_unmap(void *buffer, hwaddr len,
 #define RCU_READ_UNLOCK(...)     rcu_read_unlock()
 #include "memory_ldst.inc.c"
 
+int64_t address_space_cache_init(MemoryRegionCache *cache,
+                                 AddressSpace *as,
+                                 hwaddr addr,
+                                 hwaddr len,
+                                 bool is_write)
+{
+    hwaddr l, xlat;
+    MemoryRegion *mr;
+    void *ptr;
+
+    assert(len > 0);
+
+    l = len;
+    mr = address_space_translate(as, addr, &xlat, &l, is_write);
+    if (!memory_access_is_direct(mr, is_write)) {
+        return -EINVAL;
+    }
+
+    l = address_space_extend_translation(as, addr, len, mr, xlat, l, is_write);
+    ptr = qemu_ram_ptr_length(mr->ram_block, xlat, &l);
+
+    cache->xlat = xlat;
+    cache->is_write = is_write;
+    cache->mr = mr;
+    cache->ptr = ptr;
+    cache->len = l;
+    memory_region_ref(cache->mr);
+
+    return l;
+}
+
+void address_space_cache_invalidate(MemoryRegionCache *cache,
+                                    hwaddr addr,
+                                    hwaddr access_len)
+{
+    assert(cache->is_write);
+    invalidate_and_set_dirty(cache->mr, addr + cache->xlat, access_len);
+}
+
+void address_space_cache_destroy(MemoryRegionCache *cache)
+{
+    if (!cache->mr) {
+        return;
+    }
+
+    if (xen_enabled()) {
+        xen_invalidate_map_cache_entry(cache->ptr);
+    }
+    memory_region_unref(cache->mr);
+}
+
+/* Called from RCU critical section.  This function has the same
+ * semantics as address_space_translate, but it only works on a
+ * predefined range of a MemoryRegion that was mapped with
+ * address_space_cache_init.
+ */
+static inline MemoryRegion *address_space_translate_cached(
+    MemoryRegionCache *cache, hwaddr addr, hwaddr *xlat,
+    hwaddr *plen, bool is_write)
+{
+    assert(addr < cache->len && *plen <= cache->len - addr);
+    *xlat = addr + cache->xlat;
+    return cache->mr;
+}
+
+#define ARG1_DECL                MemoryRegionCache *cache
+#define ARG1                     cache
+#define SUFFIX                   _cached
+#define TRANSLATE(...)           address_space_translate_cached(cache, __VA_ARGS__)
+#define IS_DIRECT(mr, is_write)  true
+#define MAP_RAM(mr, ofs)         (cache->ptr + (ofs - cache->xlat))
+#define INVALIDATE(mr, ofs, len) ((void)0)
+#define RCU_READ_LOCK()          ((void)0)
+#define RCU_READ_UNLOCK()        ((void)0)
+#include "memory_ldst.inc.c"
+
 /* virtual memory access for debug (includes writing to ROM) */
 int cpu_memory_rw_debug(CPUState *cpu, target_ulong addr,
                         uint8_t *buf, int len, int is_write)
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index e9004e5..ffe43d5 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -186,6 +186,29 @@ void address_space_stl(AddressSpace *as, hwaddr addr, uint32_t val,
                             MemTxAttrs attrs, MemTxResult *result);
 void address_space_stq(AddressSpace *as, hwaddr addr, uint64_t val,
                             MemTxAttrs attrs, MemTxResult *result);
+
+uint32_t lduw_phys_cached(MemoryRegionCache *cache, hwaddr addr);
+uint32_t ldl_phys_cached(MemoryRegionCache *cache, hwaddr addr);
+uint64_t ldq_phys_cached(MemoryRegionCache *cache, hwaddr addr);
+void stl_phys_notdirty_cached(MemoryRegionCache *cache, hwaddr addr, uint32_t val);
+void stw_phys_cached(MemoryRegionCache *cache, hwaddr addr, uint32_t val);
+void stl_phys_cached(MemoryRegionCache *cache, hwaddr addr, uint32_t val);
+void stq_phys_cached(MemoryRegionCache *cache, hwaddr addr, uint64_t val);
+
+uint32_t address_space_lduw_cached(MemoryRegionCache *cache, hwaddr addr,
+                            MemTxAttrs attrs, MemTxResult *result);
+uint32_t address_space_ldl_cached(MemoryRegionCache *cache, hwaddr addr,
+                            MemTxAttrs attrs, MemTxResult *result);
+uint64_t address_space_ldq_cached(MemoryRegionCache *cache, hwaddr addr,
+                            MemTxAttrs attrs, MemTxResult *result);
+void address_space_stl_notdirty_cached(MemoryRegionCache *cache, hwaddr addr,
+                            uint32_t val, MemTxAttrs attrs, MemTxResult *result);
+void address_space_stw_cached(MemoryRegionCache *cache, hwaddr addr, uint32_t val,
+                            MemTxAttrs attrs, MemTxResult *result);
+void address_space_stl_cached(MemoryRegionCache *cache, hwaddr addr, uint32_t val,
+                            MemTxAttrs attrs, MemTxResult *result);
+void address_space_stq_cached(MemoryRegionCache *cache, hwaddr addr, uint64_t val,
+                            MemTxAttrs attrs, MemTxResult *result);
 #endif
 
 /* page related stuff */
diff --git a/include/exec/memory.h b/include/exec/memory.h
index f35b612..64560f6 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1419,6 +1419,125 @@ void stl_be_phys(AddressSpace *as, hwaddr addr, uint32_t val);
 void stq_le_phys(AddressSpace *as, hwaddr addr, uint64_t val);
 void stq_be_phys(AddressSpace *as, hwaddr addr, uint64_t val);
 
+struct MemoryRegionCache {
+    hwaddr xlat;
+    void *ptr;
+    hwaddr len;
+    MemoryRegion *mr;
+    bool is_write;
+};
+
+/* address_space_cache_init: prepare for repeated access to a physical
+ * memory region
+ *
+ * @cache: #MemoryRegionCache to be filled
+ * @as: #AddressSpace to be accessed
+ * @addr: address within that address space
+ * @len: length of buffer
+ * @is_write: indicates the transfer direction
+ *
+ * Will only work with RAM, and may map a subset of the requested range by
+ * returning a value that is less than @len.  On failure, return a negative
+ * errno value.
+ *
+ * Because it only works with RAM, this function can be used for
+ * read-modify-write operations.  In this case, is_write should be %true.
+ *
+ * Note that addresses passed to the address_space_*_cached functions
+ * are relative to @addr.
+ */
+int64_t address_space_cache_init(MemoryRegionCache *cache,
+                                 AddressSpace *as,
+                                 hwaddr addr,
+                                 hwaddr len,
+                                 bool is_write);
+
+/**
+ * address_space_cache_invalidate: complete a write to a #MemoryRegionCache
+ *
+ * @cache: The #MemoryRegionCache to operate on.
+ * @addr: The first physical address that was written, relative to the
+ * address that was passed to @address_space_cache_init.
+ * @access_len: The number of bytes that were written starting at @addr.
+ */
+void address_space_cache_invalidate(MemoryRegionCache *cache,
+                                    hwaddr addr,
+                                    hwaddr access_len);
+
+/**
+ * address_space_cache_destroy: free a #MemoryRegionCache
+ *
+ * @cache: The #MemoryRegionCache whose memory should be released.
+ */
+void address_space_cache_destroy(MemoryRegionCache *cache);
+
+/* address_space_ld*_cached: load from a cached #MemoryRegion
+ * address_space_st*_cached: store into a cached #MemoryRegion
+ *
+ * These functions perform a load or store of the byte, word,
+ * longword or quad to the specified address.  The address is
+ * a physical address in the AddressSpace, but it must lie within
+ * a #MemoryRegion that was mapped with address_space_cache_init.
+ *
+ * The _le suffixed functions treat the data as little endian;
+ * _be indicates big endian; no suffix indicates "same endianness
+ * as guest CPU".
+ *
+ * The "guest CPU endianness" accessors are deprecated for use outside
+ * target-* code; devices should be CPU-agnostic and use either the LE
+ * or the BE accessors.
+ *
+ * @cache: previously initialized #MemoryRegionCache to be accessed
+ * @addr: address within the address space
+ * @val: data value, for stores
+ * @attrs: memory transaction attributes
+ * @result: location to write the success/failure of the transaction;
+ *   if NULL, this information is discarded
+ */
+uint32_t address_space_ldub_cached(MemoryRegionCache *cache, hwaddr addr,
+                            MemTxAttrs attrs, MemTxResult *result);
+uint32_t address_space_lduw_le_cached(MemoryRegionCache *cache, hwaddr addr,
+                            MemTxAttrs attrs, MemTxResult *result);
+uint32_t address_space_lduw_be_cached(MemoryRegionCache *cache, hwaddr addr,
+                            MemTxAttrs attrs, MemTxResult *result);
+uint32_t address_space_ldl_le_cached(MemoryRegionCache *cache, hwaddr addr,
+                            MemTxAttrs attrs, MemTxResult *result);
+uint32_t address_space_ldl_be_cached(MemoryRegionCache *cache, hwaddr addr,
+                            MemTxAttrs attrs, MemTxResult *result);
+uint64_t address_space_ldq_le_cached(MemoryRegionCache *cache, hwaddr addr,
+                            MemTxAttrs attrs, MemTxResult *result);
+uint64_t address_space_ldq_be_cached(MemoryRegionCache *cache, hwaddr addr,
+                            MemTxAttrs attrs, MemTxResult *result);
+void address_space_stb_cached(MemoryRegionCache *cache, hwaddr addr, uint32_t val,
+                            MemTxAttrs attrs, MemTxResult *result);
+void address_space_stw_le_cached(MemoryRegionCache *cache, hwaddr addr, uint32_t val,
+                            MemTxAttrs attrs, MemTxResult *result);
+void address_space_stw_be_cached(MemoryRegionCache *cache, hwaddr addr, uint32_t val,
+                            MemTxAttrs attrs, MemTxResult *result);
+void address_space_stl_le_cached(MemoryRegionCache *cache, hwaddr addr, uint32_t val,
+                            MemTxAttrs attrs, MemTxResult *result);
+void address_space_stl_be_cached(MemoryRegionCache *cache, hwaddr addr, uint32_t val,
+                            MemTxAttrs attrs, MemTxResult *result);
+void address_space_stq_le_cached(MemoryRegionCache *cache, hwaddr addr, uint64_t val,
+                            MemTxAttrs attrs, MemTxResult *result);
+void address_space_stq_be_cached(MemoryRegionCache *cache, hwaddr addr, uint64_t val,
+                            MemTxAttrs attrs, MemTxResult *result);
+
+uint32_t ldub_phys_cached(MemoryRegionCache *cache, hwaddr addr);
+uint32_t lduw_le_phys_cached(MemoryRegionCache *cache, hwaddr addr);
+uint32_t lduw_be_phys_cached(MemoryRegionCache *cache, hwaddr addr);
+uint32_t ldl_le_phys_cached(MemoryRegionCache *cache, hwaddr addr);
+uint32_t ldl_be_phys_cached(MemoryRegionCache *cache, hwaddr addr);
+uint64_t ldq_le_phys_cached(MemoryRegionCache *cache, hwaddr addr);
+uint64_t ldq_be_phys_cached(MemoryRegionCache *cache, hwaddr addr);
+void stb_phys_cached(MemoryRegionCache *cache, hwaddr addr, uint32_t val);
+void stw_le_phys_cached(MemoryRegionCache *cache, hwaddr addr, uint32_t val);
+void stw_be_phys_cached(MemoryRegionCache *cache, hwaddr addr, uint32_t val);
+void stl_le_phys_cached(MemoryRegionCache *cache, hwaddr addr, uint32_t val);
+void stl_be_phys_cached(MemoryRegionCache *cache, hwaddr addr, uint32_t val);
+void stq_le_phys_cached(MemoryRegionCache *cache, hwaddr addr, uint64_t val);
+void stq_be_phys_cached(MemoryRegionCache *cache, hwaddr addr, uint64_t val);
+
 /* address_space_translate: translate an address range into an address space
  * into a MemoryRegion and an address range into that section.  Should be
  * called from an RCU critical section, to avoid that the last reference
@@ -1544,6 +1663,38 @@ MemTxResult address_space_read(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
     return result;
 }
 
+/**
+ * address_space_read_cached: read from a cached RAM region
+ *
+ * @cache: Cached region to be addressed
+ * @addr: address relative to the base of the RAM region
+ * @buf: buffer with the data transferred
+ * @len: length of the data transferred
+ */
+static inline void
+address_space_read_cached(MemoryRegionCache *cache, hwaddr addr,
+                          void *buf, int len)
+{
+    assert(addr < cache->len && len <= cache->len - addr);
+    memcpy(buf, cache->ptr + addr, len);
+}
+
+/**
+ * address_space_write_cached: write to a cached RAM region
+ *
+ * @cache: Cached region to be addressed
+ * @addr: address relative to the base of the RAM region
+ * @buf: buffer with the data transferred
+ * @len: length of the data transferred
+ */
+static inline void
+address_space_write_cached(MemoryRegionCache *cache, hwaddr addr,
+                           void *buf, int len)
+{
+    assert(addr < cache->len && len <= cache->len - addr);
+    memcpy(cache->ptr + addr, buf, len);
+}
+
 #endif
 
 #endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 1b8c30a..9a8bcbd 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -45,6 +45,7 @@ typedef struct MachineState MachineState;
 typedef struct MemoryListener MemoryListener;
 typedef struct MemoryMappingList MemoryMappingList;
 typedef struct MemoryRegion MemoryRegion;
+typedef struct MemoryRegionCache MemoryRegionCache;
 typedef struct MemoryRegionSection MemoryRegionSection;
 typedef struct MigrationIncomingState MigrationIncomingState;
 typedef struct MigrationParams MigrationParams;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH 05/11] virtio: make virtio_should_notify static
  2016-12-12 11:18 [Qemu-devel] [RFC PATCH 00/11] speedup vring processing with MemoryRegionCaches Paolo Bonzini
                   ` (3 preceding siblings ...)
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 04/11] exec: introduce MemoryRegionCache Paolo Bonzini
@ 2016-12-12 11:18 ` Paolo Bonzini
  2016-12-12 14:07   ` Stefan Hajnoczi
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 06/11] virtio: add virtio_*_phys_cached Paolo Bonzini
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2016-12-12 11:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, famz, mst, borntraeger

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/virtio/virtio.c         | 2 +-
 include/hw/virtio/virtio.h | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 1af2de2..568f4be 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1330,7 +1330,7 @@ static void virtio_set_isr(VirtIODevice *vdev, int value)
     }
 }
 
-bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
+static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
 {
     uint16_t old, new;
     bool v;
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index ab0e030..9b21795 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -180,7 +180,6 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
                                unsigned int *out_bytes,
                                unsigned max_in_bytes, unsigned max_out_bytes);
 
-bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq);
 void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq);
 void virtio_notify(VirtIODevice *vdev, VirtQueue *vq);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH 06/11] virtio: add virtio_*_phys_cached
  2016-12-12 11:18 [Qemu-devel] [RFC PATCH 00/11] speedup vring processing with MemoryRegionCaches Paolo Bonzini
                   ` (4 preceding siblings ...)
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 05/11] virtio: make virtio_should_notify static Paolo Bonzini
@ 2016-12-12 11:18 ` Paolo Bonzini
  2016-12-12 14:08   ` Stefan Hajnoczi
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 07/11] virtio: use address_space_map/unmap to access descriptors Paolo Bonzini
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2016-12-12 11:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, famz, mst, borntraeger

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/hw/virtio/virtio-access.h | 52 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h
index 440b455..0771ca0 100644
--- a/include/hw/virtio/virtio-access.h
+++ b/include/hw/virtio/virtio-access.h
@@ -145,6 +145,58 @@ static inline uint16_t virtio_tswap16(VirtIODevice *vdev, uint16_t s)
 #endif
 }
 
+static inline uint16_t virtio_lduw_phys_cached(VirtIODevice *vdev,
+                                               MemoryRegionCache *cache,
+                                               hwaddr pa)
+{
+    if (virtio_access_is_big_endian(vdev)) {
+        return lduw_be_phys_cached(cache, pa);
+    }
+    return lduw_le_phys_cached(cache, pa);
+}
+
+static inline uint32_t virtio_ldl_phys_cached(VirtIODevice *vdev,
+                                              MemoryRegionCache *cache,
+                                              hwaddr pa)
+{
+    if (virtio_access_is_big_endian(vdev)) {
+        return ldl_be_phys_cached(cache, pa);
+    }
+    return ldl_le_phys_cached(cache, pa);
+}
+
+static inline uint64_t virtio_ldq_phys_cached(VirtIODevice *vdev,
+                                              MemoryRegionCache *cache,
+                                              hwaddr pa)
+{
+    if (virtio_access_is_big_endian(vdev)) {
+        return ldq_be_phys_cached(cache, pa);
+    }
+    return ldq_le_phys_cached(cache, pa);
+}
+
+static inline void virtio_stw_phys_cached(VirtIODevice *vdev,
+                                          MemoryRegionCache *cache,
+                                          hwaddr pa, uint16_t value)
+{
+    if (virtio_access_is_big_endian(vdev)) {
+        stw_be_phys_cached(cache, pa, value);
+    } else {
+        stw_le_phys_cached(cache, pa, value);
+    }
+}
+
+static inline void virtio_stl_phys_cached(VirtIODevice *vdev,
+                                          MemoryRegionCache *cache,
+                                          hwaddr pa, uint32_t value)
+{
+    if (virtio_access_is_big_endian(vdev)) {
+        stl_be_phys_cached(cache, pa, value);
+    } else {
+        stl_le_phys_cached(cache, pa, value);
+    }
+}
+
 static inline void virtio_tswap16s(VirtIODevice *vdev, uint16_t *s)
 {
     *s = virtio_tswap16(vdev, *s);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH 07/11] virtio: use address_space_map/unmap to access descriptors
  2016-12-12 11:18 [Qemu-devel] [RFC PATCH 00/11] speedup vring processing with MemoryRegionCaches Paolo Bonzini
                   ` (5 preceding siblings ...)
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 06/11] virtio: add virtio_*_phys_cached Paolo Bonzini
@ 2016-12-12 11:18 ` Paolo Bonzini
  2016-12-12 14:12   ` Stefan Hajnoczi
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 08/11] virtio: use MemoryRegionCache " Paolo Bonzini
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2016-12-12 11:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, famz, mst, borntraeger

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/virtio/virtio.c | 78 +++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 60 insertions(+), 18 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 568f4be..459f9dd 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -119,10 +119,9 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n)
 }
 
 static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
-                            hwaddr desc_pa, int i)
+                            uint8_t *desc_ptr, int i)
 {
-    address_space_read(&address_space_memory, desc_pa + i * sizeof(VRingDesc),
-                       MEMTXATTRS_UNSPECIFIED, (void *)desc, sizeof(VRingDesc));
+    memcpy(desc, desc_ptr + i * sizeof(VRingDesc), sizeof(VRingDesc));
     virtio_tswap64s(vdev, &desc->addr);
     virtio_tswap32s(vdev, &desc->len);
     virtio_tswap16s(vdev, &desc->flags);
@@ -405,7 +404,7 @@ enum {
 };
 
 static int virtqueue_read_next_desc(VirtIODevice *vdev, VRingDesc *desc,
-                                    hwaddr desc_pa, unsigned int max,
+                                    void *desc_ptr, unsigned int max,
                                     unsigned int *next)
 {
     /* If this descriptor says it doesn't chain, we're done. */
@@ -423,7 +422,7 @@ static int virtqueue_read_next_desc(VirtIODevice *vdev, VRingDesc *desc,
         return VIRTQUEUE_READ_DESC_ERROR;
     }
 
-    vring_desc_read(vdev, desc, desc_pa, *next);
+    vring_desc_read(vdev, desc, desc_ptr, *next);
     return VIRTQUEUE_READ_DESC_MORE;
 }
 
@@ -433,6 +432,8 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
 {
     unsigned int idx;
     unsigned int total_bufs, in_total, out_total;
+    void *desc_ptr = NULL;
+    hwaddr len = 0;
     int rc;
 
     idx = vq->last_avail_idx;
@@ -442,7 +443,6 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
         VirtIODevice *vdev = vq->vdev;
         unsigned int max, num_bufs, indirect = 0;
         VRingDesc desc;
-        hwaddr desc_pa;
         unsigned int i;
 
         max = vq->vring.num;
@@ -452,10 +452,19 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
             goto err;
         }
 
-        desc_pa = vq->vring.desc;
-        vring_desc_read(vdev, &desc, desc_pa, i);
+        len = max * sizeof(VRingDesc);
+        desc_ptr = address_space_map(&address_space_memory, vq->vring.desc,
+                                     &len, false);
+        if (len < max * sizeof(VRingDesc)) {
+            virtio_error(vdev, "Cannot map descriptor ring");
+            goto err;
+        }
+
+        vring_desc_read(vdev, &desc, desc_ptr, i);
 
         if (desc.flags & VRING_DESC_F_INDIRECT) {
+            address_space_unmap(&address_space_memory, desc_ptr, len, false, 0);
+            len = desc.len;
             if (desc.len % sizeof(VRingDesc)) {
                 virtio_error(vdev, "Invalid size for indirect buffer table");
                 goto err;
@@ -468,11 +477,17 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
             }
 
             /* loop over the indirect descriptor table */
+            desc_ptr = address_space_map(&address_space_memory, desc.addr,
+                                         &len, false);
+            if (len < desc.len) {
+                virtio_error(vdev, "Cannot map indirect buffer");
+                goto err;
+            }
+
             indirect = 1;
             max = desc.len / sizeof(VRingDesc);
-            desc_pa = desc.addr;
             num_bufs = i = 0;
-            vring_desc_read(vdev, &desc, desc_pa, i);
+            vring_desc_read(vdev, &desc, desc_ptr, i);
         }
 
         do {
@@ -491,7 +506,7 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
                 goto done;
             }
 
-            rc = virtqueue_read_next_desc(vdev, &desc, desc_pa, max, &i);
+            rc = virtqueue_read_next_desc(vdev, &desc, desc_ptr, max, &i);
         } while (rc == VIRTQUEUE_READ_DESC_MORE);
 
         if (rc == VIRTQUEUE_READ_DESC_ERROR) {
@@ -509,6 +524,9 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
     }
 
 done:
+    if (desc_ptr) {
+        address_space_unmap(&address_space_memory, desc_ptr, len, false, 0);
+    }
     if (in_bytes) {
         *in_bytes = in_total;
     }
@@ -656,9 +674,10 @@ static void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_nu
 void *virtqueue_pop(VirtQueue *vq, size_t sz)
 {
     unsigned int i, head, max;
-    hwaddr desc_pa = vq->vring.desc;
+    void *desc_ptr = NULL;
+    hwaddr len;
     VirtIODevice *vdev = vq->vdev;
-    VirtQueueElement *elem;
+    VirtQueueElement *elem = NULL;
     unsigned out_num, in_num;
     hwaddr addr[VIRTQUEUE_MAX_SIZE];
     struct iovec iov[VIRTQUEUE_MAX_SIZE];
@@ -694,18 +713,35 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     }
 
     i = head;
-    vring_desc_read(vdev, &desc, desc_pa, i);
+
+    len = max * sizeof(VRingDesc);
+    desc_ptr = address_space_map(&address_space_memory, vq->vring.desc, &len,
+                                 false);
+    if (len < max * sizeof(VRingDesc)) {
+        virtio_error(vdev, "Cannot map descriptor ring");
+        return NULL;
+    }
+
+    vring_desc_read(vdev, &desc, desc_ptr, i);
     if (desc.flags & VRING_DESC_F_INDIRECT) {
+        address_space_unmap(&address_space_memory, desc_ptr, len, false, 0);
         if (desc.len % sizeof(VRingDesc)) {
             virtio_error(vdev, "Invalid size for indirect buffer table");
             return NULL;
         }
 
         /* loop over the indirect descriptor table */
+        len = desc.len;
+        desc_ptr = address_space_map(&address_space_memory, desc.addr,
+                                     &len, false);
+        if (len < desc.len) {
+            virtio_error(vdev, "Cannot map indirect buffer");
+            return NULL;
+        }
+
         max = desc.len / sizeof(VRingDesc);
-        desc_pa = desc.addr;
         i = 0;
-        vring_desc_read(vdev, &desc, desc_pa, i);
+        vring_desc_read(vdev, &desc, desc_ptr, i);
     }
 
     /* Collect all the descriptors */
@@ -736,7 +772,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
             goto err_undo_map;
         }
 
-        rc = virtqueue_read_next_desc(vdev, &desc, desc_pa, max, &i);
+        rc = virtqueue_read_next_desc(vdev, &desc, desc_ptr, max, &i);
     } while (rc == VIRTQUEUE_READ_DESC_MORE);
 
     if (rc == VIRTQUEUE_READ_DESC_ERROR) {
@@ -758,11 +794,17 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     vq->inuse++;
 
     trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
+done:
+    if (desc_ptr) {
+        address_space_unmap(&address_space_memory, desc_ptr, len, false, 0);
+    }
+
     return elem;
 
 err_undo_map:
     virtqueue_undo_map_desc(out_num, in_num, iov);
-    return NULL;
+    elem = NULL;
+    goto done;
 }
 
 /* Reading and writing a structure directly to QEMUFile is *awful*, but
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH 08/11] virtio: use MemoryRegionCache to access descriptors
  2016-12-12 11:18 [Qemu-devel] [RFC PATCH 00/11] speedup vring processing with MemoryRegionCaches Paolo Bonzini
                   ` (6 preceding siblings ...)
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 07/11] virtio: use address_space_map/unmap to access descriptors Paolo Bonzini
@ 2016-12-12 11:18 ` Paolo Bonzini
  2016-12-12 14:17   ` Stefan Hajnoczi
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 09/11] virtio: add MemoryListener to cache ring translations Paolo Bonzini
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2016-12-12 11:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, famz, mst, borntraeger

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/virtio/virtio.c | 102 +++++++++++++++++++++++++++++------------------------
 1 file changed, 56 insertions(+), 46 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 459f9dd..562e2b7 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -119,9 +119,10 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n)
 }
 
 static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
-                            uint8_t *desc_ptr, int i)
+                            MemoryRegionCache *cache, int i)
 {
-    memcpy(desc, desc_ptr + i * sizeof(VRingDesc), sizeof(VRingDesc));
+    address_space_read_cached(cache, i * sizeof(VRingDesc),
+                              desc, sizeof(VRingDesc));
     virtio_tswap64s(vdev, &desc->addr);
     virtio_tswap32s(vdev, &desc->len);
     virtio_tswap16s(vdev, &desc->flags);
@@ -404,7 +405,7 @@ enum {
 };
 
 static int virtqueue_read_next_desc(VirtIODevice *vdev, VRingDesc *desc,
-                                    void *desc_ptr, unsigned int max,
+                                    MemoryRegionCache *desc_cache, unsigned int max,
                                     unsigned int *next)
 {
     /* If this descriptor says it doesn't chain, we're done. */
@@ -422,7 +423,7 @@ static int virtqueue_read_next_desc(VirtIODevice *vdev, VRingDesc *desc,
         return VIRTQUEUE_READ_DESC_ERROR;
     }
 
-    vring_desc_read(vdev, desc, desc_ptr, *next);
+    vring_desc_read(vdev, desc, desc_cache, *next);
     return VIRTQUEUE_READ_DESC_MORE;
 }
 
@@ -430,41 +431,42 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
                                unsigned int *out_bytes,
                                unsigned max_in_bytes, unsigned max_out_bytes)
 {
-    unsigned int idx;
+    VirtIODevice *vdev = vq->vdev;
+    unsigned int max, idx;
     unsigned int total_bufs, in_total, out_total;
-    void *desc_ptr = NULL;
-    hwaddr len = 0;
+    MemoryRegionCache *desc_cache = NULL;
+    MemoryRegionCache vring_desc_cache;
+    MemoryRegionCache indirect_desc_cache;
+    int64_t len = 0;
     int rc;
 
+    rcu_read_lock();
     idx = vq->last_avail_idx;
+    max = vq->vring.num;
+    len = address_space_cache_init(&vring_desc_cache, &address_space_memory,
+                                   vq->vring.desc, max * sizeof(VRingDesc),
+                                   false);
+    if (len < max * sizeof(VRingDesc)) {
+        virtio_error(vdev, "Cannot map descriptor ring");
+        goto err;
+    }
 
     total_bufs = in_total = out_total = 0;
     while ((rc = virtqueue_num_heads(vq, idx)) > 0) {
-        VirtIODevice *vdev = vq->vdev;
-        unsigned int max, num_bufs, indirect = 0;
+        unsigned int num_bufs;
         VRingDesc desc;
         unsigned int i;
 
-        max = vq->vring.num;
         num_bufs = total_bufs;
 
         if (!virtqueue_get_head(vq, idx++, &i)) {
             goto err;
         }
 
-        len = max * sizeof(VRingDesc);
-        desc_ptr = address_space_map(&address_space_memory, vq->vring.desc,
-                                     &len, false);
-        if (len < max * sizeof(VRingDesc)) {
-            virtio_error(vdev, "Cannot map descriptor ring");
-            goto err;
-        }
-
-        vring_desc_read(vdev, &desc, desc_ptr, i);
+        desc_cache = &vring_desc_cache;
+        vring_desc_read(vdev, &desc, desc_cache, i);
 
         if (desc.flags & VRING_DESC_F_INDIRECT) {
-            address_space_unmap(&address_space_memory, desc_ptr, len, false, 0);
-            len = desc.len;
             if (desc.len % sizeof(VRingDesc)) {
                 virtio_error(vdev, "Invalid size for indirect buffer table");
                 goto err;
@@ -477,17 +479,18 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
             }
 
             /* loop over the indirect descriptor table */
-            desc_ptr = address_space_map(&address_space_memory, desc.addr,
-                                         &len, false);
+            len = address_space_cache_init(&indirect_desc_cache,
+                                           &address_space_memory,
+                                           desc.addr, desc.len, false);
+            desc_cache = &indirect_desc_cache;
             if (len < desc.len) {
                 virtio_error(vdev, "Cannot map indirect buffer");
                 goto err;
             }
 
-            indirect = 1;
             max = desc.len / sizeof(VRingDesc);
             num_bufs = i = 0;
-            vring_desc_read(vdev, &desc, desc_ptr, i);
+            vring_desc_read(vdev, &desc, desc_cache, i);
         }
 
         do {
@@ -506,17 +509,19 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
                 goto done;
             }
 
-            rc = virtqueue_read_next_desc(vdev, &desc, desc_ptr, max, &i);
+            rc = virtqueue_read_next_desc(vdev, &desc, desc_cache, max, &i);
         } while (rc == VIRTQUEUE_READ_DESC_MORE);
 
         if (rc == VIRTQUEUE_READ_DESC_ERROR) {
             goto err;
         }
 
-        if (!indirect)
-            total_bufs = num_bufs;
-        else
+        if (desc_cache == &indirect_desc_cache) {
+            address_space_cache_destroy(&indirect_desc_cache);
             total_bufs++;
+        } else {
+            total_bufs = num_bufs;
+        }
     }
 
     if (rc < 0) {
@@ -524,15 +529,14 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
     }
 
 done:
-    if (desc_ptr) {
-        address_space_unmap(&address_space_memory, desc_ptr, len, false, 0);
-    }
+    address_space_cache_destroy(&vring_desc_cache);
     if (in_bytes) {
         *in_bytes = in_total;
     }
     if (out_bytes) {
         *out_bytes = out_total;
     }
+    rcu_read_unlock();
     return;
 
 err:
@@ -674,8 +678,10 @@ static void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_nu
 void *virtqueue_pop(VirtQueue *vq, size_t sz)
 {
     unsigned int i, head, max;
-    void *desc_ptr = NULL;
-    hwaddr len;
+    MemoryRegionCache *desc_cache = NULL;
+    MemoryRegionCache indirect_desc_cache;
+    MemoryRegionCache vring_desc_cache;
+    int64_t len;
     VirtIODevice *vdev = vq->vdev;
     VirtQueueElement *elem = NULL;
     unsigned out_num, in_num;
@@ -714,26 +720,28 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
 
     i = head;
 
-    len = max * sizeof(VRingDesc);
-    desc_ptr = address_space_map(&address_space_memory, vq->vring.desc, &len,
-                                 false);
+    rcu_read_lock();
+    len = address_space_cache_init(&vring_desc_cache, &address_space_memory,
+                                   vq->vring.desc, max * sizeof(VRingDesc),
+                                   false);
+    desc_cache = &vring_desc_cache;
     if (len < max * sizeof(VRingDesc)) {
         virtio_error(vdev, "Cannot map descriptor ring");
         return NULL;
     }
 
-    vring_desc_read(vdev, &desc, desc_ptr, i);
+    vring_desc_read(vdev, &desc, desc_cache, i);
     if (desc.flags & VRING_DESC_F_INDIRECT) {
-        address_space_unmap(&address_space_memory, desc_ptr, len, false, 0);
         if (desc.len % sizeof(VRingDesc)) {
             virtio_error(vdev, "Invalid size for indirect buffer table");
             return NULL;
         }
 
         /* loop over the indirect descriptor table */
-        len = desc.len;
-        desc_ptr = address_space_map(&address_space_memory, desc.addr,
-                                     &len, false);
+        len = address_space_cache_init(&indirect_desc_cache,
+                                       &address_space_memory,
+                                       desc.addr, desc.len, false);
+        desc_cache = &indirect_desc_cache;
         if (len < desc.len) {
             virtio_error(vdev, "Cannot map indirect buffer");
             return NULL;
@@ -741,7 +749,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
 
         max = desc.len / sizeof(VRingDesc);
         i = 0;
-        vring_desc_read(vdev, &desc, desc_ptr, i);
+        vring_desc_read(vdev, &desc, desc_cache, i);
     }
 
     /* Collect all the descriptors */
@@ -772,7 +780,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
             goto err_undo_map;
         }
 
-        rc = virtqueue_read_next_desc(vdev, &desc, desc_ptr, max, &i);
+        rc = virtqueue_read_next_desc(vdev, &desc, desc_cache, max, &i);
     } while (rc == VIRTQUEUE_READ_DESC_MORE);
 
     if (rc == VIRTQUEUE_READ_DESC_ERROR) {
@@ -795,9 +803,11 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
 
     trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
 done:
-    if (desc_ptr) {
-        address_space_unmap(&address_space_memory, desc_ptr, len, false, 0);
+    if (desc_cache == &indirect_desc_cache) {
+        address_space_cache_destroy(&indirect_desc_cache);
     }
+    address_space_cache_destroy(&vring_desc_cache);
+    rcu_read_unlock();
 
     return elem;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH 09/11] virtio: add MemoryListener to cache ring translations
  2016-12-12 11:18 [Qemu-devel] [RFC PATCH 00/11] speedup vring processing with MemoryRegionCaches Paolo Bonzini
                   ` (7 preceding siblings ...)
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 08/11] virtio: use MemoryRegionCache " Paolo Bonzini
@ 2016-12-12 11:18 ` Paolo Bonzini
  2016-12-12 14:24   ` Stefan Hajnoczi
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 10/11] virtio: use VRingMemoryRegionCaches for descriptor ring Paolo Bonzini
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2016-12-12 11:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, famz, mst, borntraeger

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/virtio/virtio.c         | 91 ++++++++++++++++++++++++++++++++++++++++++++--
 include/hw/virtio/virtio.h |  1 +
 2 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 562e2b7..4f355b4 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -59,6 +59,13 @@ typedef struct VRingUsed
     VRingUsedElem ring[0];
 } VRingUsed;
 
+typedef struct VRingMemoryRegionCaches {
+    struct rcu_head rcu;
+    MemoryRegionCache desc;
+    MemoryRegionCache avail;
+    MemoryRegionCache used;
+} VRingMemoryRegionCaches;
+
 typedef struct VRing
 {
     unsigned int num;
@@ -67,6 +74,7 @@ typedef struct VRing
     hwaddr desc;
     hwaddr avail;
     hwaddr used;
+    VRingMemoryRegionCaches *caches;
 } VRing;
 
 struct VirtQueue
@@ -103,6 +111,46 @@ struct VirtQueue
     QLIST_ENTRY(VirtQueue) node;
 };
 
+static void virtio_free_region_cache(VRingMemoryRegionCaches *caches)
+{
+    address_space_cache_destroy(&caches->desc);
+    address_space_cache_destroy(&caches->avail);
+    address_space_cache_destroy(&caches->used);
+    g_free(caches);
+}
+
+static void virtio_init_region_cache(VirtIODevice *vdev, int i)
+{
+    VirtQueue *vq = &vdev->vq[i];
+    VRingMemoryRegionCaches *old = vq->vring.caches;
+    VRingMemoryRegionCaches *new = g_new0(VRingMemoryRegionCaches, 1);
+    hwaddr addr, size;
+    int event_size;
+
+    event_size = virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
+
+    addr = vq->vring.desc;
+    if (!addr) {
+        return;
+    }
+    size = virtio_queue_get_desc_size(vdev, i);
+    address_space_cache_init(&new->desc, &address_space_memory,
+                             addr, size, false);
+
+    size = virtio_queue_get_used_size(vdev, i) + event_size;
+    address_space_cache_init(&new->used, &address_space_memory,
+                             vq->vring.used, size, true);
+
+    size = virtio_queue_get_avail_size(vdev, i) + event_size;
+    address_space_cache_init(&new->avail, &address_space_memory,
+                             vq->vring.avail, size, false);
+
+    atomic_rcu_set(&vq->vring.caches, new);
+    if (old) {
+        call_rcu(old, virtio_free_region_cache, rcu);
+    }
+}
+
 /* virt queue functions */
 void virtio_queue_update_rings(VirtIODevice *vdev, int n)
 {
@@ -116,6 +164,7 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n)
     vring->used = vring_align(vring->avail +
                               offsetof(VRingAvail, ring[vring->num]),
                               vring->align);
+    virtio_init_region_cache(vdev, n);
 }
 
 static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
@@ -1223,6 +1272,7 @@ void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
     vdev->vq[n].vring.desc = desc;
     vdev->vq[n].vring.avail = avail;
     vdev->vq[n].vring.used = used;
+    virtio_init_region_cache(vdev, n);
 }
 
 void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)
@@ -1927,9 +1977,6 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
 void virtio_cleanup(VirtIODevice *vdev)
 {
     qemu_del_vm_change_state_handler(vdev->vmstate);
-    g_free(vdev->config);
-    g_free(vdev->vq);
-    g_free(vdev->vector_queues);
 }
 
 static void virtio_vmstate_change(void *opaque, int running, RunState state)
@@ -2150,6 +2197,19 @@ void GCC_FMT_ATTR(2, 3) virtio_error(VirtIODevice *vdev, const char *fmt, ...)
     }
 }
 
+static void virtio_memory_listener_commit(MemoryListener *listener)
+{
+    VirtIODevice *vdev = container_of(listener, VirtIODevice, listener);
+    int i;
+
+    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
+        if (vdev->vq[i].vring.num == 0) {
+            break;
+        }
+        virtio_init_region_cache(vdev, i);
+    }
+}
+
 static void virtio_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -2172,6 +2232,9 @@ static void virtio_device_realize(DeviceState *dev, Error **errp)
         error_propagate(errp, err);
         return;
     }
+
+    vdev->listener.commit = virtio_memory_listener_commit;
+    memory_listener_register(&vdev->listener, &address_space_memory);
 }
 
 static void virtio_device_unrealize(DeviceState *dev, Error **errp)
@@ -2194,6 +2257,27 @@ static void virtio_device_unrealize(DeviceState *dev, Error **errp)
     vdev->bus_name = NULL;
 }
 
+static void virtio_device_instance_finalize(Object *obj)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(obj);
+    int i;
+
+    memory_listener_unregister(&vdev->listener);
+    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
+        VRingMemoryRegionCaches *caches;
+        if (vdev->vq[i].vring.num == 0) {
+            break;
+        }
+        caches = atomic_read(&vdev->vq[i].vring.caches);
+        atomic_set(&vdev->vq[i].vring.caches, NULL);
+        virtio_free_region_cache(caches);
+    }
+
+    g_free(vdev->config);
+    g_free(vdev->vq);
+    g_free(vdev->vector_queues);
+}
+
 static Property virtio_properties[] = {
     DEFINE_VIRTIO_COMMON_FEATURES(VirtIODevice, host_features),
     DEFINE_PROP_END_OF_LIST(),
@@ -2320,6 +2404,7 @@ static const TypeInfo virtio_device_info = {
     .parent = TYPE_DEVICE,
     .instance_size = sizeof(VirtIODevice),
     .class_init = virtio_device_class_init,
+    .instance_finalize = virtio_device_instance_finalize,
     .abstract = true,
     .class_size = sizeof(VirtioDeviceClass),
 };
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 9b21795..5bcc9a8 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -85,6 +85,7 @@ struct VirtIODevice
     uint32_t generation;
     int nvectors;
     VirtQueue *vq;
+    MemoryListener listener;
     uint16_t device_id;
     bool vm_running;
     bool broken; /* device in invalid state, needs reset */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH 10/11] virtio: use VRingMemoryRegionCaches for descriptor ring
  2016-12-12 11:18 [Qemu-devel] [RFC PATCH 00/11] speedup vring processing with MemoryRegionCaches Paolo Bonzini
                   ` (8 preceding siblings ...)
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 09/11] virtio: add MemoryListener to cache ring translations Paolo Bonzini
@ 2016-12-12 11:18 ` Paolo Bonzini
  2016-12-12 16:06   ` Stefan Hajnoczi
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 11/11] virtio: use VRingMemoryRegionCaches for avail and used rings Paolo Bonzini
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2016-12-12 11:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, famz, mst, borntraeger

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/virtio/virtio.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 4f355b4..702da0b 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -484,18 +484,16 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
     unsigned int max, idx;
     unsigned int total_bufs, in_total, out_total;
     MemoryRegionCache *desc_cache = NULL;
-    MemoryRegionCache vring_desc_cache;
     MemoryRegionCache indirect_desc_cache;
+    VRingMemoryRegionCaches *caches;
     int64_t len = 0;
     int rc;
 
     rcu_read_lock();
     idx = vq->last_avail_idx;
     max = vq->vring.num;
-    len = address_space_cache_init(&vring_desc_cache, &address_space_memory,
-                                   vq->vring.desc, max * sizeof(VRingDesc),
-                                   false);
-    if (len < max * sizeof(VRingDesc)) {
+    caches = atomic_rcu_read(&vq->vring.caches);
+    if (caches->desc.len < max * sizeof(VRingDesc)) {
         virtio_error(vdev, "Cannot map descriptor ring");
         goto err;
     }
@@ -512,7 +510,7 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
             goto err;
         }
 
-        desc_cache = &vring_desc_cache;
+        desc_cache = &caches->desc;
         vring_desc_read(vdev, &desc, desc_cache, i);
 
         if (desc.flags & VRING_DESC_F_INDIRECT) {
@@ -578,7 +576,6 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
     }
 
 done:
-    address_space_cache_destroy(&vring_desc_cache);
     if (in_bytes) {
         *in_bytes = in_total;
     }
@@ -729,7 +726,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     unsigned int i, head, max;
     MemoryRegionCache *desc_cache = NULL;
     MemoryRegionCache indirect_desc_cache;
-    MemoryRegionCache vring_desc_cache;
+    VRingMemoryRegionCaches *caches;
     int64_t len;
     VirtIODevice *vdev = vq->vdev;
     VirtQueueElement *elem = NULL;
@@ -770,15 +767,13 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     i = head;
 
     rcu_read_lock();
-    len = address_space_cache_init(&vring_desc_cache, &address_space_memory,
-                                   vq->vring.desc, max * sizeof(VRingDesc),
-                                   false);
-    desc_cache = &vring_desc_cache;
-    if (len < max * sizeof(VRingDesc)) {
+    caches = atomic_rcu_read(&vq->vring.caches);
+    if (caches->desc.len < max * sizeof(VRingDesc)) {
         virtio_error(vdev, "Cannot map descriptor ring");
         return NULL;
     }
 
+    desc_cache = &caches->desc;
     vring_desc_read(vdev, &desc, desc_cache, i);
     if (desc.flags & VRING_DESC_F_INDIRECT) {
         if (desc.len % sizeof(VRingDesc)) {
@@ -855,7 +850,6 @@ done:
     if (desc_cache == &indirect_desc_cache) {
         address_space_cache_destroy(&indirect_desc_cache);
     }
-    address_space_cache_destroy(&vring_desc_cache);
     rcu_read_unlock();
 
     return elem;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH 11/11] virtio: use VRingMemoryRegionCaches for avail and used rings
  2016-12-12 11:18 [Qemu-devel] [RFC PATCH 00/11] speedup vring processing with MemoryRegionCaches Paolo Bonzini
                   ` (9 preceding siblings ...)
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 10/11] virtio: use VRingMemoryRegionCaches for descriptor ring Paolo Bonzini
@ 2016-12-12 11:18 ` Paolo Bonzini
  2016-12-12 16:08   ` Stefan Hajnoczi
  2016-12-12 16:11 ` [Qemu-devel] [RFC PATCH 00/11] speedup vring processing with MemoryRegionCaches Stefan Hajnoczi
  2016-12-13 12:56 ` Christian Borntraeger
  12 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2016-12-12 11:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, famz, mst, borntraeger

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/net/virtio-net.c |  14 +++++-
 hw/virtio/virtio.c  | 137 +++++++++++++++++++++++++++++++++++++---------------
 2 files changed, 111 insertions(+), 40 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 5009533..c9f8c1c 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1101,7 +1101,8 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
     return 0;
 }
 
-static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
+                                      size_t size)
 {
     VirtIONet *n = qemu_get_nic_opaque(nc);
     VirtIONetQueue *q = virtio_net_get_subqueue(nc);
@@ -1204,6 +1205,17 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
     return size;
 }
 
+static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf,
+                                  size_t size)
+{
+    ssize_t r;
+
+    rcu_read_lock();
+    r = virtio_net_receive_rcu(nc, buf, size);
+    rcu_read_unlock();
+    return r;
+}
+
 static int32_t virtio_net_flush_tx(VirtIONetQueue *q);
 
 static void virtio_net_tx_complete(NetClientState *nc, ssize_t len)
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 702da0b..eb7ef83 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -167,6 +167,7 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n)
     virtio_init_region_cache(vdev, n);
 }
 
+/* Called within rcu_read_lock().  */
 static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
                             MemoryRegionCache *cache, int i)
 {
@@ -178,88 +179,109 @@ static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
     virtio_tswap16s(vdev, &desc->next);
 }
 
+/* Called within rcu_read_lock().  */
 static inline uint16_t vring_avail_flags(VirtQueue *vq)
 {
-    hwaddr pa;
-    pa = vq->vring.avail + offsetof(VRingAvail, flags);
-    return virtio_lduw_phys(vq->vdev, pa);
+    VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
+    hwaddr pa = offsetof(VRingAvail, flags);
+    return virtio_lduw_phys_cached(vq->vdev, &caches->avail, pa);
 }
 
+/* Called within rcu_read_lock().  */
 static inline uint16_t vring_avail_idx(VirtQueue *vq)
 {
-    hwaddr pa;
-    pa = vq->vring.avail + offsetof(VRingAvail, idx);
-    vq->shadow_avail_idx = virtio_lduw_phys(vq->vdev, pa);
+    VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
+    hwaddr pa = offsetof(VRingAvail, idx);
+    vq->shadow_avail_idx = virtio_lduw_phys_cached(vq->vdev, &caches->avail, pa);
     return vq->shadow_avail_idx;
 }
 
+/* Called within rcu_read_lock().  */
 static inline uint16_t vring_avail_ring(VirtQueue *vq, int i)
 {
-    hwaddr pa;
-    pa = vq->vring.avail + offsetof(VRingAvail, ring[i]);
-    return virtio_lduw_phys(vq->vdev, pa);
+    VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
+    hwaddr pa = offsetof(VRingAvail, ring[i]);
+    return virtio_lduw_phys_cached(vq->vdev, &caches->avail, pa);
 }
 
+/* Called within rcu_read_lock().  */
 static inline uint16_t vring_get_used_event(VirtQueue *vq)
 {
     return vring_avail_ring(vq, vq->vring.num);
 }
 
+/* Called within rcu_read_lock().  */
 static inline void vring_used_write(VirtQueue *vq, VRingUsedElem *uelem,
                                     int i)
 {
-    hwaddr pa;
+    VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
+    hwaddr pa = offsetof(VRingUsed, ring[i]);
     virtio_tswap32s(vq->vdev, &uelem->id);
     virtio_tswap32s(vq->vdev, &uelem->len);
-    pa = vq->vring.used + offsetof(VRingUsed, ring[i]);
-    address_space_write(&address_space_memory, pa, MEMTXATTRS_UNSPECIFIED,
-                       (void *)uelem, sizeof(VRingUsedElem));
+    address_space_write_cached(&caches->used, pa, uelem, sizeof(VRingUsedElem));
+    address_space_cache_invalidate(&caches->used, pa, sizeof(VRingUsedElem));
 }
 
+/* Called within rcu_read_lock().  */
 static uint16_t vring_used_idx(VirtQueue *vq)
 {
-    hwaddr pa;
-    pa = vq->vring.used + offsetof(VRingUsed, idx);
-    return virtio_lduw_phys(vq->vdev, pa);
+    VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
+    hwaddr pa = offsetof(VRingUsed, idx);
+    return virtio_lduw_phys_cached(vq->vdev, &caches->used, pa);
 }
 
+/* Called within rcu_read_lock().  */
 static inline void vring_used_idx_set(VirtQueue *vq, uint16_t val)
 {
-    hwaddr pa;
-    pa = vq->vring.used + offsetof(VRingUsed, idx);
-    virtio_stw_phys(vq->vdev, pa, val);
+    VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
+    hwaddr pa = offsetof(VRingUsed, idx);
+    virtio_stw_phys_cached(vq->vdev, &caches->used, pa, val);
+    address_space_cache_invalidate(&caches->used, pa, sizeof(val));
     vq->used_idx = val;
 }
 
+/* Called within rcu_read_lock().  */
 static inline void vring_used_flags_set_bit(VirtQueue *vq, int mask)
 {
+    VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
     VirtIODevice *vdev = vq->vdev;
-    hwaddr pa;
-    pa = vq->vring.used + offsetof(VRingUsed, flags);
-    virtio_stw_phys(vdev, pa, virtio_lduw_phys(vdev, pa) | mask);
+    hwaddr pa = offsetof(VRingUsed, flags);
+    uint16_t flags = virtio_lduw_phys_cached(vq->vdev, &caches->used, pa);
+
+    virtio_stw_phys_cached(vdev, &caches->used, pa, flags | mask);
+    address_space_cache_invalidate(&caches->used, pa, sizeof(flags));
 }
 
+/* Called within rcu_read_lock().  */
 static inline void vring_used_flags_unset_bit(VirtQueue *vq, int mask)
 {
+    VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
     VirtIODevice *vdev = vq->vdev;
-    hwaddr pa;
-    pa = vq->vring.used + offsetof(VRingUsed, flags);
-    virtio_stw_phys(vdev, pa, virtio_lduw_phys(vdev, pa) & ~mask);
+    hwaddr pa = offsetof(VRingUsed, flags);
+    uint16_t flags = virtio_lduw_phys_cached(vq->vdev, &caches->used, pa);
+
+    virtio_stw_phys_cached(vdev, &caches->used, pa, flags & ~mask);
+    address_space_cache_invalidate(&caches->used, pa, sizeof(flags));
 }
 
+/* Called within rcu_read_lock().  */
 static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
 {
+    VRingMemoryRegionCaches *caches;
     hwaddr pa;
     if (!vq->notification) {
         return;
     }
-    pa = vq->vring.used + offsetof(VRingUsed, ring[vq->vring.num]);
-    virtio_stw_phys(vq->vdev, pa, val);
+
+    caches = atomic_rcu_read(&vq->vring.caches);
+    pa = offsetof(VRingUsed, ring[vq->vring.num]);
+    virtio_stw_phys_cached(vq->vdev, &caches->used, pa, val);
 }
 
 void virtio_queue_set_notification(VirtQueue *vq, int enable)
 {
     vq->notification = enable;
+    rcu_read_lock();
     if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
         vring_set_avail_event(vq, vring_avail_idx(vq));
     } else if (enable) {
@@ -271,6 +293,7 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable)
         /* Expose avail event/used flags before caller checks the avail idx. */
         smp_mb();
     }
+    rcu_read_unlock();
 }
 
 int virtio_queue_ready(VirtQueue *vq)
@@ -279,8 +302,9 @@ int virtio_queue_ready(VirtQueue *vq)
 }
 
 /* Fetch avail_idx from VQ memory only when we really need to know if
- * guest has added some buffers. */
-int virtio_queue_empty(VirtQueue *vq)
+ * guest has added some buffers.
+ * Called within rcu_read_lock().  */
+static int virtio_queue_empty_rcu(VirtQueue *vq)
 {
     if (vq->shadow_avail_idx != vq->last_avail_idx) {
         return 0;
@@ -289,6 +313,20 @@ int virtio_queue_empty(VirtQueue *vq)
     return vring_avail_idx(vq) == vq->last_avail_idx;
 }
 
+int virtio_queue_empty(VirtQueue *vq)
+{
+    bool empty;
+
+    if (vq->shadow_avail_idx != vq->last_avail_idx) {
+        return 0;
+    }
+
+    rcu_read_lock();
+    empty = vring_avail_idx(vq) == vq->last_avail_idx;
+    rcu_read_unlock();
+    return empty;
+}
+
 static void virtqueue_unmap_sg(VirtQueue *vq, const VirtQueueElement *elem,
                                unsigned int len)
 {
@@ -365,6 +403,7 @@ bool virtqueue_rewind(VirtQueue *vq, unsigned int num)
     return true;
 }
 
+/* Called within rcu_read_lock().  */
 void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
                     unsigned int len, unsigned int idx)
 {
@@ -385,6 +424,7 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
     vring_used_write(vq, &uelem, idx);
 }
 
+/* Called within rcu_read_lock().  */
 void virtqueue_flush(VirtQueue *vq, unsigned int count)
 {
     uint16_t old, new;
@@ -408,10 +448,13 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
 void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
                     unsigned int len)
 {
+    rcu_read_lock();
     virtqueue_fill(vq, elem, len, 0);
     virtqueue_flush(vq, 1);
+    rcu_read_unlock();
 }
 
+/* Called within rcu_read_lock().  */
 static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
 {
     uint16_t num_heads = vring_avail_idx(vq) - idx;
@@ -431,6 +474,7 @@ static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
     return num_heads;
 }
 
+/* Called within rcu_read_lock().  */
 static bool virtqueue_get_head(VirtQueue *vq, unsigned int idx,
                                unsigned int *head)
 {
@@ -739,8 +783,9 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     if (unlikely(vdev->broken)) {
         return NULL;
     }
-    if (virtio_queue_empty(vq)) {
-        return NULL;
+    rcu_read_lock();
+    if (virtio_queue_empty_rcu(vq)) {
+        goto out_rcu;
     }
     /* Needed after virtio_queue_empty(), see comment in
      * virtqueue_num_heads(). */
@@ -753,11 +798,11 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
 
     if (vq->inuse >= vq->vring.num) {
         virtio_error(vdev, "Virtqueue size exceeded");
-        return NULL;
+        goto out_rcu;
     }
 
     if (!virtqueue_get_head(vq, vq->last_avail_idx++, &head)) {
-        return NULL;
+        goto out_rcu;
     }
 
     if (virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX)) {
@@ -770,7 +815,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     caches = atomic_rcu_read(&vq->vring.caches);
     if (caches->desc.len < max * sizeof(VRingDesc)) {
         virtio_error(vdev, "Cannot map descriptor ring");
-        return NULL;
+        goto out_rcu;
     }
 
     desc_cache = &caches->desc;
@@ -778,7 +823,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     if (desc.flags & VRING_DESC_F_INDIRECT) {
         if (desc.len % sizeof(VRingDesc)) {
             virtio_error(vdev, "Invalid size for indirect buffer table");
-            return NULL;
+            goto out_rcu;
         }
 
         /* loop over the indirect descriptor table */
@@ -788,7 +833,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
         desc_cache = &indirect_desc_cache;
         if (len < desc.len) {
             virtio_error(vdev, "Cannot map indirect buffer");
-            return NULL;
+            goto out_rcu;
         }
 
         max = desc.len / sizeof(VRingDesc);
@@ -850,12 +895,13 @@ done:
     if (desc_cache == &indirect_desc_cache) {
         address_space_cache_destroy(&indirect_desc_cache);
     }
-    rcu_read_unlock();
 
+    rcu_read_unlock();
     return elem;
 
 err_undo_map:
     virtqueue_undo_map_desc(out_num, in_num, iov);
+out_rcu:
     elem = NULL;
     goto done;
 }
@@ -1426,6 +1472,7 @@ static void virtio_set_isr(VirtIODevice *vdev, int value)
     }
 }
 
+/* Called within rcu_read_lock().  */
 static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
 {
     uint16_t old, new;
@@ -1451,7 +1498,12 @@ static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
 
 void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
 {
-    if (!virtio_should_notify(vdev, vq)) {
+    bool should_notify;
+    rcu_read_lock();
+    should_notify = virtio_should_notify(vdev, vq);
+    rcu_read_unlock();
+
+    if (!should_notify) {
         return;
     }
 
@@ -1478,7 +1530,12 @@ void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
 
 void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
 {
-    if (!virtio_should_notify(vdev, vq)) {
+    bool should_notify;
+    rcu_read_lock();
+    should_notify = virtio_should_notify(vdev, vq);
+    rcu_read_unlock();
+
+    if (!should_notify) {
         return;
     }
 
@@ -1932,6 +1989,7 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
         }
     }
 
+    rcu_read_lock();
     for (i = 0; i < num; i++) {
         if (vdev->vq[i].vring.desc) {
             uint16_t nheads;
@@ -1964,6 +2022,7 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
             }
         }
     }
+    rcu_read_unlock();
 
     return 0;
 }
-- 
1.8.3.1


* Re: [Qemu-devel] [PATCH 01/11] exec: optimize remaining address_space_* cases
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 01/11] exec: optimize remaining address_space_* cases Paolo Bonzini
@ 2016-12-12 13:27   ` Stefan Hajnoczi
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2016-12-12 13:27 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, famz, mst, borntraeger

On Mon, Dec 12, 2016 at 12:18:47PM +0100, Paolo Bonzini wrote:
> Do them right before the next patch generalizes them into a multi-included
> file.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  exec.c | 126 +++++++++++++++++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 103 insertions(+), 23 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [Qemu-devel] [PATCH 02/11] exec: introduce memory_ldst.inc.c
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 02/11] exec: introduce memory_ldst.inc.c Paolo Bonzini
@ 2016-12-12 13:44   ` Stefan Hajnoczi
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2016-12-12 13:44 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, famz, mst, borntraeger

On Mon, Dec 12, 2016 at 12:18:48PM +0100, Paolo Bonzini wrote:
> Templatize the address_space_* and *_phys functions, so that we can add
> similar functions in the next patch that work with a lightweight version
> of address_space_map/unmap.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  exec.c                    | 681 +-------------------------------------------
>  include/exec/cpu-common.h |  15 -
>  include/exec/memory.h     |  15 +
>  memory_ldst.inc.c         | 709 ++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 734 insertions(+), 686 deletions(-)
>  create mode 100644 memory_ldst.inc.c

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [Qemu-devel] [PATCH 03/11] exec: introduce address_space_extend_translation
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 03/11] exec: introduce address_space_extend_translation Paolo Bonzini
@ 2016-12-12 13:47   ` Stefan Hajnoczi
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2016-12-12 13:47 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, famz, mst, borntraeger

On Mon, Dec 12, 2016 at 12:18:49PM +0100, Paolo Bonzini wrote:
> This extracts the common part of address_space_map and
> address_space_cache_init into a new function.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  exec.c | 50 +++++++++++++++++++++++++++++---------------------
>  1 file changed, 29 insertions(+), 21 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [Qemu-devel] [PATCH 04/11] exec: introduce MemoryRegionCache
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 04/11] exec: introduce MemoryRegionCache Paolo Bonzini
@ 2016-12-12 14:06   ` Stefan Hajnoczi
  2016-12-13 13:14     ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Stefan Hajnoczi @ 2016-12-12 14:06 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, famz, mst, borntraeger

On Mon, Dec 12, 2016 at 12:18:50PM +0100, Paolo Bonzini wrote:
> diff --git a/exec.c b/exec.c
> index d4b3656..8d4bb0e 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -3077,6 +3077,82 @@ void cpu_physical_memory_unmap(void *buffer, hwaddr len,
>  #define RCU_READ_UNLOCK(...)     rcu_read_unlock()
>  #include "memory_ldst.inc.c"
>  
> +int64_t address_space_cache_init(MemoryRegionCache *cache,
> +                                 AddressSpace *as,
> +                                 hwaddr addr,
> +                                 hwaddr len,
> +                                 bool is_write)
> +{
> +    hwaddr l, xlat;
> +    MemoryRegion *mr;
> +    void *ptr;
> +
> +    assert(len > 0);
> +
> +    l = len;
> +    mr = address_space_translate(as, addr, &xlat, &l, is_write);
> +    if (!memory_access_is_direct(mr, is_write)) {
> +        return -EINVAL;
> +    }
> +
> +    l = address_space_extend_translation(as, addr, len, mr, xlat, l, is_write);
> +    ptr = qemu_ram_ptr_length(mr->ram_block, xlat, &l);
> +
> +    cache->xlat = xlat;
> +    cache->is_write = is_write;
> +    cache->mr = mr;
> +    cache->ptr = ptr;
> +    cache->len = l;
> +    memory_region_ref(cache->mr);
> +
> +    return l;
> +}

What happens when [addr, addr + len) overlaps a MemoryRegion boundary?
It looks like this function silently truncates the MemoryRegionCache,
leading to an assertion failure in address_space_translate_cached().

Perhaps it would be better to fail address_space_cache_init() if the
length is truncated.
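
For illustration, a caller-side guard in the spirit of what the series does
in virtqueue_pop() might look roughly like this (a sketch only; the helper
name and error message are invented here, not taken from the posted patches):

static bool vring_desc_cache_ok(VirtQueue *vq, MemoryRegionCache *cache)
{
    hwaddr size = vq->vring.num * sizeof(VRingDesc);
    int64_t len = address_space_cache_init(cache, &address_space_memory,
                                           vq->vring.desc, size, false);

    /* Treat both failure (negative return) and truncation as "cannot map". */
    if (len < (int64_t)size) {
        virtio_error(vq->vdev, "Cannot map descriptor ring");
        return false;
    }
    return true;
}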


* Re: [Qemu-devel] [PATCH 05/11] virtio: make virtio_should_notify static
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 05/11] virtio: make virtio_should_notify static Paolo Bonzini
@ 2016-12-12 14:07   ` Stefan Hajnoczi
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2016-12-12 14:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, famz, mst, borntraeger

On Mon, Dec 12, 2016 at 12:18:51PM +0100, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/virtio/virtio.c         | 2 +-
>  include/hw/virtio/virtio.h | 1 -
>  2 files changed, 1 insertion(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [Qemu-devel] [PATCH 06/11] virtio: add virtio_*_phys_cached
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 06/11] virtio: add virtio_*_phys_cached Paolo Bonzini
@ 2016-12-12 14:08   ` Stefan Hajnoczi
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2016-12-12 14:08 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, famz, mst, borntraeger

On Mon, Dec 12, 2016 at 12:18:52PM +0100, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  include/hw/virtio/virtio-access.h | 52 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 52 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [Qemu-devel] [PATCH 07/11] virtio: use address_space_map/unmap to access descriptors
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 07/11] virtio: use address_space_map/unmap to access descriptors Paolo Bonzini
@ 2016-12-12 14:12   ` Stefan Hajnoczi
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2016-12-12 14:12 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, famz, mst, borntraeger

On Mon, Dec 12, 2016 at 12:18:53PM +0100, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/virtio/virtio.c | 78 +++++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 60 insertions(+), 18 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [Qemu-devel] [PATCH 08/11] virtio: use MemoryRegionCache to access descriptors
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 08/11] virtio: use MemoryRegionCache " Paolo Bonzini
@ 2016-12-12 14:17   ` Stefan Hajnoczi
  2016-12-13 11:14     ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Stefan Hajnoczi @ 2016-12-12 14:17 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, famz, mst, borntraeger

On Mon, Dec 12, 2016 at 12:18:54PM +0100, Paolo Bonzini wrote:
> @@ -430,41 +431,42 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
>                                 unsigned int *out_bytes,
>                                 unsigned max_in_bytes, unsigned max_out_bytes)
>  {
> -    unsigned int idx;
> +    VirtIODevice *vdev = vq->vdev;
> +    unsigned int max, idx;
>      unsigned int total_bufs, in_total, out_total;
> -    void *desc_ptr = NULL;
> -    hwaddr len = 0;
> +    MemoryRegionCache *desc_cache = NULL;
> +    MemoryRegionCache vring_desc_cache;
> +    MemoryRegionCache indirect_desc_cache;
> +    int64_t len = 0;
>      int rc;
>  
> +    rcu_read_lock();

Please document the purpose of the rcu_read_lock() in virtio code.

Also, do the goto err cases call rcu_read_unlock()?


* Re: [Qemu-devel] [PATCH 09/11] virtio: add MemoryListener to cache ring translations
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 09/11] virtio: add MemoryListener to cache ring translations Paolo Bonzini
@ 2016-12-12 14:24   ` Stefan Hajnoczi
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2016-12-12 14:24 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, famz, mst, borntraeger

On Mon, Dec 12, 2016 at 12:18:55PM +0100, Paolo Bonzini wrote:
> @@ -103,6 +111,46 @@ struct VirtQueue
>      QLIST_ENTRY(VirtQueue) node;
>  };
>  
> +static void virtio_free_region_cache(VRingMemoryRegionCaches *caches)
> +{
> +    address_space_cache_destroy(&caches->desc);
> +    address_space_cache_destroy(&caches->avail);
> +    address_space_cache_destroy(&caches->used);
> +    g_free(caches);
> +}
> +
> +static void virtio_init_region_cache(VirtIODevice *vdev, int i)

s/int i/int n/ - it seems to be the convention for virtqueue numbers.

> +{
> +    VirtQueue *vq = &vdev->vq[i];
> +    VRingMemoryRegionCaches *old = vq->vring.caches;
> +    VRingMemoryRegionCaches *new = g_new0(VRingMemoryRegionCaches, 1);
> +    hwaddr addr, size;
> +    int event_size;
> +
> +    event_size = virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
> +
> +    addr = vq->vring.desc;
> +    if (!addr) {
> +        return;
> +    }
> +    size = virtio_queue_get_desc_size(vdev, i);
> +    address_space_cache_init(&new->desc, &address_space_memory,
> +                             addr, size, false);

Missing error handling in case address_space_cache_init() fails.
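
For illustration, one possible shape for that error handling (a sketch only;
the function name is invented, and keeping the previously installed caches in
place on failure is an assumption, not something the posted patch does):

static VRingMemoryRegionCaches *virtio_map_region_caches(VirtIODevice *vdev,
                                                         int n)
{
    VirtQueue *vq = &vdev->vq[n];
    VRingMemoryRegionCaches *new = g_new0(VRingMemoryRegionCaches, 1);
    int event_size =
        virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
    hwaddr size;

    size = virtio_queue_get_desc_size(vdev, n);
    if (address_space_cache_init(&new->desc, &address_space_memory,
                                 vq->vring.desc, size, false) < (int64_t)size) {
        goto err_desc;
    }

    size = virtio_queue_get_avail_size(vdev, n) + event_size;
    if (address_space_cache_init(&new->avail, &address_space_memory,
                                 vq->vring.avail, size, false) < (int64_t)size) {
        goto err_avail;
    }

    size = virtio_queue_get_used_size(vdev, n) + event_size;
    if (address_space_cache_init(&new->used, &address_space_memory,
                                 vq->vring.used, size, true) < (int64_t)size) {
        goto err_used;
    }

    return new;    /* caller installs it and RCU-frees the old caches */

err_used:
    address_space_cache_destroy(&new->avail);
err_avail:
    address_space_cache_destroy(&new->desc);
err_desc:
    g_free(new);
    return NULL;
}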


* Re: [Qemu-devel] [PATCH 10/11] virtio: use VRingMemoryRegionCaches for descriptor ring
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 10/11] virtio: use VRingMemoryRegionCaches for descriptor ring Paolo Bonzini
@ 2016-12-12 16:06   ` Stefan Hajnoczi
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2016-12-12 16:06 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, famz, mst, borntraeger

On Mon, Dec 12, 2016 at 12:18:56PM +0100, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/virtio/virtio.c | 22 ++++++++--------------
>  1 file changed, 8 insertions(+), 14 deletions(-)
> 
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 4f355b4..702da0b 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -484,18 +484,16 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
>      unsigned int max, idx;
>      unsigned int total_bufs, in_total, out_total;
>      MemoryRegionCache *desc_cache = NULL;
> -    MemoryRegionCache vring_desc_cache;
>      MemoryRegionCache indirect_desc_cache;
> +    VRingMemoryRegionCaches *caches;
>      int64_t len = 0;
>      int rc;
>  
>      rcu_read_lock();
>      idx = vq->last_avail_idx;
>      max = vq->vring.num;
> -    len = address_space_cache_init(&vring_desc_cache, &address_space_memory,
> -                                   vq->vring.desc, max * sizeof(VRingDesc),
> -                                   false);
> -    if (len < max * sizeof(VRingDesc)) {
> +    caches = atomic_rcu_read(&vq->vring.caches);

Now the rcu_read_lock() above makes sense :).

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [Qemu-devel] [PATCH 11/11] virtio: use VRingMemoryRegionCaches for avail and used rings
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 11/11] virtio: use VRingMemoryRegionCaches for avail and used rings Paolo Bonzini
@ 2016-12-12 16:08   ` Stefan Hajnoczi
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2016-12-12 16:08 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, famz, mst, borntraeger

On Mon, Dec 12, 2016 at 12:18:57PM +0100, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/net/virtio-net.c |  14 +++++-
>  hw/virtio/virtio.c  | 137 +++++++++++++++++++++++++++++++++++++---------------
>  2 files changed, 111 insertions(+), 40 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [Qemu-devel] [RFC PATCH 00/11] speedup vring processing with MemoryRegionCaches
  2016-12-12 11:18 [Qemu-devel] [RFC PATCH 00/11] speedup vring processing with MemoryRegionCaches Paolo Bonzini
                   ` (10 preceding siblings ...)
  2016-12-12 11:18 ` [Qemu-devel] [PATCH 11/11] virtio: use VRingMemoryRegionCaches for avail and used rings Paolo Bonzini
@ 2016-12-12 16:11 ` Stefan Hajnoczi
  2016-12-13 12:56 ` Christian Borntraeger
  12 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2016-12-12 16:11 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, famz, mst, borntraeger

On Mon, Dec 12, 2016 at 12:18:46PM +0100, Paolo Bonzini wrote:
> It is known that virtio's usage of ld_*_phys and st_*_phys functions
> wastes time in address_space_translate, visiting the
> AddressSpaceDispatch's radix tree.
> 
> This series introduces a small cache that changes these functions
> to a simple range check and memory access.  The effect is a bit
> underwhelming, because the improvement is only 1-2Kiops/second.
> Nevertheless I'm throwing out the patches so that for example they
> can be tested on s390.
> 
> Things to fix: handle address_space_cache_init failures
> in virtio_init_region_cache.  Also, once virtio breaks free of
> address_space_memory, we'll need to handle invalidation in IOMMU regions.
> For the latter, maybe it's worth introducing a new abstraction that
> is higher-level than MemoryListener and covers both regular and IOMMU
> memory regions.

I think this new API will be important for getting the best
performance in emulated devices.

Stefan


* Re: [Qemu-devel] [PATCH 08/11] virtio: use MemoryRegionCache to access descriptors
  2016-12-12 14:17   ` Stefan Hajnoczi
@ 2016-12-13 11:14     ` Paolo Bonzini
  0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2016-12-13 11:14 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: borntraeger, famz, qemu-devel, mst



On 12/12/2016 15:17, Stefan Hajnoczi wrote:
>>      unsigned int total_bufs, in_total, out_total;
>> -    void *desc_ptr = NULL;
>> -    hwaddr len = 0;
>> +    MemoryRegionCache *desc_cache = NULL;
>> +    MemoryRegionCache vring_desc_cache;
>> +    MemoryRegionCache indirect_desc_cache;
>> +    int64_t len = 0;
>>      int rc;
>>  
>> +    rcu_read_lock();
>
> Please document the purpose of the rcu_read_lock() in virtio code.

Wrong patch, this should be in patch 10 (and then the answer is simply
that VRingMemoryRegionCaches is protected by RCU).

> Also, do the goto err cases call rcu_read_unlock()?

Yes, they do; each error label ends with a "goto done", and the "done"
path is where rcu_read_unlock() is called.
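
A condensed toy version of that control flow (not the real virtqueue_pop();
names and allocations are simplified for illustration):

#include "qemu/osdep.h"
#include "qemu/rcu.h"

static void *pop_like(bool empty, bool map_fails)
{
    void *elem;

    rcu_read_lock();
    if (empty) {
        goto out_rcu;
    }

    elem = g_malloc0(16);
    if (map_fails) {
        goto err_undo_map;
    }

done:
    rcu_read_unlock();      /* single unlock shared by success and error paths */
    return elem;

err_undo_map:
    g_free(elem);           /* undo partial work, like virtqueue_undo_map_desc() */
out_rcu:
    elem = NULL;
    goto done;
}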

Paolo


* Re: [Qemu-devel] [RFC PATCH 00/11] speedup vring processing with MemoryRegionCaches
  2016-12-12 11:18 [Qemu-devel] [RFC PATCH 00/11] speedup vring processing with MemoryRegionCaches Paolo Bonzini
                   ` (11 preceding siblings ...)
  2016-12-12 16:11 ` [Qemu-devel] [RFC PATCH 00/11] speedup vring processing with MemoryRegionCaches Stefan Hajnoczi
@ 2016-12-13 12:56 ` Christian Borntraeger
  12 siblings, 0 replies; 27+ messages in thread
From: Christian Borntraeger @ 2016-12-13 12:56 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: famz, stefanha, mst

On 12/12/2016 12:18 PM, Paolo Bonzini wrote:
> It is known that virtio's usage of ld_*_phys and st_*_phys functions
> wastes time in address_space_translate, visiting the
> AddressSpaceDispatch's radix tree.
> 
> This series introduces a small cache that changes these functions
> to a simple range check and memory access.  The effect is a bit
> underwhelming, because the improvement is only 1-2Kiops/second.
> Nevertheless I'm throwing out the patches so that for example they
> can be tested on s390.

It does seem to save some CPU cycles, which translates into a benefit
when you are CPU bound (e.g. with just one host CPU I get ~4% higher
throughput).

> 
> Things to fix: handle address_space_cache_init failures
> in virtio_init_region_cache.  Also, once virtio breaks free of
> address_space_memory, we'll need to handle invalidation in IOMMU regions.
> For the latter, maybe it's worth introducing a new abstraction that
> is higher-level than MemoryListener and covers both regular and IOMMU
> memory regions.
> 
> Paolo
> 
> Paolo Bonzini (11):
>   exec: optimize remaining address_space_* cases
>   exec: introduce memory_ldst.inc.c
>   exec: introduce address_space_extend_translation
>   exec: introduce MemoryRegionCache
>   virtio: make virtio_should_notify static
>   virtio: add virtio_*_phys_cached
>   virtio: use address_space_map/unmap to access descriptors
>   virtio: use MemoryRegionCache to access descriptors
>   virtio: add MemoryListener to cache ring translations
>   virtio: use VRingMemoryRegionCaches for descriptor ring
>   virtio: use VRingMemoryRegionCaches for avail and used rings
> 
>  exec.c                            | 687 +++++-------------------------------
>  hw/net/virtio-net.c               |  14 +-
>  hw/virtio/virtio.c                | 322 +++++++++++++----
>  include/exec/cpu-all.h            |  23 ++
>  include/exec/cpu-common.h         |  15 -
>  include/exec/memory.h             | 166 +++++++++
>  include/hw/virtio/virtio-access.h |  52 +++
>  include/hw/virtio/virtio.h        |   2 +-
>  include/qemu/typedefs.h           |   1 +
>  memory_ldst.inc.c                 | 709 ++++++++++++++++++++++++++++++++++++++
>  10 files changed, 1316 insertions(+), 675 deletions(-)
>  create mode 100644 memory_ldst.inc.c
> 


* Re: [Qemu-devel] [PATCH 04/11] exec: introduce MemoryRegionCache
  2016-12-12 14:06   ` Stefan Hajnoczi
@ 2016-12-13 13:14     ` Paolo Bonzini
  0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2016-12-13 13:14 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: borntraeger, famz, qemu-devel, mst



On 12/12/2016 15:06, Stefan Hajnoczi wrote:
> On Mon, Dec 12, 2016 at 12:18:50PM +0100, Paolo Bonzini wrote:
>> diff --git a/exec.c b/exec.c
>> index d4b3656..8d4bb0e 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -3077,6 +3077,82 @@ void cpu_physical_memory_unmap(void *buffer, hwaddr len,
>>  #define RCU_READ_UNLOCK(...)     rcu_read_unlock()
>>  #include "memory_ldst.inc.c"
>>  
>> +int64_t address_space_cache_init(MemoryRegionCache *cache,
>> +                                 AddressSpace *as,
>> +                                 hwaddr addr,
>> +                                 hwaddr len,
>> +                                 bool is_write)
>> +{
>> +    hwaddr l, xlat;
>> +    MemoryRegion *mr;
>> +    void *ptr;
>> +
>> +    assert(len > 0);
>> +
>> +    l = len;
>> +    mr = address_space_translate(as, addr, &xlat, &l, is_write);
>> +    if (!memory_access_is_direct(mr, is_write)) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    l = address_space_extend_translation(as, addr, len, mr, xlat, l, is_write);
>> +    ptr = qemu_ram_ptr_length(mr->ram_block, xlat, &l);
>> +
>> +    cache->xlat = xlat;
>> +    cache->is_write = is_write;
>> +    cache->mr = mr;
>> +    cache->ptr = ptr;
>> +    cache->len = l;
>> +    memory_region_ref(cache->mr);
>> +
>> +    return l;
>> +}
> 
> What happens when [addr, addr + len) overlaps a MemoryRegion boundary?
> It looks like this function silently truncates the MemoryRegionCache,

Yes, this is what address_space_map does.  It's up to the caller to
decide what to do.

Patch 8 ("virtio: use MemoryRegionCache to access descriptors") does it
right.  As you noted, patch 9 doesn't check for errors at all---that's
part of why this is RFC.

Paolo

> leading to an assertion failure in address_space_translate_cached().
> 
> Perhaps it would be better to fail address_space_cache_init() if the
> length is truncated.
> 

