* [Qemu-devel] [RFC PATCH qemu 0/4] memory: Reduce memory use
@ 2017-09-07  9:20 Alexey Kardashevskiy
  2017-09-07  9:20 ` [Qemu-devel] [RFC PATCH qemu 1/4] memory: Postpone flatview and dispatch tree building till all devices are added Alexey Kardashevskiy
                   ` (4 more replies)
  0 siblings, 5 replies; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-09-07  9:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, David Gibson, Paolo Bonzini,
	Stefan Hajnoczi, Peter Maydell

This was inspired by https://bugzilla.redhat.com/show_bug.cgi?id=1481593

What happens there is that every virtio block device creates two
address spaces - one for the modern config space (called
"virtio-pci-cfg-as") and one for bus mastering (a common PCI thing,
named after the device, in my case "virtio-blk-pci").

Each address_space_init() updates the topology of every address space.
Every topology update (address_space_update_topology()) creates a new
dispatch tree - an AddressSpaceDispatch with nodes (1KB) and
sections (48KB) - and destroys the old one.

However, the dispatch destructor is deferred via RCU, which does not
get a chance to run until the machine is initialized; until we get
there, the memory is not returned to the pool, and this is a lot of
memory which grows as n^2.
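
To illustrate the growth, here is a standalone C model - not QEMU code,
just a sketch with the per-tree cost taken from the numbers above - of
how much garbage piles up behind RCU:

#include <stdio.h>

/* rough per-tree cost from above: nodes (1KB) + sections (48KB) */
#define DISPATCH_COST ((size_t)(1 + 48) * 1024)

int main(void)
{
    size_t retained = 0; /* bytes queued for call_rcu() but not yet freed */
    int as_count = 0;

    for (int dev = 0; dev < 50; dev++) {
        for (int i = 0; i < 2; i++) { /* 2 address spaces per virtio device */
            as_count++;
            /* address_space_init() rebuilds the dispatch tree of every
             * existing address space; each replaced tree is only queued
             * for deferred freeing, so it stays allocated */
            retained += (size_t)(as_count - 1) * DISPATCH_COST;
        }
    }
    printf("retained before RCU runs: %zu MiB\n", retained >> 20);
    return 0;
}

The real numbers are far bigger because the section arrays themselves
grow with the number of memory regions, but the quadratic shape is
the same.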

These patches try to address the memory use and boot time issues,
but to be honest only the first one provides a visible outcome.

There are still things to polish, and the use of RCU needs
double-checking; I'd like to get feedback before proceeding - is this
going the right way or is it way too ugly?


This is based on sha1
1ab5eb4efb Peter Maydell "Update version for v2.10.0 release".

Please comment. Thanks.



Alexey Kardashevskiy (4):
  memory: Postpone flatview and dispatch tree building till all devices
    are added
  memory: Prepare for shared flat views
  memory: Share flat views and dispatch trees between address spaces
  memory: Add flat views to HMP "info mtree"

 include/exec/memory-internal.h |   6 +-
 include/exec/memory.h          |  93 +++++++++----
 exec.c                         | 242 +++++++++++++++++++--------------
 hw/alpha/typhoon.c             |   2 +-
 hw/dma/rc4030.c                |   4 +-
 hw/i386/amd_iommu.c            |   2 +-
 hw/i386/intel_iommu.c          |   9 +-
 hw/intc/openpic_kvm.c          |   2 +-
 hw/pci-host/apb.c              |   2 +-
 hw/pci/pci.c                   |   3 +-
 hw/ppc/spapr_iommu.c           |   4 +-
 hw/s390x/s390-pci-bus.c        |   2 +-
 hw/vfio/common.c               |   6 +-
 hw/virtio/vhost.c              |   6 +-
 memory.c                       | 299 +++++++++++++++++++++++++++--------------
 monitor.c                      |   3 +-
 vl.c                           |   4 +
 hmp-commands-info.hx           |   7 +-
 18 files changed, 448 insertions(+), 248 deletions(-)

-- 
2.11.0


* [Qemu-devel] [RFC PATCH qemu 1/4] memory: Postpone flatview and dispatch tree building till all devices are added
  2017-09-07  9:20 [Qemu-devel] [RFC PATCH qemu 0/4] memory: Reduce memory use Alexey Kardashevskiy
@ 2017-09-07  9:20 ` Alexey Kardashevskiy
  2017-09-07  9:30   ` Peter Maydell
  2017-09-07  9:20 ` [Qemu-devel] [RFC PATCH qemu 2/4] memory: Prepare for shared flat views Alexey Kardashevskiy
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-09-07  9:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, David Gibson, Paolo Bonzini,
	Stefan Hajnoczi, Peter Maydell

Most devices use at least one address space, and every time a new address
space is added, flat views and dispatch trees are rebuilt for all address
spaces. This is not a problem for a relatively small number of devices,
but even 50 virtio-pci devices use more than 8GB of RAM.

What happens is that on every flatview/dispatch rebuild, new arrays are
allocated and the old ones are released, but the release goes via RCU,
so until the entire machine is built, nothing is actually freed.

This wraps device creation in memory_region_transaction_begin/commit
to massively reduce the number of flat view/dispatch tree
(re)allocations.
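
For reference, the batching works because the memory API keeps a
transaction nesting counter and defers the rebuild to the outermost
commit; a simplified sketch of that mechanism (the counter and flag
mirror memory.c, the rebuild helper is a stand-in):

static unsigned memory_region_transaction_depth;
static bool memory_region_update_pending;

void memory_region_transaction_begin(void)
{
    ++memory_region_transaction_depth;
}

void memory_region_transaction_commit(void)
{
    assert(memory_region_transaction_depth);
    --memory_region_transaction_depth;
    if (!memory_region_transaction_depth && memory_region_update_pending) {
        /* one flatview/dispatch rebuild for the whole batch
         * instead of one per device */
        rebuild_flatviews_and_dispatch_trees();
        memory_region_update_pending = false;
    }
}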

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 vl.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/vl.c b/vl.c
index 8e247cc2a2..3c39cc8b3a 100644
--- a/vl.c
+++ b/vl.c
@@ -4655,12 +4655,16 @@ int main(int argc, char **argv, char **envp)
     igd_gfx_passthru();
 
     /* init generic devices */
+    memory_region_transaction_begin();
+
     rom_set_order_override(FW_CFG_ORDER_OVERRIDE_DEVICE);
     if (qemu_opts_foreach(qemu_find_opts("device"),
                           device_init_func, NULL, NULL)) {
         exit(1);
     }
 
+    memory_region_transaction_commit();
+
     cpu_synchronize_all_post_init();
 
     rom_reset_order_override();
-- 
2.11.0


* [Qemu-devel] [RFC PATCH qemu 2/4] memory: Prepare for shared flat views
  2017-09-07  9:20 [Qemu-devel] [RFC PATCH qemu 0/4] memory: Reduce memory use Alexey Kardashevskiy
  2017-09-07  9:20 ` [Qemu-devel] [RFC PATCH qemu 1/4] memory: Postpone flatview and dispatch tree building till all devices are added Alexey Kardashevskiy
@ 2017-09-07  9:20 ` Alexey Kardashevskiy
  2017-09-09  7:18   ` David Gibson
  2017-09-07  9:20 ` [Qemu-devel] [RFC PATCH qemu 3/4] memory: Share flat views and dispatch trees between address spaces Alexey Kardashevskiy
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-09-07  9:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, David Gibson, Paolo Bonzini,
	Stefan Hajnoczi, Peter Maydell

We are going to share flat views and dispatch trees between address
spaces. This moves bits around, but no change in behaviour is expected
here. The following patch will implement the actual sharing.

This switches from AddressSpace to AddressSpaceDispatch, as in many
places that is what is actually used. Also, since multiple address
spaces will be sharing a flat view, MemoryRegionSection cannot store
the address space pointer; this changes it to an AddressSpaceDispatch
pointer as well. As a not very obvious consequence, IOMMUTLBEntry also
switches from AddressSpace to AddressSpaceDispatch so that
address_space_get_iotlb_entry() keeps working.
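
A usage sketch for the new entry points (the caller below is
hypothetical, the functions are the ones added by this patch): the
dispatch is resolved once per RCU critical section and then passed
around, instead of being re-read from the address space on every
access:

static MemTxResult dma_read_once(AddressSpace *as, hwaddr addr,
                                 uint8_t *buf, int len)
{
    MemTxResult res;
    AddressSpaceDispatch *d;

    rcu_read_lock();
    /* resolve once; valid for the whole RCU critical section */
    d = address_space_to_dispatch(as);
    res = address_space_dispatch_read(d, addr, MEMTXATTRS_UNSPECIFIED,
                                      buf, len);
    rcu_read_unlock();

    return res;
}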

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 include/exec/memory.h   |  79 +++++++++++++++++++++++-------
 exec.c                  | 128 +++++++++++++++++++++++++++---------------------
 hw/alpha/typhoon.c      |   2 +-
 hw/dma/rc4030.c         |   4 +-
 hw/i386/amd_iommu.c     |   2 +-
 hw/i386/intel_iommu.c   |   9 ++--
 hw/intc/openpic_kvm.c   |   2 +-
 hw/pci-host/apb.c       |   2 +-
 hw/ppc/spapr_iommu.c    |   4 +-
 hw/s390x/s390-pci-bus.c |   2 +-
 hw/vfio/common.c        |   6 +--
 hw/virtio/vhost.c       |   6 +--
 memory.c                |  26 ++++++----
 13 files changed, 170 insertions(+), 102 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 400dd4491b..83e82e90d5 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -48,6 +48,7 @@
 
 typedef struct MemoryRegionOps MemoryRegionOps;
 typedef struct MemoryRegionMmio MemoryRegionMmio;
+typedef struct AddressSpaceDispatch AddressSpaceDispatch;
 
 struct MemoryRegionMmio {
     CPUReadMemoryFunc *read[3];
@@ -67,7 +68,7 @@ typedef enum {
 #define IOMMU_ACCESS_FLAG(r, w) (((r) ? IOMMU_RO : 0) | ((w) ? IOMMU_WO : 0))
 
 struct IOMMUTLBEntry {
-    AddressSpace    *target_as;
+    AddressSpaceDispatch *target_dispatch;
     hwaddr           iova;
     hwaddr           translated_addr;
     hwaddr           addr_mask;  /* 0xfff = 4k translation */
@@ -310,6 +311,8 @@ struct MemoryListener {
     QTAILQ_ENTRY(MemoryListener) link_as;
 };
 
+AddressSpaceDispatch *address_space_to_dispatch(AddressSpace *as);
+
 /**
  * AddressSpace: describes a mapping of addresses to #MemoryRegion objects
  */
@@ -346,7 +349,7 @@ struct AddressSpace {
  */
 struct MemoryRegionSection {
     MemoryRegion *mr;
-    AddressSpace *address_space;
+    AddressSpaceDispatch *dispatch;
     hwaddr offset_within_region;
     Int128 size;
     hwaddr offset_within_address_space;
@@ -1635,10 +1638,19 @@ void address_space_destroy(AddressSpace *as);
  * @buf: buffer with the data transferred
  * @is_write: indicates the transfer direction
  */
-MemTxResult address_space_rw(AddressSpace *as, hwaddr addr,
+MemTxResult address_space_dispatch_rw(AddressSpaceDispatch *d, hwaddr addr,
                              MemTxAttrs attrs, uint8_t *buf,
                              int len, bool is_write);
 
+static inline MemTxResult address_space_rw(AddressSpace *as, hwaddr addr,
+                                           MemTxAttrs attrs, uint8_t *buf,
+                                           int len, bool is_write)
+{
+    return address_space_dispatch_rw(address_space_to_dispatch(as),
+                                     addr, attrs, buf, len, is_write);
+}
+
+
 /**
  * address_space_write: write to address space.
  *
@@ -1651,10 +1663,18 @@ MemTxResult address_space_rw(AddressSpace *as, hwaddr addr,
  * @attrs: memory transaction attributes
  * @buf: buffer with the data transferred
  */
-MemTxResult address_space_write(AddressSpace *as, hwaddr addr,
+MemTxResult address_space_dispatch_write(AddressSpaceDispatch *d, hwaddr addr,
                                 MemTxAttrs attrs,
                                 const uint8_t *buf, int len);
 
+static inline MemTxResult address_space_write(AddressSpace *as, hwaddr addr,
+                                MemTxAttrs attrs,
+                                const uint8_t *buf, int len)
+{
+    return address_space_dispatch_write(address_space_to_dispatch(as),
+                                        addr, attrs, buf, len);
+}
+
 /* address_space_ld*: load from an address space
  * address_space_st*: store to an address space
  *
@@ -1840,8 +1860,8 @@ void stq_be_phys_cached(MemoryRegionCache *cache, hwaddr addr, uint64_t val);
 /* address_space_get_iotlb_entry: translate an address into an IOTLB
  * entry. Should be called from an RCU critical section.
  */
-IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
-                                            bool is_write);
+IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpaceDispatch *d,
+                                            hwaddr addr, bool is_write);
 
 /* address_space_translate: translate an address range into an address space
  * into a MemoryRegion and an address range into that section.  Should be
@@ -1855,9 +1875,17 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
  * @len: pointer to length
  * @is_write: indicates the transfer direction
  */
-MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
-                                      hwaddr *xlat, hwaddr *len,
-                                      bool is_write);
+MemoryRegion *address_space_dispatch_translate(AddressSpaceDispatch *d,
+                                               hwaddr addr, hwaddr *xlat,
+                                               hwaddr *len, bool is_write);
+
+static inline MemoryRegion *address_space_translate(AddressSpace *as,
+                                                    hwaddr addr, hwaddr *xlat,
+                                                    hwaddr *len, bool is_write)
+{
+    return address_space_dispatch_translate(address_space_to_dispatch(as),
+                                            addr, xlat, len, is_write);
+}
 
 /* address_space_access_valid: check for validity of accessing an address
  * space range
@@ -1874,7 +1902,15 @@ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
  * @len: length of the area to be checked
  * @is_write: indicates the transfer direction
  */
-bool address_space_access_valid(AddressSpace *as, hwaddr addr, int len, bool is_write);
+bool address_space_dispatch_access_valid(AddressSpaceDispatch *d, hwaddr addr,
+                                         int len, bool is_write);
+
+static inline bool address_space_access_valid(AddressSpace *as, hwaddr addr,
+                                              int len, bool is_write)
+{
+    return address_space_dispatch_access_valid(address_space_to_dispatch(as),
+                                               addr, len, is_write);
+}
 
 /* address_space_map: map a physical memory region into a host virtual address
  *
@@ -1908,11 +1944,11 @@ void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
 
 
 /* Internal functions, part of the implementation of address_space_read.  */
-MemTxResult address_space_read_continue(AddressSpace *as, hwaddr addr,
+MemTxResult address_space_read_continue(AddressSpaceDispatch *d, hwaddr addr,
                                         MemTxAttrs attrs, uint8_t *buf,
                                         int len, hwaddr addr1, hwaddr l,
 					MemoryRegion *mr);
-MemTxResult address_space_read_full(AddressSpace *as, hwaddr addr,
+MemTxResult address_space_read_full(AddressSpaceDispatch *d, hwaddr addr,
                                     MemTxAttrs attrs, uint8_t *buf, int len);
 void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t addr);
 
@@ -1940,8 +1976,9 @@ static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
  * @buf: buffer with the data transferred
  */
 static inline __attribute__((__always_inline__))
-MemTxResult address_space_read(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
-                               uint8_t *buf, int len)
+MemTxResult address_space_dispatch_read(AddressSpaceDispatch *d, hwaddr addr,
+                                        MemTxAttrs attrs, uint8_t *buf,
+                                        int len)
 {
     MemTxResult result = MEMTX_OK;
     hwaddr l, addr1;
@@ -1952,22 +1989,30 @@ MemTxResult address_space_read(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
         if (len) {
             rcu_read_lock();
             l = len;
-            mr = address_space_translate(as, addr, &addr1, &l, false);
+            mr = address_space_dispatch_translate(d, addr, &addr1, &l, false);
             if (len == l && memory_access_is_direct(mr, false)) {
                 ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
                 memcpy(buf, ptr, len);
             } else {
-                result = address_space_read_continue(as, addr, attrs, buf, len,
+                result = address_space_read_continue(d, addr, attrs, buf, len,
                                                      addr1, l, mr);
             }
             rcu_read_unlock();
         }
     } else {
-        result = address_space_read_full(as, addr, attrs, buf, len);
+        result = address_space_read_full(d, addr, attrs, buf, len);
     }
     return result;
 }
 
+static inline MemTxResult address_space_read(AddressSpace *as, hwaddr addr,
+                                             MemTxAttrs attrs, uint8_t *buf,
+                                             int len)
+{
+    return address_space_dispatch_read(address_space_to_dispatch(as),
+                                       addr, attrs, buf, len);
+}
+
 /**
  * address_space_read_cached: read from a cached RAM region
  *
diff --git a/exec.c b/exec.c
index d20c34ca83..66f01f5fce 100644
--- a/exec.c
+++ b/exec.c
@@ -199,10 +199,12 @@ struct AddressSpaceDispatch {
     AddressSpace *as;
 };
 
+typedef struct AddressSpaceDispatch AddressSpaceDispatch;
+
 #define SUBPAGE_IDX(addr) ((addr) & ~TARGET_PAGE_MASK)
 typedef struct subpage_t {
     MemoryRegion iomem;
-    AddressSpace *as;
+    AddressSpaceDispatch *d;
     hwaddr base;
     uint16_t sub_section[];
 } subpage_t;
@@ -238,6 +240,11 @@ struct DirtyBitmapSnapshot {
     unsigned long dirty[];
 };
 
+AddressSpaceDispatch *address_space_to_dispatch(AddressSpace *as)
+{
+    return atomic_rcu_read(&as->dispatch);
+}
+
 #endif
 
 #if !defined(CONFIG_USER_ONLY)
@@ -437,8 +444,9 @@ static MemoryRegionSection *address_space_lookup_region(AddressSpaceDispatch *d,
 
 /* Called from RCU critical section */
 static MemoryRegionSection *
-address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, hwaddr *xlat,
-                                 hwaddr *plen, bool resolve_subpage)
+address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr,
+                                 hwaddr *xlat, hwaddr *plen,
+                                 bool resolve_subpage)
 {
     MemoryRegionSection *section;
     MemoryRegion *mr;
@@ -472,7 +480,7 @@ address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, hwaddr *x
 }
 
 /* Called from RCU critical section */
-static MemoryRegionSection address_space_do_translate(AddressSpace *as,
+static MemoryRegionSection address_space_do_translate(AddressSpaceDispatch *d,
                                                       hwaddr addr,
                                                       hwaddr *xlat,
                                                       hwaddr *plen,
@@ -485,8 +493,8 @@ static MemoryRegionSection address_space_do_translate(AddressSpace *as,
     IOMMUMemoryRegionClass *imrc;
 
     for (;;) {
-        AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
-        section = address_space_translate_internal(d, addr, &addr, plen, is_mmio);
+        section = address_space_translate_internal(d, addr, &addr,
+                                                   plen, is_mmio);
 
         iommu_mr = memory_region_get_iommu(section->mr);
         if (!iommu_mr) {
@@ -503,7 +511,7 @@ static MemoryRegionSection address_space_do_translate(AddressSpace *as,
             goto translate_fail;
         }
 
-        as = iotlb.target_as;
+        d = atomic_rcu_read(&iotlb.target_dispatch);
     }
 
     *xlat = addr;
@@ -515,8 +523,8 @@ translate_fail:
 }
 
 /* Called from RCU critical section */
-IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
-                                            bool is_write)
+IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpaceDispatch *d,
+                                            hwaddr addr, bool is_write)
 {
     MemoryRegionSection section;
     hwaddr xlat, plen;
@@ -525,7 +533,7 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
     plen = (hwaddr)-1;
 
     /* This can never be MMIO. */
-    section = address_space_do_translate(as, addr, &xlat, &plen,
+    section = address_space_do_translate(d, addr, &xlat, &plen,
                                          is_write, false);
 
     /* Illegal translation */
@@ -549,7 +557,7 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
     plen -= 1;
 
     return (IOMMUTLBEntry) {
-        .target_as = section.address_space,
+        .target_dispatch = section.dispatch,
         .iova = addr & ~plen,
         .translated_addr = xlat & ~plen,
         .addr_mask = plen,
@@ -562,15 +570,15 @@ iotlb_fail:
 }
 
 /* Called from RCU critical section */
-MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
-                                      hwaddr *xlat, hwaddr *plen,
-                                      bool is_write)
+MemoryRegion *address_space_dispatch_translate(AddressSpaceDispatch *d,
+                                               hwaddr addr, hwaddr *xlat,
+                                               hwaddr *plen, bool is_write)
 {
     MemoryRegion *mr;
     MemoryRegionSection section;
 
     /* This can be MMIO, so setup MMIO bit. */
-    section = address_space_do_translate(as, addr, xlat, plen, is_write, true);
+    section = address_space_do_translate(d, addr, xlat, plen, is_write, true);
     mr = section.mr;
 
     if (xen_enabled() && memory_access_is_direct(mr, is_write)) {
@@ -587,7 +595,8 @@ address_space_translate_for_iotlb(CPUState *cpu, int asidx, hwaddr addr,
                                   hwaddr *xlat, hwaddr *plen)
 {
     MemoryRegionSection *section;
-    AddressSpaceDispatch *d = atomic_rcu_read(&cpu->cpu_ases[asidx].memory_dispatch);
+    AddressSpaceDispatch *d =
+            atomic_rcu_read(&cpu->cpu_ases[asidx].memory_dispatch);
 
     section = address_space_translate_internal(d, addr, xlat, plen, false);
 
@@ -1220,7 +1229,7 @@ hwaddr memory_region_section_get_iotlb(CPUState *cpu,
     } else {
         AddressSpaceDispatch *d;
 
-        d = atomic_rcu_read(&section->address_space->dispatch);
+        d = atomic_rcu_read(&section->dispatch);
         iotlb = section - d->map.sections;
         iotlb += xlat;
     }
@@ -1246,7 +1255,7 @@ hwaddr memory_region_section_get_iotlb(CPUState *cpu,
 
 static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
                              uint16_t section);
-static subpage_t *subpage_init(AddressSpace *as, hwaddr base);
+static subpage_t *subpage_init(AddressSpaceDispatch *d, hwaddr base);
 
 static void *(*phys_mem_alloc)(size_t size, uint64_t *align) =
                                qemu_anon_ram_alloc;
@@ -1318,8 +1327,8 @@ static void register_subpage(AddressSpaceDispatch *d, MemoryRegionSection *secti
     assert(existing->mr->subpage || existing->mr == &io_mem_unassigned);
 
     if (!(existing->mr->subpage)) {
-        subpage = subpage_init(d->as, base);
-        subsection.address_space = d->as;
+        subpage = subpage_init(d, base);
+        subsection.dispatch = d;
         subsection.mr = &subpage->iomem;
         phys_page_set(d, base >> TARGET_PAGE_BITS, 1,
                       phys_section_add(&d->map, &subsection));
@@ -2512,7 +2521,7 @@ static MemTxResult subpage_read(void *opaque, hwaddr addr, uint64_t *data,
     printf("%s: subpage %p len %u addr " TARGET_FMT_plx "\n", __func__,
            subpage, len, addr);
 #endif
-    res = address_space_read(subpage->as, addr + subpage->base,
+    res = address_space_dispatch_read(subpage->d, addr + subpage->base,
                              attrs, buf, len);
     if (res) {
         return res;
@@ -2562,7 +2571,7 @@ static MemTxResult subpage_write(void *opaque, hwaddr addr,
     default:
         abort();
     }
-    return address_space_write(subpage->as, addr + subpage->base,
+    return address_space_dispatch_write(subpage->d, addr + subpage->base,
                                attrs, buf, len);
 }
 
@@ -2575,8 +2584,8 @@ static bool subpage_accepts(void *opaque, hwaddr addr,
            __func__, subpage, is_write ? 'w' : 'r', len, addr);
 #endif
 
-    return address_space_access_valid(subpage->as, addr + subpage->base,
-                                      len, is_write);
+    return address_space_dispatch_access_valid(subpage->d, addr + subpage->base,
+                                               len, is_write);
 }
 
 static const MemoryRegionOps subpage_ops = {
@@ -2610,12 +2619,12 @@ static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
     return 0;
 }
 
-static subpage_t *subpage_init(AddressSpace *as, hwaddr base)
+static subpage_t *subpage_init(AddressSpaceDispatch *d, hwaddr base)
 {
     subpage_t *mmio;
 
     mmio = g_malloc0(sizeof(subpage_t) + TARGET_PAGE_SIZE * sizeof(uint16_t));
-    mmio->as = as;
+    mmio->d = d;
     mmio->base = base;
     memory_region_init_io(&mmio->iomem, NULL, &subpage_ops, mmio,
                           NULL, TARGET_PAGE_SIZE);
@@ -2629,12 +2638,12 @@ static subpage_t *subpage_init(AddressSpace *as, hwaddr base)
     return mmio;
 }
 
-static uint16_t dummy_section(PhysPageMap *map, AddressSpace *as,
+static uint16_t dummy_section(PhysPageMap *map, AddressSpaceDispatch *d,
                               MemoryRegion *mr)
 {
-    assert(as);
+    assert(d);
     MemoryRegionSection section = {
-        .address_space = as,
+        .dispatch = d,
         .mr = mr,
         .offset_within_address_space = 0,
         .offset_within_region = 0,
@@ -2677,17 +2686,16 @@ static void mem_begin(MemoryListener *listener)
     AddressSpaceDispatch *d = g_new0(AddressSpaceDispatch, 1);
     uint16_t n;
 
-    n = dummy_section(&d->map, as, &io_mem_unassigned);
+    n = dummy_section(&d->map, d, &io_mem_unassigned);
     assert(n == PHYS_SECTION_UNASSIGNED);
-    n = dummy_section(&d->map, as, &io_mem_notdirty);
+    n = dummy_section(&d->map, d, &io_mem_notdirty);
     assert(n == PHYS_SECTION_NOTDIRTY);
-    n = dummy_section(&d->map, as, &io_mem_rom);
+    n = dummy_section(&d->map, d, &io_mem_rom);
     assert(n == PHYS_SECTION_ROM);
-    n = dummy_section(&d->map, as, &io_mem_watch);
+    n = dummy_section(&d->map, d, &io_mem_watch);
     assert(n == PHYS_SECTION_WATCH);
 
     d->phys_map  = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .skip = 1 };
-    d->as = as;
     as->next_dispatch = d;
 }
 
@@ -2900,7 +2908,8 @@ static bool prepare_mmio_access(MemoryRegion *mr)
 }
 
 /* Called within RCU critical section.  */
-static MemTxResult address_space_write_continue(AddressSpace *as, hwaddr addr,
+static MemTxResult address_space_write_continue(AddressSpaceDispatch *d,
+                                                hwaddr addr,
                                                 MemTxAttrs attrs,
                                                 const uint8_t *buf,
                                                 int len, hwaddr addr1,
@@ -2966,14 +2975,15 @@ static MemTxResult address_space_write_continue(AddressSpace *as, hwaddr addr,
         }
 
         l = len;
-        mr = address_space_translate(as, addr, &addr1, &l, true);
+        mr = address_space_dispatch_translate(d, addr, &addr1, &l, true);
     }
 
     return result;
 }
 
-MemTxResult address_space_write(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
-                                const uint8_t *buf, int len)
+MemTxResult address_space_dispatch_write(AddressSpaceDispatch *d, hwaddr addr,
+                                         MemTxAttrs attrs, const uint8_t *buf,
+                                         int len)
 {
     hwaddr l;
     hwaddr addr1;
@@ -2983,8 +2993,8 @@ MemTxResult address_space_write(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
     if (len > 0) {
         rcu_read_lock();
         l = len;
-        mr = address_space_translate(as, addr, &addr1, &l, true);
-        result = address_space_write_continue(as, addr, attrs, buf, len,
+        mr = address_space_dispatch_translate(d, addr, &addr1, &l, true);
+        result = address_space_write_continue(d, addr, attrs, buf, len,
                                               addr1, l, mr);
         rcu_read_unlock();
     }
@@ -2993,7 +3003,7 @@ MemTxResult address_space_write(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
 }
 
 /* Called within RCU critical section.  */
-MemTxResult address_space_read_continue(AddressSpace *as, hwaddr addr,
+MemTxResult address_space_read_continue(AddressSpaceDispatch *d, hwaddr addr,
                                         MemTxAttrs attrs, uint8_t *buf,
                                         int len, hwaddr addr1, hwaddr l,
                                         MemoryRegion *mr)
@@ -3056,13 +3066,13 @@ MemTxResult address_space_read_continue(AddressSpace *as, hwaddr addr,
         }
 
         l = len;
-        mr = address_space_translate(as, addr, &addr1, &l, false);
+        mr = address_space_dispatch_translate(d, addr, &addr1, &l, false);
     }
 
     return result;
 }
 
-MemTxResult address_space_read_full(AddressSpace *as, hwaddr addr,
+MemTxResult address_space_read_full(AddressSpaceDispatch *d, hwaddr addr,
                                     MemTxAttrs attrs, uint8_t *buf, int len)
 {
     hwaddr l;
@@ -3073,8 +3083,8 @@ MemTxResult address_space_read_full(AddressSpace *as, hwaddr addr,
     if (len > 0) {
         rcu_read_lock();
         l = len;
-        mr = address_space_translate(as, addr, &addr1, &l, false);
-        result = address_space_read_continue(as, addr, attrs, buf, len,
+        mr = address_space_dispatch_translate(d, addr, &addr1, &l, false);
+        result = address_space_read_continue(d, addr, attrs, buf, len,
                                              addr1, l, mr);
         rcu_read_unlock();
     }
@@ -3082,13 +3092,14 @@ MemTxResult address_space_read_full(AddressSpace *as, hwaddr addr,
     return result;
 }
 
-MemTxResult address_space_rw(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
-                             uint8_t *buf, int len, bool is_write)
+MemTxResult address_space_dispatch_rw(AddressSpaceDispatch *d, hwaddr addr,
+                                      MemTxAttrs attrs, uint8_t *buf,
+                                      int len, bool is_write)
 {
     if (is_write) {
-        return address_space_write(as, addr, attrs, (uint8_t *)buf, len);
+        return address_space_dispatch_write(d, addr, attrs, buf, len);
     } else {
-        return address_space_read(as, addr, attrs, (uint8_t *)buf, len);
+        return address_space_dispatch_read(d, addr, attrs, buf, len);
     }
 }
 
@@ -3249,7 +3260,8 @@ static void cpu_notify_map_clients(void)
     qemu_mutex_unlock(&map_client_list_lock);
 }
 
-bool address_space_access_valid(AddressSpace *as, hwaddr addr, int len, bool is_write)
+bool address_space_dispatch_access_valid(AddressSpaceDispatch *d, hwaddr addr,
+                                         int len, bool is_write)
 {
     MemoryRegion *mr;
     hwaddr l, xlat;
@@ -3257,7 +3269,7 @@ bool address_space_access_valid(AddressSpace *as, hwaddr addr, int len, bool is_
     rcu_read_lock();
     while (len > 0) {
         l = len;
-        mr = address_space_translate(as, addr, &xlat, &l, is_write);
+        mr = address_space_dispatch_translate(d, addr, &xlat, &l, is_write);
         if (!memory_access_is_direct(mr, is_write)) {
             l = memory_access_size(mr, l, addr);
             if (!memory_region_access_valid(mr, xlat, l, is_write)) {
@@ -3274,7 +3286,8 @@ bool address_space_access_valid(AddressSpace *as, hwaddr addr, int len, bool is_
 }
 
 static hwaddr
-address_space_extend_translation(AddressSpace *as, hwaddr addr, hwaddr target_len,
+address_space_extend_translation(AddressSpaceDispatch *d, hwaddr addr,
+                                 hwaddr target_len,
                                  MemoryRegion *mr, hwaddr base, hwaddr len,
                                  bool is_write)
 {
@@ -3291,7 +3304,8 @@ address_space_extend_translation(AddressSpace *as, hwaddr addr, hwaddr target_le
         }
 
         len = target_len;
-        this_mr = address_space_translate(as, addr, &xlat, &len, is_write);
+        this_mr = address_space_dispatch_translate(d, addr, &xlat,
+                                                   &len, is_write);
         if (this_mr != mr || xlat != base + done) {
             return done;
         }
@@ -3314,6 +3328,7 @@ void *address_space_map(AddressSpace *as,
     hwaddr l, xlat;
     MemoryRegion *mr;
     void *ptr;
+    AddressSpaceDispatch *d = address_space_to_dispatch(as);
 
     if (len == 0) {
         return NULL;
@@ -3321,7 +3336,7 @@ void *address_space_map(AddressSpace *as,
 
     l = len;
     rcu_read_lock();
-    mr = address_space_translate(as, addr, &xlat, &l, is_write);
+    mr = address_space_dispatch_translate(d, addr, &xlat, &l, is_write);
 
     if (!memory_access_is_direct(mr, is_write)) {
         if (atomic_xchg(&bounce.in_use, true)) {
@@ -3337,7 +3352,7 @@ void *address_space_map(AddressSpace *as,
         memory_region_ref(mr);
         bounce.mr = mr;
         if (!is_write) {
-            address_space_read(as, addr, MEMTXATTRS_UNSPECIFIED,
+            address_space_dispatch_read(d, addr, MEMTXATTRS_UNSPECIFIED,
                                bounce.buffer, l);
         }
 
@@ -3348,7 +3363,8 @@ void *address_space_map(AddressSpace *as,
 
 
     memory_region_ref(mr);
-    *plen = address_space_extend_translation(as, addr, len, mr, xlat, l, is_write);
+    *plen = address_space_extend_translation(d, addr, len, mr, xlat,
+                                             l, is_write);
     ptr = qemu_ram_ptr_length(mr->ram_block, xlat, plen, true);
     rcu_read_unlock();
 
diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
index ae11e012c7..b3dedf5b74 100644
--- a/hw/alpha/typhoon.c
+++ b/hw/alpha/typhoon.c
@@ -604,7 +604,7 @@ static const MemoryRegionOps pchip_ops = {
 static bool make_iommu_tlbe(hwaddr taddr, hwaddr mask, IOMMUTLBEntry *ret)
 {
     *ret = (IOMMUTLBEntry) {
-        .target_as = &address_space_memory,
+        .target_dispatch = address_space_to_dispatch(&address_space_memory),
         .translated_addr = taddr,
         .addr_mask = mask,
         .perm = IOMMU_RW,
diff --git a/hw/dma/rc4030.c b/hw/dma/rc4030.c
index 5d4833eeca..d771020b78 100644
--- a/hw/dma/rc4030.c
+++ b/hw/dma/rc4030.c
@@ -495,7 +495,7 @@ static IOMMUTLBEntry rc4030_dma_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
 {
     rc4030State *s = container_of(iommu, rc4030State, dma_mr);
     IOMMUTLBEntry ret = {
-        .target_as = &address_space_memory,
+        .target_dispatch = address_space_to_dispatch(&address_space_memory),
         .iova = addr & ~(DMA_PAGESIZE - 1),
         .translated_addr = 0,
         .addr_mask = DMA_PAGESIZE - 1,
@@ -507,7 +507,7 @@ static IOMMUTLBEntry rc4030_dma_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
     i = addr / DMA_PAGESIZE;
     if (i < s->dma_tl_limit / sizeof(entry)) {
         entry_address = (s->dma_tl_base & 0x7fffffff) + i * sizeof(entry);
-        if (address_space_read(ret.target_as, entry_address,
+        if (address_space_dispatch_read(ret.target_dispatch, entry_address,
                                MEMTXATTRS_UNSPECIFIED, (unsigned char *)&entry,
                                sizeof(entry)) == MEMTX_OK) {
             ret.translated_addr = entry.frame & ~(DMA_PAGESIZE - 1);
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 334938a280..dc3303fdad 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -993,7 +993,7 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
     AMDVIAddressSpace *as = container_of(iommu, AMDVIAddressSpace, iommu);
     AMDVIState *s = as->iommu_state;
     IOMMUTLBEntry ret = {
-        .target_as = &address_space_memory,
+        .target_dispatch = address_space_to_dispatch(&address_space_memory),
         .iova = addr,
         .translated_addr = 0,
         .addr_mask = ~(hwaddr)0,
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index a7bf87a19e..b31fe2af75 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -787,7 +787,8 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
         entry_valid = read_cur | write_cur;
 
         if (vtd_is_last_slpte(slpte, level)) {
-            entry.target_as = &address_space_memory;
+            entry.target_dispatch =
+                    address_space_to_dispatch(&address_space_memory);
             entry.iova = iova & subpage_mask;
             /* NOTE: this is only meaningful if entry_valid == true */
             entry.translated_addr = vtd_get_slpte_addr(slpte);
@@ -1810,7 +1811,7 @@ static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
         sz = VTD_PAGE_SIZE;
     }
 
-    entry.target_as = &vtd_dev_as->as;
+    entry.target_dispatch = address_space_to_dispatch(&vtd_dev_as->as);
     entry.addr_mask = sz - 1;
     entry.iova = addr;
     entry.perm = IOMMU_NONE;
@@ -2270,7 +2271,7 @@ static IOMMUTLBEntry vtd_iommu_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
     IntelIOMMUState *s = vtd_as->iommu_state;
     IOMMUTLBEntry iotlb = {
         /* We'll fill in the rest later. */
-        .target_as = &address_space_memory,
+        .target_dispatch = address_space_to_dispatch(&address_space_memory),
     };
     bool success;
 
@@ -2781,7 +2782,7 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
         size = 1ULL << n;
     }
 
-    entry.target_as = &address_space_memory;
+    entry.target_dispatch = address_space_to_dispatch(&address_space_memory);
     /* Adjust iova for the size */
     entry.iova = n->start & ~(size - 1);
     /* This field is meaningless for unmap */
diff --git a/hw/intc/openpic_kvm.c b/hw/intc/openpic_kvm.c
index 0518e017c4..99578e7f7a 100644
--- a/hw/intc/openpic_kvm.c
+++ b/hw/intc/openpic_kvm.c
@@ -124,7 +124,7 @@ static void kvm_openpic_region_add(MemoryListener *listener,
     uint64_t reg_base;
     int ret;
 
-    if (section->address_space != &address_space_memory) {
+    if (section->dispatch != address_space_to_dispatch(&address_space_memory)) {
         abort();
     }
 
diff --git a/hw/pci-host/apb.c b/hw/pci-host/apb.c
index 96e5d0b60d..ecfc9060b9 100644
--- a/hw/pci-host/apb.c
+++ b/hw/pci-host/apb.c
@@ -218,7 +218,7 @@ static IOMMUTLBEntry pbm_translate_iommu(IOMMUMemoryRegion *iommu, hwaddr addr,
     uint64_t tte;
     uint32_t tsbsize;
     IOMMUTLBEntry ret = {
-        .target_as = &address_space_memory,
+        .target_dispatch = address_space_to_dispatch(&address_space_memory),
         .iova = 0,
         .translated_addr = 0,
         .addr_mask = ~(hwaddr)0,
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index ed2d53559a..566d7779bc 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -117,7 +117,7 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(IOMMUMemoryRegion *iommu,
     sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
     uint64_t tce;
     IOMMUTLBEntry ret = {
-        .target_as = &address_space_memory,
+        .target_dispatch = address_space_to_dispatch(&address_space_memory),
         .iova = 0,
         .translated_addr = 0,
         .addr_mask = ~(hwaddr)0,
@@ -407,7 +407,7 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
 
     tcet->table[index] = tce;
 
-    entry.target_as = &address_space_memory,
+    entry.target_dispatch = address_space_to_dispatch(&address_space_memory),
     entry.iova = (ioba - tcet->bus_offset) & page_mask;
     entry.translated_addr = tce & page_mask;
     entry.addr_mask = ~page_mask;
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 61cfd2138f..2f667ddf07 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -363,7 +363,7 @@ static IOMMUTLBEntry s390_translate_iommu(IOMMUMemoryRegion *mr, hwaddr addr,
     uint32_t flags;
     S390PCIIOMMU *iommu = container_of(mr, S390PCIIOMMU, iommu_mr);
     IOMMUTLBEntry ret = {
-        .target_as = &address_space_memory,
+        .target_dispatch = address_space_to_dispatch(&address_space_memory),
         .iova = 0,
         .translated_addr = 0,
         .addr_mask = ~(hwaddr)0,
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7b2924c0ef..04d21b92c1 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -358,9 +358,9 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     trace_vfio_iommu_map_notify(iotlb->perm == IOMMU_NONE ? "UNMAP" : "MAP",
                                 iova, iova + iotlb->addr_mask);
 
-    if (iotlb->target_as != &address_space_memory) {
-        error_report("Wrong target AS \"%s\", only system memory is allowed",
-                     iotlb->target_as->name ? iotlb->target_as->name : "none");
+    if (iotlb->target_dispatch !=
+            address_space_to_dispatch(&address_space_memory)) {
+        error_report("Wrong target, only system memory is allowed");
         return;
     }
 
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 6eddb099b0..ae2a3f5dd7 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -976,12 +976,12 @@ int vhost_device_iotlb_miss(struct vhost_dev *dev, uint64_t iova, int write)
     IOMMUTLBEntry iotlb;
     uint64_t uaddr, len;
     int ret = -EFAULT;
+    AddressSpaceDispatch *d = address_space_to_dispatch(dev->vdev->dma_as);
 
     rcu_read_lock();
 
-    iotlb = address_space_get_iotlb_entry(dev->vdev->dma_as,
-                                          iova, write);
-    if (iotlb.target_as != NULL) {
+    iotlb = address_space_get_iotlb_entry(d, iova, write);
+    if (iotlb.target_dispatch != NULL) {
         ret = vhost_memory_region_lookup(dev, iotlb.translated_addr,
                                          &uaddr, &len);
         if (ret) {
diff --git a/memory.c b/memory.c
index c0adc35410..c6904a7deb 100644
--- a/memory.c
+++ b/memory.c
@@ -154,7 +154,8 @@ enum ListenerDirection { Forward, Reverse };
 /* No need to ref/unref .mr, the FlatRange keeps it alive.  */
 #define MEMORY_LISTENER_UPDATE_REGION(fr, as, dir, callback, _args...)  \
     do {                                                                \
-        MemoryRegionSection mrs = section_from_flat_range(fr, as);      \
+        MemoryRegionSection mrs = section_from_flat_range(fr,           \
+                address_space_to_dispatch(as));                         \
         MEMORY_LISTENER_CALL(as, callback, dir, &mrs, ##_args);         \
     } while(0)
 
@@ -237,11 +238,11 @@ typedef struct AddressSpaceOps AddressSpaceOps;
     for (var = (view)->ranges; var < (view)->ranges + (view)->nr; ++var)
 
 static inline MemoryRegionSection
-section_from_flat_range(FlatRange *fr, AddressSpace *as)
+section_from_flat_range(FlatRange *fr, AddressSpaceDispatch *d)
 {
     return (MemoryRegionSection) {
         .mr = fr->mr,
-        .address_space = as,
+        .dispatch = d,
         .offset_within_region = fr->offset_in_region,
         .size = fr->addr.size,
         .offset_within_address_space = int128_get64(fr->addr.start),
@@ -727,6 +728,7 @@ static void address_space_add_del_ioeventfds(AddressSpace *as,
     unsigned iold, inew;
     MemoryRegionIoeventfd *fd;
     MemoryRegionSection section;
+    AddressSpaceDispatch *d = address_space_to_dispatch(as);
 
     /* Generate a symmetric difference of the old and new fd sets, adding
      * and deleting as necessary.
@@ -740,7 +742,7 @@ static void address_space_add_del_ioeventfds(AddressSpace *as,
                                                   fds_new[inew]))) {
             fd = &fds_old[iold];
             section = (MemoryRegionSection) {
-                .address_space = as,
+                .dispatch = d,
                 .offset_within_address_space = int128_get64(fd->addr.start),
                 .size = fd->addr.size,
             };
@@ -753,7 +755,7 @@ static void address_space_add_del_ioeventfds(AddressSpace *as,
                                                          fds_old[iold]))) {
             fd = &fds_new[inew];
             section = (MemoryRegionSection) {
-                .address_space = as,
+                .dispatch = d,
                 .offset_within_address_space = int128_get64(fd->addr.start),
                 .size = fd->addr.size,
             };
@@ -1835,7 +1837,9 @@ void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
         view = address_space_get_flatview(as);
         FOR_EACH_FLAT_RANGE(fr, view) {
             if (fr->mr == mr) {
-                MemoryRegionSection mrs = section_from_flat_range(fr, as);
+                MemoryRegionSection mrs = section_from_flat_range(fr,
+                        address_space_to_dispatch(as));
+
                 listener->log_sync(listener, &mrs);
             }
         }
@@ -1938,7 +1942,7 @@ static void memory_region_update_coalesced_range_as(MemoryRegion *mr, AddressSpa
     FOR_EACH_FLAT_RANGE(fr, view) {
         if (fr->mr == mr) {
             section = (MemoryRegionSection) {
-                .address_space = as,
+                .dispatch = address_space_to_dispatch(as),
                 .offset_within_address_space = int128_get64(fr->addr.start),
                 .size = fr->addr.size,
             };
@@ -2300,7 +2304,7 @@ static MemoryRegionSection memory_region_find_rcu(MemoryRegion *mr,
     }
 
     ret.mr = fr->mr;
-    ret.address_space = as;
+    ret.dispatch = address_space_to_dispatch(as);
     range = addrrange_intersection(range, fr->addr);
     ret.offset_within_region = fr->offset_in_region;
     ret.offset_within_region += int128_get64(int128_sub(range.start,
@@ -2349,7 +2353,9 @@ void memory_global_dirty_log_sync(void)
         view = address_space_get_flatview(as);
         FOR_EACH_FLAT_RANGE(fr, view) {
             if (fr->dirty_log_mask) {
-                MemoryRegionSection mrs = section_from_flat_range(fr, as);
+                MemoryRegionSection mrs = section_from_flat_range(fr,
+                        address_space_to_dispatch(as));
+
                 listener->log_sync(listener, &mrs);
             }
         }
@@ -2434,7 +2440,7 @@ static void listener_add_address_space(MemoryListener *listener,
     FOR_EACH_FLAT_RANGE(fr, view) {
         MemoryRegionSection section = {
             .mr = fr->mr,
-            .address_space = as,
+            .dispatch = address_space_to_dispatch(as),
             .offset_within_region = fr->offset_in_region,
             .size = fr->addr.size,
             .offset_within_address_space = int128_get64(fr->addr.start),
-- 
2.11.0


* [Qemu-devel] [RFC PATCH qemu 3/4] memory: Share flat views and dispatch trees between address spaces
  2017-09-07  9:20 [Qemu-devel] [RFC PATCH qemu 0/4] memory: Reduce memory use Alexey Kardashevskiy
  2017-09-07  9:20 ` [Qemu-devel] [RFC PATCH qemu 1/4] memory: Postpone flatview and dispatch tree building till all devices are added Alexey Kardashevskiy
  2017-09-07  9:20 ` [Qemu-devel] [RFC PATCH qemu 2/4] memory: Prepare for shared flat views Alexey Kardashevskiy
@ 2017-09-07  9:20 ` Alexey Kardashevskiy
  2017-09-07 20:53   ` Philippe Mathieu-Daudé
  2017-09-11  7:40   ` Paolo Bonzini
  2017-09-07  9:20 ` [Qemu-devel] [RFC PATCH qemu 4/4] memory: Add flat views to HMP "info mtree" Alexey Kardashevskiy
  2017-09-07  9:51 ` [Qemu-devel] [RFC PATCH qemu 0/4] memory: Reduce memory use Dr. David Alan Gilbert
  4 siblings, 2 replies; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-09-07  9:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, David Gibson, Paolo Bonzini,
	Stefan Hajnoczi, Peter Maydell

This allows flat views to be shared between address spaces when a new
address space is created with the same root memory region.

This adds a global list of flat views and a list of attached address
spaces per flat view. Each address space references a flat view.

This hardcodes the dispatch tree building instead of doing it via
a memory listener.
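
The sharing itself boils down to a lookup by root memory region before
rendering a new view; a sketch of the idea (the list and field names
are from this patch, the find-or-alloc helper is illustrative):

static FlatView *flatview_find_or_alloc(MemoryRegion *root)
{
    FlatView *view;

    QTAILQ_FOREACH(view, &flat_views, flat_views_link) {
        if (view->root == root) {
            /* same root => same flat view, just take a reference */
            flatview_ref(view);
            return view;
        }
    }

    return flatview_alloc(root); /* first address space with this root */
}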

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

This was suggested by Paolo in
https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg05011.html

I am not adding "Suggested-by" as I want to make sure first that this
is doing what was actually suggested.
---
 include/exec/memory-internal.h |   6 +-
 include/exec/memory.h          |   9 +-
 exec.c                         |  58 ++--------
 hw/pci/pci.c                   |   3 +-
 memory.c                       | 253 +++++++++++++++++++++++++++--------------
 5 files changed, 187 insertions(+), 142 deletions(-)

diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
index fb467acdba..8516e0b48f 100644
--- a/include/exec/memory-internal.h
+++ b/include/exec/memory-internal.h
@@ -22,9 +22,11 @@
 #ifndef CONFIG_USER_ONLY
 typedef struct AddressSpaceDispatch AddressSpaceDispatch;
 
-void address_space_init_dispatch(AddressSpace *as);
 void address_space_unregister(AddressSpace *as);
-void address_space_destroy_dispatch(AddressSpace *as);
+void address_space_dispatch_free(AddressSpaceDispatch *d);
+AddressSpaceDispatch *mem_begin(void);
+void mem_commit(AddressSpaceDispatch *d);
+void mem_add(AddressSpaceDispatch *d, MemoryRegionSection *section);
 
 extern const MemoryRegionOps unassigned_mem_ops;
 
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 83e82e90d5..41ab165302 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -27,6 +27,7 @@
 #include "qemu/rcu.h"
 #include "hw/qdev-core.h"
 
+typedef struct AddressSpaceDispatch AddressSpaceDispatch;
 #define RAM_ADDR_INVALID (~(ram_addr_t)0)
 
 #define MAX_PHYS_ADDR_SPACE_BITS 62
@@ -312,6 +313,7 @@ struct MemoryListener {
 };
 
 AddressSpaceDispatch *address_space_to_dispatch(AddressSpace *as);
+MemoryRegion *address_space_root(AddressSpace *as);
 
 /**
  * AddressSpace: describes a mapping of addresses to #MemoryRegion objects
@@ -320,20 +322,17 @@ struct AddressSpace {
     /* All fields are private. */
     struct rcu_head rcu;
     char *name;
-    MemoryRegion *root;
-    int ref_count;
-    bool malloced;
 
     /* Accessed via RCU.  */
     struct FlatView *current_map;
 
     int ioeventfd_nb;
     struct MemoryRegionIoeventfd *ioeventfds;
-    struct AddressSpaceDispatch *dispatch;
-    struct AddressSpaceDispatch *next_dispatch;
+
     MemoryListener dispatch_listener;
     QTAILQ_HEAD(memory_listeners_as, MemoryListener) listeners;
     QTAILQ_ENTRY(AddressSpace) address_spaces_link;
+    QTAILQ_ENTRY(AddressSpace) flat_view_link;
 };
 
 /**
diff --git a/exec.c b/exec.c
index 66f01f5fce..51243f57f4 100644
--- a/exec.c
+++ b/exec.c
@@ -188,15 +188,12 @@ typedef struct PhysPageMap {
 } PhysPageMap;
 
 struct AddressSpaceDispatch {
-    struct rcu_head rcu;
-
     MemoryRegionSection *mru_section;
     /* This is a multi-level map on the physical address space.
      * The bottom level has pointers to MemoryRegionSections.
      */
     PhysPageEntry phys_map;
     PhysPageMap map;
-    AddressSpace *as;
 };
 
 typedef struct AddressSpaceDispatch AddressSpaceDispatch;
@@ -240,11 +237,6 @@ struct DirtyBitmapSnapshot {
     unsigned long dirty[];
 };
 
-AddressSpaceDispatch *address_space_to_dispatch(AddressSpace *as)
-{
-    return atomic_rcu_read(&as->dispatch);
-}
-
 #endif
 
 #if !defined(CONFIG_USER_ONLY)
@@ -1354,10 +1346,8 @@ static void register_multipage(AddressSpaceDispatch *d,
     phys_page_set(d, start_addr >> TARGET_PAGE_BITS, num_pages, section_index);
 }
 
-static void mem_add(MemoryListener *listener, MemoryRegionSection *section)
+void mem_add(AddressSpaceDispatch *d, MemoryRegionSection *section)
 {
-    AddressSpace *as = container_of(listener, AddressSpace, dispatch_listener);
-    AddressSpaceDispatch *d = as->next_dispatch;
     MemoryRegionSection now = *section, remain = *section;
     Int128 page_size = int128_make64(TARGET_PAGE_SIZE);
 
@@ -2680,9 +2670,8 @@ static void io_mem_init(void)
                           NULL, UINT64_MAX);
 }
 
-static void mem_begin(MemoryListener *listener)
+AddressSpaceDispatch *mem_begin(void)
 {
-    AddressSpace *as = container_of(listener, AddressSpace, dispatch_listener);
     AddressSpaceDispatch *d = g_new0(AddressSpaceDispatch, 1);
     uint16_t n;
 
@@ -2696,27 +2685,19 @@ static void mem_begin(MemoryListener *listener)
     assert(n == PHYS_SECTION_WATCH);
 
     d->phys_map  = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .skip = 1 };
-    as->next_dispatch = d;
+
+    return d;
 }
 
-static void address_space_dispatch_free(AddressSpaceDispatch *d)
+void address_space_dispatch_free(AddressSpaceDispatch *d)
 {
     phys_sections_free(&d->map);
     g_free(d);
 }
 
-static void mem_commit(MemoryListener *listener)
+void mem_commit(AddressSpaceDispatch *d)
 {
-    AddressSpace *as = container_of(listener, AddressSpace, dispatch_listener);
-    AddressSpaceDispatch *cur = as->dispatch;
-    AddressSpaceDispatch *next = as->next_dispatch;
-
-    phys_page_compact_all(next, next->map.nodes_nb);
-
-    atomic_rcu_set(&as->dispatch, next);
-    if (cur) {
-        call_rcu(cur, address_space_dispatch_free, rcu);
-    }
+    phys_page_compact_all(d, d->map.nodes_nb);
 }
 
 static void tcg_commit(MemoryListener *listener)
@@ -2732,39 +2713,16 @@ static void tcg_commit(MemoryListener *listener)
      * We reload the dispatch pointer now because cpu_reloading_memory_map()
      * may have split the RCU critical section.
      */
-    d = atomic_rcu_read(&cpuas->as->dispatch);
+    d = address_space_to_dispatch(cpuas->as);
     atomic_rcu_set(&cpuas->memory_dispatch, d);
     tlb_flush(cpuas->cpu);
 }
 
-void address_space_init_dispatch(AddressSpace *as)
-{
-    as->dispatch = NULL;
-    as->dispatch_listener = (MemoryListener) {
-        .begin = mem_begin,
-        .commit = mem_commit,
-        .region_add = mem_add,
-        .region_nop = mem_add,
-        .priority = 0,
-    };
-    memory_listener_register(&as->dispatch_listener, as);
-}
-
 void address_space_unregister(AddressSpace *as)
 {
     memory_listener_unregister(&as->dispatch_listener);
 }
 
-void address_space_destroy_dispatch(AddressSpace *as)
-{
-    AddressSpaceDispatch *d = as->dispatch;
-
-    atomic_rcu_set(&as->dispatch, NULL);
-    if (d) {
-        call_rcu(d, address_space_dispatch_free, rcu);
-    }
-}
-
 static void memory_map_init(void)
 {
     system_memory = g_malloc(sizeof(*system_memory));
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 258fbe51e2..86b9e419fd 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -88,7 +88,8 @@ static void pci_init_bus_master(PCIDevice *pci_dev)
 
     memory_region_init_alias(&pci_dev->bus_master_enable_region,
                              OBJECT(pci_dev), "bus master",
-                             dma_as->root, 0, memory_region_size(dma_as->root));
+                             address_space_root(dma_as), 0,
+                             memory_region_size(address_space_root(dma_as)));
     memory_region_set_enabled(&pci_dev->bus_master_enable_region, false);
     memory_region_add_subregion(&pci_dev->bus_master_container_region, 0,
                                 &pci_dev->bus_master_enable_region);
diff --git a/memory.c b/memory.c
index c6904a7deb..385a507511 100644
--- a/memory.c
+++ b/memory.c
@@ -47,6 +47,9 @@ static QTAILQ_HEAD(memory_listeners, MemoryListener) memory_listeners
 static QTAILQ_HEAD(, AddressSpace) address_spaces
     = QTAILQ_HEAD_INITIALIZER(address_spaces);
 
+static QTAILQ_HEAD(FlatViewList, FlatView) flat_views
+    = QTAILQ_HEAD_INITIALIZER(flat_views);
+
 typedef struct AddrRange AddrRange;
 
 /*
@@ -230,6 +233,11 @@ struct FlatView {
     FlatRange *ranges;
     unsigned nr;
     unsigned nr_allocated;
+    MemoryRegion *root;
+    struct AddressSpaceDispatch *dispatch;
+
+    QTAILQ_ENTRY(FlatView) flat_views_link;
+    QTAILQ_HEAD(address_spaces, AddressSpace) address_spaces;
 };
 
 typedef struct AddressSpaceOps AddressSpaceOps;
@@ -259,12 +267,19 @@ static bool flatrange_equal(FlatRange *a, FlatRange *b)
         && a->readonly == b->readonly;
 }
 
-static void flatview_init(FlatView *view)
+static void flatview_ref(FlatView *view);
+
+static FlatView *flatview_alloc(MemoryRegion *mr_root)
 {
+    FlatView *view;
+
+    view = g_new0(FlatView, 1);
     view->ref = 1;
-    view->ranges = NULL;
-    view->nr = 0;
-    view->nr_allocated = 0;
+    view->root = mr_root;
+    memory_region_ref(mr_root);
+    QTAILQ_INIT(&view->address_spaces);
+
+    return view;
 }
 
 /* Insert a range into a given position.  Caller is responsible for maintaining
@@ -292,6 +307,10 @@ static void flatview_destroy(FlatView *view)
         memory_region_unref(view->ranges[i].mr);
     }
     g_free(view->ranges);
+    if (view->dispatch) {
+        address_space_dispatch_free(view->dispatch);
+    }
+    memory_region_unref(view->root);
     g_free(view);
 }
 
@@ -303,7 +322,7 @@ static void flatview_ref(FlatView *view)
 static void flatview_unref(FlatView *view)
 {
     if (atomic_fetch_dec(&view->ref) == 1) {
-        flatview_destroy(view);
+        call_rcu(view, flatview_destroy, rcu);
     }
 }
 
@@ -608,7 +627,7 @@ static AddressSpace *memory_region_to_address_space(MemoryRegion *mr)
         mr = mr->container;
     }
     QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
-        if (mr == as->root) {
+        if (mr == as->current_map->root) {
             return as;
         }
     }
@@ -702,23 +721,6 @@ static void render_memory_region(FlatView *view,
     }
 }
 
-/* Render a memory topology into a list of disjoint absolute ranges. */
-static FlatView *generate_memory_topology(MemoryRegion *mr)
-{
-    FlatView *view;
-
-    view = g_new(FlatView, 1);
-    flatview_init(view);
-
-    if (mr) {
-        render_memory_region(view, mr, int128_zero(),
-                             addrrange_make(int128_zero(), int128_2_64()), false);
-    }
-    flatview_simplify(view);
-
-    return view;
-}
-
 static void address_space_add_del_ioeventfds(AddressSpace *as,
                                              MemoryRegionIoeventfd *fds_new,
                                              unsigned fds_new_nb,
@@ -769,12 +771,10 @@ static void address_space_add_del_ioeventfds(AddressSpace *as,
     }
 }
 
-static FlatView *address_space_get_flatview(AddressSpace *as)
+static FlatView *flatview_get(FlatView *view)
 {
-    FlatView *view;
-
     rcu_read_lock();
-    view = atomic_rcu_read(&as->current_map);
+    view = atomic_rcu_read(&view);
     flatview_ref(view);
     rcu_read_unlock();
     return view;
@@ -789,7 +789,7 @@ static void address_space_update_ioeventfds(AddressSpace *as)
     AddrRange tmp;
     unsigned i;
 
-    view = address_space_get_flatview(as);
+    view = flatview_get(as->current_map);
     FOR_EACH_FLAT_RANGE(fr, view) {
         for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
             tmp = addrrange_shift(fr->mr->ioeventfds[i].addr,
@@ -881,28 +881,89 @@ static void address_space_update_topology_pass(AddressSpace *as,
     }
 }
 
-
-static void address_space_update_topology(AddressSpace *as)
+static FlatView *address_space_update_flatview(FlatView *view)
 {
-    FlatView *old_view = address_space_get_flatview(as);
-    FlatView *new_view = generate_memory_topology(as->root);
+    AddressSpace *as, *asnext;
+    FlatView *old_view = flatview_get(view);
+    MemoryRegion *root = old_view->root;
+    FlatView *new_view = flatview_alloc(root);
+    unsigned iold, inew;
+    FlatRange *frold, *frnew;
 
-    address_space_update_topology_pass(as, old_view, new_view, false);
-    address_space_update_topology_pass(as, old_view, new_view, true);
+    if (root) {
+        render_memory_region(new_view, root, int128_zero(),
+                             addrrange_make(int128_zero(), int128_2_64()),
+                             false);
+        flatview_simplify(new_view);
+    }
 
-    /* Writes are protected by the BQL.  */
-    atomic_rcu_set(&as->current_map, new_view);
-    call_rcu(old_view, flatview_unref, rcu);
+    new_view->dispatch = mem_begin();
 
-    /* Note that all the old MemoryRegions are still alive up to this
-     * point.  This relieves most MemoryListeners from the need to
-     * ref/unref the MemoryRegions they get---unless they use them
-     * outside the iothread mutex, in which case precise reference
-     * counting is necessary.
+    /*
+     * FIXME: this is cut-n-paste from address_space_update_topology_pass,
+     * simplify it
      */
+    iold = inew = 0;
+    while (iold < old_view->nr || inew < new_view->nr) {
+        if (iold < old_view->nr) {
+            frold = &old_view->ranges[iold];
+        } else {
+            frold = NULL;
+        }
+        if (inew < new_view->nr) {
+            frnew = &new_view->ranges[inew];
+        } else {
+            frnew = NULL;
+        }
+
+        if (frold
+            && (!frnew
+                || int128_lt(frold->addr.start, frnew->addr.start)
+                || (int128_eq(frold->addr.start, frnew->addr.start)
+                    && !flatrange_equal(frold, frnew)))) {
+            ++iold;
+        } else if (frold && frnew && flatrange_equal(frold, frnew)) {
+            /* In both and unchanged (except logging may have changed) */
+            MemoryRegionSection mrs = section_from_flat_range(frnew,
+                    new_view->dispatch);
+
+            mem_add(new_view->dispatch, &mrs);
+
+            ++iold;
+            ++inew;
+        } else {
+            /* In new */
+            MemoryRegionSection mrs = section_from_flat_range(frnew,
+                    new_view->dispatch);
+
+            mem_add(new_view->dispatch, &mrs);
+
+            ++inew;
+        }
+    }
+
+    mem_commit(new_view->dispatch);
+
+    QTAILQ_FOREACH(as, &old_view->address_spaces, flat_view_link) {
+        address_space_update_topology_pass(as, old_view, new_view, false);
+        address_space_update_topology_pass(as, old_view, new_view, true);
+    }
+
+    QTAILQ_FOREACH_SAFE(as, &old_view->address_spaces, flat_view_link, asnext) {
+        QTAILQ_REMOVE(&old_view->address_spaces, as, flat_view_link);
+        flatview_unref(old_view);
+
+        atomic_rcu_set(&as->current_map, new_view);
+
+        flatview_ref(new_view);
+        QTAILQ_INSERT_TAIL(&new_view->address_spaces, as, flat_view_link);
+
+        address_space_update_ioeventfds(as);
+    }
+
     flatview_unref(old_view);
 
-    address_space_update_ioeventfds(as);
+    return new_view;
 }
 
 void memory_region_transaction_begin(void)
@@ -921,11 +982,31 @@ void memory_region_transaction_commit(void)
     --memory_region_transaction_depth;
     if (!memory_region_transaction_depth) {
         if (memory_region_update_pending) {
+            FlatView *view, *vnext;
+            struct FlatViewList fwstmp = QTAILQ_HEAD_INITIALIZER(fwstmp);
+
             MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
 
-            QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
-                address_space_update_topology(as);
+            QTAILQ_FOREACH_SAFE(view, &flat_views, flat_views_link, vnext) {
+                FlatView *old_view, *new_view;
+
+                old_view = flatview_get(view);
+                new_view = address_space_update_flatview(old_view);
+
+                QTAILQ_REMOVE(&flat_views, old_view, flat_views_link);
+                flatview_unref(old_view);
+                flatview_unref(old_view);
+
+                QTAILQ_INSERT_HEAD(&fwstmp, new_view, flat_views_link);
+
+                flatview_unref(new_view);
             }
+
+            QTAILQ_FOREACH_SAFE(view, &fwstmp, flat_views_link, vnext) {
+                QTAILQ_REMOVE(&fwstmp, view, flat_views_link);
+                QTAILQ_INSERT_HEAD(&flat_views, view, flat_views_link);
+            }
+
             memory_region_update_pending = false;
             MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
         } else if (ioeventfd_update_pending) {
@@ -1834,7 +1915,7 @@ void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
             continue;
         }
         as = listener->address_space;
-        view = address_space_get_flatview(as);
+        view = flatview_get(as->current_map);
         FOR_EACH_FLAT_RANGE(fr, view) {
             if (fr->mr == mr) {
                 MemoryRegionSection mrs = section_from_flat_range(fr,
@@ -1938,7 +2019,7 @@ static void memory_region_update_coalesced_range_as(MemoryRegion *mr, AddressSpa
     AddrRange tmp;
     MemoryRegionSection section;
 
-    view = address_space_get_flatview(as);
+    view = flatview_get(as->current_map);
     FOR_EACH_FLAT_RANGE(fr, view) {
         if (fr->mr == mr) {
             section = (MemoryRegionSection) {
@@ -2350,7 +2431,7 @@ void memory_global_dirty_log_sync(void)
             continue;
         }
         as = listener->address_space;
-        view = address_space_get_flatview(as);
+        view = flatview_get(as->current_map);
         FOR_EACH_FLAT_RANGE(fr, view) {
             if (fr->dirty_log_mask) {
                 MemoryRegionSection mrs = section_from_flat_range(fr,
@@ -2436,7 +2517,7 @@ static void listener_add_address_space(MemoryListener *listener,
         }
     }
 
-    view = address_space_get_flatview(as);
+    view = flatview_get(as->current_map);
     FOR_EACH_FLAT_RANGE(fr, view) {
         MemoryRegionSection section = {
             .mr = fr->mr,
@@ -2615,67 +2696,72 @@ void memory_region_invalidate_mmio_ptr(MemoryRegion *mr, hwaddr offset,
 
 void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
 {
-    memory_region_ref(root);
-    memory_region_transaction_begin();
-    as->ref_count = 1;
-    as->root = root;
-    as->malloced = false;
-    as->current_map = g_new(FlatView, 1);
-    flatview_init(as->current_map);
+    FlatView *view;
+
+    as->current_map = NULL;
     as->ioeventfd_nb = 0;
     as->ioeventfds = NULL;
     QTAILQ_INIT(&as->listeners);
     QTAILQ_INSERT_TAIL(&address_spaces, as, address_spaces_link);
     as->name = g_strdup(name ? name : "anonymous");
-    address_space_init_dispatch(as);
-    memory_region_update_pending |= root->enabled;
-    memory_region_transaction_commit();
+
+    QTAILQ_FOREACH(view, &flat_views, flat_views_link) {
+        assert(root);
+        if (view->root == root) {
+            as->current_map = flatview_get(view);
+            break;
+        }
+    }
+
+    if (!as->current_map) {
+        as->current_map = flatview_alloc(root);
+
+        QTAILQ_INSERT_TAIL(&flat_views, as->current_map, flat_views_link);
+    }
+
+    QTAILQ_INSERT_TAIL(&as->current_map->address_spaces, as, flat_view_link);
+}
+
+MemoryRegion *address_space_root(AddressSpace *as)
+{
+    return as->current_map->root;
+}
+
+AddressSpaceDispatch *address_space_to_dispatch(AddressSpace *as)
+{
+    return atomic_rcu_read(&as->current_map)->dispatch;
 }
 
 static void do_address_space_destroy(AddressSpace *as)
 {
-    bool do_free = as->malloced;
+    FlatView *view = flatview_get(as->current_map);
 
-    address_space_destroy_dispatch(as);
     assert(QTAILQ_EMPTY(&as->listeners));
 
-    flatview_unref(as->current_map);
+    QTAILQ_REMOVE(&view->address_spaces, as, flat_view_link);
+    flatview_unref(view);
+
+    flatview_unref(view);
+
     g_free(as->name);
     g_free(as->ioeventfds);
-    memory_region_unref(as->root);
-    if (do_free) {
-        g_free(as);
-    }
+    g_free(as);
 }
 
 AddressSpace *address_space_init_shareable(MemoryRegion *root, const char *name)
 {
     AddressSpace *as;
 
-    QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
-        if (root == as->root && as->malloced) {
-            as->ref_count++;
-            return as;
-        }
-    }
-
     as = g_malloc0(sizeof *as);
     address_space_init(as, root, name);
-    as->malloced = true;
+
     return as;
 }
 
 void address_space_destroy(AddressSpace *as)
 {
-    MemoryRegion *root = as->root;
-
-    as->ref_count--;
-    if (as->ref_count) {
-        return;
-    }
     /* Flush out anything from MemoryListeners listening in on this */
     memory_region_transaction_begin();
-    as->root = NULL;
     memory_region_transaction_commit();
     QTAILQ_REMOVE(&address_spaces, as, address_spaces_link);
     address_space_unregister(as);
@@ -2684,7 +2770,6 @@ void address_space_destroy(AddressSpace *as)
      * entries that the guest should never use.  Wait for the old
      * values to expire before freeing the data.
      */
-    as->root = root;
     call_rcu(as, do_address_space_destroy, rcu);
 }
 
@@ -2816,7 +2901,7 @@ static void mtree_print_mr(fprintf_function mon_printf, void *f,
 static void mtree_print_flatview(fprintf_function p, void *f,
                                  AddressSpace *as)
 {
-    FlatView *view = address_space_get_flatview(as);
+    FlatView *view = flatview_get(as->current_map);
     FlatRange *range = &view->ranges[0];
     MemoryRegion *mr;
     int n = view->nr;
@@ -2873,7 +2958,7 @@ void mtree_info(fprintf_function mon_printf, void *f, bool flatview)
 
     QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
         mon_printf(f, "address-space: %s\n", as->name);
-        mtree_print_mr(mon_printf, f, as->root, 1, 0, &ml_head);
+        mtree_print_mr(mon_printf, f, as->current_map->root, 1, 0, &ml_head);
         mon_printf(f, "\n");
     }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [RFC PATCH qemu 4/4] memory: Add flat views to HMP "info mtree"
  2017-09-07  9:20 [Qemu-devel] [RFC PATCH qemu 0/4] memory: Reduce memory use Alexey Kardashevskiy
                   ` (2 preceding siblings ...)
  2017-09-07  9:20 ` [Qemu-devel] [RFC PATCH qemu 3/4] memory: Share flat views and dispatch trees between address spaces Alexey Kardashevskiy
@ 2017-09-07  9:20 ` Alexey Kardashevskiy
  2017-09-07  9:51 ` [Qemu-devel] [RFC PATCH qemu 0/4] memory: Reduce memory use Dr. David Alan Gilbert
  4 siblings, 0 replies; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-09-07  9:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, David Gibson, Paolo Bonzini,
	Stefan Hajnoczi, Peter Maydell

This adds a new switch to "info mtree" to print dispatch tree internals.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

Example:

aik@fstn1-p1:~$ echo "info mtree -f -d" | nc localhost 30000
QEMU 2.9.94 monitor - type 'help' for more information
(qemu) info mtree -f -d
FlatView #0
 AS "memory"
  0000000000000000-000000007fffffff (prio 0, ram): ppc_spapr.ram
 AS "cpu-memory"
  0000000000000000-000000007fffffff (prio 0, ram): ppc_spapr.ram
 AS "cpu-memory"
  0000000000000000-000000007fffffff (prio 0, ram): ppc_spapr.ram
  Dispatch
    Physical sections
      #0 @0000000000000000 root="(noname)" [unassigned]
      #1 @0000000000000000 root="(noname)" [not dirty]
      #2 @0000000000000000 root="(noname)" [ROM]
      #3 @0000000000000000 root="(noname)" [watch]
      #4 @0000000000000000 root="ppc_spapr.ram"
    Nodes (9 bits per level, 6 levels) ptr=[3] skip=4
      [0]
	  0       skip=3  ptr=[3]
	  1..511  skip=1  ptr=NIL
      [1]
	  0       skip=2  ptr=[3]
	  1..511  skip=1  ptr=NIL
      [2]
	  0       skip=1  ptr=[3]
	  1..511  skip=1  ptr=NIL
      [3]
	  0..1    skip=0  ptr=#4
	  2..511  skip=1  ptr=NIL

FlatView #1
 AS "I/O"
  0000000000000000-000000000000ffff (prio 0, i/o): io
  Dispatch
    Physical sections
      #0 @0000000000000000 root="(noname)" [unassigned]
      #1 @0000000000000000 root="(noname)" [not dirty]
      #2 @0000000000000000 root="(noname)" [ROM]
      #3 @0000000000000000 root="(noname)" [watch]
      #4 @0000000000000000 root="io"
    Nodes (9 bits per level, 6 levels) ptr=[5] skip=6
      [0]
	  0       skip=5  ptr=[5]
	  1..511  skip=1  ptr=NIL
      [1]
	  0       skip=4  ptr=[5]
	  1..511  skip=1  ptr=NIL
      [2]
	  0       skip=3  ptr=[5]
	  1..511  skip=1  ptr=NIL
      [3]
	  0       skip=2  ptr=[5]
	  1..511  skip=1  ptr=NIL
      [4]
	  0       skip=1  ptr=[5]
	  1..511  skip=1  ptr=NIL
      [5]
	  0..15   skip=0  ptr=#4
	 16..511  skip=0  ptr=#0

FlatView #2
 AS "pci@800000020000000"
  0000000000000000-000000003fffffff (prio 0, i/o): tce-iommu-80000000
  0000040000000000-000004000000ffff (prio 0, i/o): msi
  Dispatch
    Physical sections
      #0 @0000000000000000 root="(noname)" [unassigned]
      #1 @0000000000000000 root="(noname)" [not dirty]
      #2 @0000000000000000 root="(noname)" [ROM]
      #3 @0000000000000000 root="(noname)" [watch]
      #4 @0000000000000000 root="tce-iommu-80000000"
      #5 @0000040000000000 root="msi"
    Nodes (9 bits per level, 6 levels) ptr=[2] skip=3
      [0]
	  0       skip=2  ptr=[2]
	  1..511  skip=1  ptr=NIL
      [1]
	  0       skip=1  ptr=[2]
	  1..511  skip=1  ptr=NIL
      [2]
	  0       skip=0  ptr=#4
	  1..7    skip=1  ptr=NIL
	  8       skip=3  ptr=[6]
	  9..511  skip=1  ptr=NIL
      [3]
	  0       skip=0  ptr=#4
	  1..511  skip=1  ptr=NIL
      [4]
	  0       skip=2  ptr=[6]
	  1..511  skip=1  ptr=NIL
      [5]
	  0       skip=1  ptr=[6]
	  1..511  skip=1  ptr=NIL
      [6]
	  0..15   skip=0  ptr=#5
	 16..511  skip=0  ptr=#0

(qemu)
---
 include/exec/memory.h |  5 +++-
 exec.c                | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++
 memory.c              | 20 ++++++++++++----
 monitor.c             |  3 ++-
 hmp-commands-info.hx  |  7 +++---
 5 files changed, 92 insertions(+), 9 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 41ab165302..2a50bbe79f 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1527,7 +1527,10 @@ void memory_global_dirty_log_start(void);
  */
 void memory_global_dirty_log_stop(void);
 
-void mtree_info(fprintf_function mon_printf, void *f, bool flatview);
+void mtree_info(fprintf_function mon_printf, void *f, bool flatview,
+                bool dispatch_tree);
+void mtree_print_dispatch(fprintf_function mon, void *f,
+                          struct AddressSpaceDispatch *d);
 
 /**
  * memory_region_request_mmio_ptr: request a pointer to an mmio
diff --git a/exec.c b/exec.c
index 51243f57f4..8c565e9102 100644
--- a/exec.c
+++ b/exec.c
@@ -3605,3 +3605,69 @@ void page_size_init(void)
     }
     qemu_host_page_mask = -(intptr_t)qemu_host_page_size;
 }
+
+static void mtree_print_phys_entries(fprintf_function mon, void *f,
+                                     int start, int end, int skip, int ptr)
+{
+    if (start == end - 1) {
+        mon(f, "\t%3d      ", start);
+    } else {
+        mon(f, "\t%3d..%-3d ", start, end - 1);
+    }
+    mon(f, " skip=%d ", skip);
+    if (ptr == PHYS_MAP_NODE_NIL) {
+        mon(f, " ptr=NIL");
+    } else if (!skip) {
+        mon(f, " ptr=#%d", ptr);
+    } else {
+        mon(f, " ptr=[%d]", ptr);
+    }
+    mon(f, "\n");
+}
+
+void mtree_print_dispatch(fprintf_function mon, void *f,
+                          AddressSpaceDispatch *d)
+{
+    int i;
+
+    mon(f, "  Dispatch\n");
+    mon(f, "    Physical sections\n");
+
+    for (i = 0; i < d->map.sections_nb; ++i) {
+        MemoryRegionSection *s = d->map.sections + i;
+        const char *names[] = { "[unassigned]", "[not dirty]",
+                                "[ROM]", "[watch]" };
+
+        mon(f, "      #%d @%016lX root=\"%s\" %s\n", i,
+                   s->offset_within_address_space,
+                   s->mr->name ? s->mr->name : "(noname)",
+                   i < ARRAY_SIZE(names) ? names[i] : "");
+    }
+
+    mon(f, "    Nodes (%d bits per level, %d levels) ptr=[%d] skip=%d\n",
+               P_L2_BITS, P_L2_LEVELS, d->phys_map.ptr, d->phys_map.skip);
+    for (i = 0; i < d->map.nodes_nb; ++i) {
+        int j, jprev;
+        PhysPageEntry prev;
+        Node *n = d->map.nodes + i;
+
+        mon(f, "      [%d]\n", i);
+
+        for (j = 0, jprev = 0, prev = *n[0]; j < ARRAY_SIZE(*n); ++j) {
+            PhysPageEntry *pe = *n + j;
+
+            if (pe->ptr == prev.ptr && pe->skip == prev.skip) {
+                continue;
+            }
+
+            mtree_print_phys_entries(mon, f, jprev, j, prev.skip, prev.ptr);
+
+            jprev = j;
+            prev = *pe;
+        }
+
+        if (jprev != ARRAY_SIZE(*n)) {
+            mtree_print_phys_entries(mon, f, jprev, j, prev.skip, prev.ptr);
+        }
+    }
+}
diff --git a/memory.c b/memory.c
index 385a507511..24938036c8 100644
--- a/memory.c
+++ b/memory.c
@@ -2939,16 +2939,28 @@ static void mtree_print_flatview(fprintf_function p, void *f,
     flatview_unref(view);
 }
 
-void mtree_info(fprintf_function mon_printf, void *f, bool flatview)
+void mtree_info(fprintf_function mon_printf, void *f, bool flatview,
+                bool dispatch_tree)
 {
     MemoryRegionListHead ml_head;
     MemoryRegionList *ml, *ml2;
     AddressSpace *as;
+    FlatView *view;
+    int n;
 
     if (flatview) {
-        QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
-            mon_printf(f, "address-space (flat view): %s\n", as->name);
-            mtree_print_flatview(mon_printf, f, as);
+        n = 0;
+        QTAILQ_FOREACH(view, &flat_views, flat_views_link) {
+            mon_printf(f, "FlatView #%d\n", n);
+            ++n;
+
+            QTAILQ_FOREACH(as, &view->address_spaces, flat_view_link) {
+                mon_printf(f, " AS \"%s\"\n", as->name);
+                mtree_print_flatview(mon_printf, f, as);
+            }
+            if (dispatch_tree) {
+                mtree_print_dispatch(mon_printf, f, view->dispatch);
+            }
             mon_printf(f, "\n");
         }
         return;
diff --git a/monitor.c b/monitor.c
index e0f880107f..191c4b016d 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1703,8 +1703,9 @@ static void hmp_boot_set(Monitor *mon, const QDict *qdict)
 static void hmp_info_mtree(Monitor *mon, const QDict *qdict)
 {
     bool flatview = qdict_get_try_bool(qdict, "flatview", false);
+    bool dispatch_tree = qdict_get_try_bool(qdict, "dispatch_tree", false);
 
-    mtree_info((fprintf_function)monitor_printf, mon, flatview);
+    mtree_info((fprintf_function)monitor_printf, mon, flatview, dispatch_tree);
 }
 
 static void hmp_info_numa(Monitor *mon, const QDict *qdict)
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index d9df238a5f..3d2edff940 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -249,9 +249,10 @@ ETEXI
 
     {
         .name       = "mtree",
-        .args_type  = "flatview:-f",
-        .params     = "[-f]",
-        .help       = "show memory tree (-f: dump flat view for address spaces)",
+        .args_type  = "flatview:-f,dispatch_tree:-d",
+        .params     = "[-f][-d]",
+        .help       = "show memory tree (-f: dump flat view for address spaces;"
+                      "-d: dump dispatch tree, valid with -f only)",
         .cmd        = hmp_info_mtree,
     },
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 1/4] memory: Postpone flatview and dispatch tree building till all devices are added
  2017-09-07  9:20 ` [Qemu-devel] [RFC PATCH qemu 1/4] memory: Postpone flatview and dispatch tree building till all devices are added Alexey Kardashevskiy
@ 2017-09-07  9:30   ` Peter Maydell
  2017-09-07 14:27     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Maydell @ 2017-09-07  9:30 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: QEMU Developers, David Gibson, Paolo Bonzini, Stefan Hajnoczi

On 7 September 2017 at 10:20, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> Most devices use at least one address space, and every time a new address
> space is added, flat views and dispatch trees are rebuilt for all address
> spaces. This is not a problem for a relatively small number of devices, but
> even 50 virtio-pci devices use more than 8GB of RAM.
>
> What happens is that on every flatview/dispatch rebuild, new arrays are
> allocated and the old ones released, but the release is done via RCU, so
> until the entire machine is built, they are not actually freed.
>
> This wraps device creation in memory_region_transaction_begin/commit
> to massively reduce the number of flat view/dispatch tree (re)allocations.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  vl.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/vl.c b/vl.c
> index 8e247cc2a2..3c39cc8b3a 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -4655,12 +4655,16 @@ int main(int argc, char **argv, char **envp)
>      igd_gfx_passthru();
>
>      /* init generic devices */
> +    memory_region_transaction_begin();
> +
>      rom_set_order_override(FW_CFG_ORDER_OVERRIDE_DEVICE);
>      if (qemu_opts_foreach(qemu_find_opts("device"),
>                            device_init_func, NULL, NULL)) {
>          exit(1);
>      }
>
> +    memory_region_transaction_commit();

What happens if something in a device realize function tries
to do a read from an AddressSpace? I can't think of any examples
that need to do that and I suspect it's probably a bug if anybody
tries it, but I'm curious whether this change alters the failure
mode...
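
For concreteness, I mean something along these lines - an entirely
hypothetical device, and the address and magic value are made up too:

#include "qemu/osdep.h"
#include "hw/qdev-core.h"
#include "exec/address-spaces.h"
#include "qapi/error.h"

static void mydev_realize(DeviceState *dev, Error **errp)
{
    uint32_t magic;

    /* A DMA read at realize time: with the global transaction still
     * open, the flat view/dispatch tree for this address space has
     * not been rendered yet. */
    address_space_read(&address_space_memory, 0x1000,
                       MEMTXATTRS_UNSPECIFIED,
                       (uint8_t *)&magic, sizeof(magic));

    if (magic != 0xdeadbeef) {  /* made-up sanity check */
        error_setg(errp, "mydev: bad magic");
    }
}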

thanks
-- PMM

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 0/4] memory: Reduce memory use
  2017-09-07  9:20 [Qemu-devel] [RFC PATCH qemu 0/4] memory: Reduce memory use Alexey Kardashevskiy
                   ` (3 preceding siblings ...)
  2017-09-07  9:20 ` [Qemu-devel] [RFC PATCH qemu 4/4] memory: Add flat views to HMP "info mtree" Alexey Kardashevskiy
@ 2017-09-07  9:51 ` Dr. David Alan Gilbert
  2017-09-07 10:08   ` David Gibson
  2017-09-07 14:44   ` Alexey Kardashevskiy
  4 siblings, 2 replies; 28+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-07  9:51 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: qemu-devel, Paolo Bonzini, Stefan Hajnoczi, Peter Maydell, David Gibson

* Alexey Kardashevskiy (aik@ozlabs.ru) wrote:
> This was inspired by https://bugzilla.redhat.com/show_bug.cgi?id=1481593
> 
> What happens there is that every virtio block device creates 2 address
> spaces - one for the modern config space (called "virtio-pci-cfg-as") and
> one for busmaster (a common pci thing, named after the device,
> in my case "virtio-blk-pci").
> 
> Each address_space_init() updates the topology for every address space.
> Every topology update (address_space_update_topology()) creates a new
> dispatch tree - an AddressSpaceDispatch with nodes (1KB) and
> sections (48KB) - and destroys the old one.
> 
> However, the dispatch destructor is postponed via RCU, which does not
> get a chance to execute until the machine is initialized; but before
> we get there, memory is not returned to the pool, and this is a lot
> of memory which grows as n^2.
> 
> These patches are trying to address the memory use and boot time
> issues, but tbh only the first one provides a visible outcome.

Do you have a feel for how much memory is saved?

Dave

> There are still things to polish and double check the use of RCU,
> I'd like to get any feedback before proceeding - is this going
> the right way or way too ugly?
> 
> 
> This is based on sha1
> 1ab5eb4efb Peter Maydell "Update version for v2.10.0 release".
> 
> Please comment. Thanks.
> 
> 
> 
> Alexey Kardashevskiy (4):
>   memory: Postpone flatview and dispatch tree building till all devices
>     are added
>   memory: Prepare for shared flat views
>   memory: Share flat views and dispatch trees between address spaces
>   memory: Add flat views to HMP "info mtree"
> 
>  include/exec/memory-internal.h |   6 +-
>  include/exec/memory.h          |  93 +++++++++----
>  exec.c                         | 242 +++++++++++++++++++--------------
>  hw/alpha/typhoon.c             |   2 +-
>  hw/dma/rc4030.c                |   4 +-
>  hw/i386/amd_iommu.c            |   2 +-
>  hw/i386/intel_iommu.c          |   9 +-
>  hw/intc/openpic_kvm.c          |   2 +-
>  hw/pci-host/apb.c              |   2 +-
>  hw/pci/pci.c                   |   3 +-
>  hw/ppc/spapr_iommu.c           |   4 +-
>  hw/s390x/s390-pci-bus.c        |   2 +-
>  hw/vfio/common.c               |   6 +-
>  hw/virtio/vhost.c              |   6 +-
>  memory.c                       | 299 +++++++++++++++++++++++++++--------------
>  monitor.c                      |   3 +-
>  vl.c                           |   4 +
>  hmp-commands-info.hx           |   7 +-
>  18 files changed, 448 insertions(+), 248 deletions(-)
> 
> -- 
> 2.11.0
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 0/4] memory: Reduce memory use
  2017-09-07  9:51 ` [Qemu-devel] [RFC PATCH qemu 0/4] memory: Reduce memory use Dr. David Alan Gilbert
@ 2017-09-07 10:08   ` David Gibson
  2017-09-07 14:44   ` Alexey Kardashevskiy
  1 sibling, 0 replies; 28+ messages in thread
From: David Gibson @ 2017-09-07 10:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Alexey Kardashevskiy, qemu-devel, Paolo Bonzini, Stefan Hajnoczi,
	Peter Maydell

[-- Attachment #1: Type: text/plain, Size: 1929 bytes --]

On Thu, Sep 07, 2017 at 10:51:42AM +0100, Dr. David Alan Gilbert wrote:
> * Alexey Kardashevskiy (aik@ozlabs.ru) wrote:
> > This was inspired by https://bugzilla.redhat.com/show_bug.cgi?id=1481593
> > 
> > What happens there is that every virtio block device creates 2 address
> > spaces - one for the modern config space (called "virtio-pci-cfg-as") and
> > one for busmaster (a common pci thing, named after the device,
> > in my case "virtio-blk-pci").
> > 
> > Each address_space_init() updates the topology for every address space.
> > Every topology update (address_space_update_topology()) creates a new
> > dispatch tree - an AddressSpaceDispatch with nodes (1KB) and
> > sections (48KB) - and destroys the old one.
> > 
> > However, the dispatch destructor is postponed via RCU, which does not
> > get a chance to execute until the machine is initialized; but before
> > we get there, memory is not returned to the pool, and this is a lot
> > of memory which grows as n^2.
> > 
> > These patches are trying to address the memory use and boot time
> > issues, but tbh only the first one provides a visible outcome.
> 
> Do you have a feel for how much memory is saved?

I think that's a bit hard to answer.

As noted above, there's O(n^2) (or more) space complexity here - more
than should be required by the data we actually have to track.
That means the amount of "excess" memory depends on how many devices
there are.

I haven't yet looked at these patches in detail to know whether they
truly fix that O(n^2) or just pare down the constant.  If they do fix
the O(n^2), then the amount is going to vary from "probably not enough
to worry about" in normal use cases to hundreds of gigabytes in cases
with many devices.
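
(Rough arithmetic, using the ~49KB/dispatch figure from the cover
letter: with n address spaces, the i-th address_space_init() rebuilds
all i dispatch trees, so the RCU-deferred garbage is on the order of
49KB * n^2 / 2.  For the 400-device case, n is roughly 800 address
spaces, which gives ~15GB from the dispatch trees alone, before
counting the flat views themselves.)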

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 1/4] memory: Postpone flatview and dispatch tree building till all devices are added
  2017-09-07  9:30   ` Peter Maydell
@ 2017-09-07 14:27     ` Alexey Kardashevskiy
  2017-09-07 14:30       ` Peter Maydell
  0 siblings, 1 reply; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-09-07 14:27 UTC (permalink / raw)
  To: Peter Maydell
  Cc: QEMU Developers, David Gibson, Paolo Bonzini, Stefan Hajnoczi

On 07/09/17 19:30, Peter Maydell wrote:
> On 7 September 2017 at 10:20, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>> Most devices use at least one address space, and every time a new address
>> space is added, flat views and dispatch trees are rebuilt for all address
>> spaces. This is not a problem for a relatively small number of devices, but
>> even 50 virtio-pci devices use more than 8GB of RAM.
>>
>> What happens is that on every flatview/dispatch rebuild, new arrays are
>> allocated and the old ones released, but the release is done via RCU, so
>> until the entire machine is built, they are not actually freed.
>>
>> This wraps device creation in memory_region_transaction_begin/commit
>> to massively reduce the number of flat view/dispatch tree (re)allocations.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  vl.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/vl.c b/vl.c
>> index 8e247cc2a2..3c39cc8b3a 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -4655,12 +4655,16 @@ int main(int argc, char **argv, char **envp)
>>      igd_gfx_passthru();
>>
>>      /* init generic devices */
>> +    memory_region_transaction_begin();
>> +
>>      rom_set_order_override(FW_CFG_ORDER_OVERRIDE_DEVICE);
>>      if (qemu_opts_foreach(qemu_find_opts("device"),
>>                            device_init_func, NULL, NULL)) {
>>          exit(1);
>>      }
>>
>> +    memory_region_transaction_commit();
> 
> What happens if something in a device realize function tries
> to do a read from an AddressSpace? 


address_space_memory is created before that loop, but yes, address spaces
for devices are not rendered and QEMU crashes; this needs fixing.

> I can't think of any examples
> that need to do that and I suspect it's probably a bug if anybody
> tries it, but I'm curious whether this change alters the failure
> mode...




-- 
Alexey

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 1/4] memory: Postpone flatview and dispatch tree building till all devices are added
  2017-09-07 14:27     ` Alexey Kardashevskiy
@ 2017-09-07 14:30       ` Peter Maydell
  2017-09-08  6:21         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Maydell @ 2017-09-07 14:30 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: QEMU Developers, David Gibson, Paolo Bonzini, Stefan Hajnoczi

On 7 September 2017 at 15:27, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> On 07/09/17 19:30, Peter Maydell wrote:
>> What happens if something in a device realize function tries
>> to do a read from an AddressSpace?
>
>
> address_space_memory is created before that loop but yes, address spaces
> for devices are not rendered and QEMU crashes, needs fixing.

I'd be OK if we defined this to be not permitted, as long as
we're reasonably happy that we don't currently do it anywhere
and the error if it does happen is fairly clear about what's
gone wrong.
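
Something like this in the 3/4 helper would do, I guess (untested
sketch; the assert relies on the dispatch pointer only being set at
the first transaction commit):

AddressSpaceDispatch *address_space_to_dispatch(AddressSpace *as)
{
    FlatView *view = atomic_rcu_read(&as->current_map);

    /* The dispatch tree is only built at transaction commit;
     * catch anybody dispatching accesses before that. */
    assert(view->dispatch);

    return view->dispatch;
}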

thanks
-- PMM

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 0/4] memory: Reduce memory use
  2017-09-07  9:51 ` [Qemu-devel] [RFC PATCH qemu 0/4] memory: Reduce memory use Dr. David Alan Gilbert
  2017-09-07 10:08   ` David Gibson
@ 2017-09-07 14:44   ` Alexey Kardashevskiy
  2017-09-07 14:54     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-09-07 14:44 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Paolo Bonzini, Stefan Hajnoczi, Peter Maydell, David Gibson

On 07/09/17 19:51, Dr. David Alan Gilbert wrote:
> * Alexey Kardashevskiy (aik@ozlabs.ru) wrote:
>> This was inspired by https://bugzilla.redhat.com/show_bug.cgi?id=1481593
>>
>> What happens there is that every virtio block device creates 2 address
>> spaces - one for the modern config space (called "virtio-pci-cfg-as") and
>> one for busmaster (a common pci thing, named after the device,
>> in my case "virtio-blk-pci").
>>
>> Each address_space_init() updates the topology for every address space.
>> Every topology update (address_space_update_topology()) creates a new
>> dispatch tree - an AddressSpaceDispatch with nodes (1KB) and
>> sections (48KB) - and destroys the old one.
>>
>> However, the dispatch destructor is postponed via RCU, which does not
>> get a chance to execute until the machine is initialized; but before
>> we get there, memory is not returned to the pool, and this is a lot
>> of memory which grows as n^2.
>>
>> These patches are trying to address the memory use and boot time
>> issues, but tbh only the first one provides a visible outcome.
> 
> Do you have a feel for how much memory is saved?


Patch 1/4 saves ~33GB (~44GB -> ~11GB) for a 2GB guest with 400 virtio-pci
devices. These GB figures are the peak values (but that does not matter to
the OOM killer); memory gets released in one go when RCU kicks in, it just
happens too late.

Patch 3/4 saves less, I'd say ~50KB per vCPU (more if you count peaks, but
not so much). Strangely, I do not see a difference in the valgrind output
when I run a guest with 1024 or just 8 CPUs; probably "massif" is not the
right tool to catch this.

> 
> Dave
> 
>> There are still things to polish and double check the use of RCU,
>> I'd like to get any feedback before proceeding - is this going
>> the right way or way too ugly?
>>
>>
>> This is based on sha1
>> 1ab5eb4efb Peter Maydell "Update version for v2.10.0 release".
>>
>> Please comment. Thanks.
>>
>>
>>
>> Alexey Kardashevskiy (4):
>>   memory: Postpone flatview and dispatch tree building till all devices
>>     are added
>>   memory: Prepare for shared flat views
>>   memory: Share flat views and dispatch trees between address spaces
>>   memory: Add flat views to HMP "info mtree"
>>
>>  include/exec/memory-internal.h |   6 +-
>>  include/exec/memory.h          |  93 +++++++++----
>>  exec.c                         | 242 +++++++++++++++++++--------------
>>  hw/alpha/typhoon.c             |   2 +-
>>  hw/dma/rc4030.c                |   4 +-
>>  hw/i386/amd_iommu.c            |   2 +-
>>  hw/i386/intel_iommu.c          |   9 +-
>>  hw/intc/openpic_kvm.c          |   2 +-
>>  hw/pci-host/apb.c              |   2 +-
>>  hw/pci/pci.c                   |   3 +-
>>  hw/ppc/spapr_iommu.c           |   4 +-
>>  hw/s390x/s390-pci-bus.c        |   2 +-
>>  hw/vfio/common.c               |   6 +-
>>  hw/virtio/vhost.c              |   6 +-
>>  memory.c                       | 299 +++++++++++++++++++++++++++--------------
>>  monitor.c                      |   3 +-
>>  vl.c                           |   4 +
>>  hmp-commands-info.hx           |   7 +-
>>  18 files changed, 448 insertions(+), 248 deletions(-)
>>
>> -- 
>> 2.11.0
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 0/4] memory: Reduce memory use
  2017-09-07 14:44   ` Alexey Kardashevskiy
@ 2017-09-07 14:54     ` Dr. David Alan Gilbert
  2017-09-08  2:08       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 28+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-07 14:54 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: qemu-devel, Paolo Bonzini, Stefan Hajnoczi, Peter Maydell, David Gibson

* Alexey Kardashevskiy (aik@ozlabs.ru) wrote:
> On 07/09/17 19:51, Dr. David Alan Gilbert wrote:
> > * Alexey Kardashevskiy (aik@ozlabs.ru) wrote:
> >> This was inspired by https://bugzilla.redhat.com/show_bug.cgi?id=1481593
> >>
> >> What happens there is that every virtio block device creates 2 address
> >> spaces - one for the modern config space (called "virtio-pci-cfg-as") and
> >> one for busmaster (a common pci thing, named after the device,
> >> in my case "virtio-blk-pci").
> >>
> >> Each address_space_init() updates the topology for every address space.
> >> Every topology update (address_space_update_topology()) creates a new
> >> dispatch tree - an AddressSpaceDispatch with nodes (1KB) and
> >> sections (48KB) - and destroys the old one.
> >>
> >> However, the dispatch destructor is postponed via RCU, which does not
> >> get a chance to execute until the machine is initialized; but before
> >> we get there, memory is not returned to the pool, and this is a lot
> >> of memory which grows as n^2.
> >>
> >> These patches are trying to address the memory use and boot time
> >> issues, but tbh only the first one provides a visible outcome.
> > 
> > Do you have a feel for how much memory is saved?
> 
> 
> The 1/4 saves ~33GB (~44GB -> 11GB) for a 2GB guest and 400 virtio-pci
> devices. These GB figures are the peak values (but it does not matter for
> OOM killer), memory gets released in one go when RCU kicks in, it just
> happens too late.

Nice saving!  Still, why is it using 11GB?
What's it like for more sane configurations, say 2-3 virtio devices - is
there anything noticeable or is it just the huge setups?

Dave


> The 3/4 saves less, I'd say 50KB per VCPU (more if you count peaks but so
> much). Strangely, I do not see the difference in valgrind output when I run
> a guest with 1024 or just 8 CPUs, probably "massif" is not the right tool
> to catch this.
> 
> > 
> > Dave
> > 
> >> There are still things to polish and double check the use of RCU,
> >> I'd like to get any feedback before proceeding - is this going
> >> the right way or way too ugly?
> >>
> >>
> >> This is based on sha1
> >> 1ab5eb4efb Peter Maydell "Update version for v2.10.0 release".
> >>
> >> Please comment. Thanks.
> >>
> >>
> >>
> >> Alexey Kardashevskiy (4):
> >>   memory: Postpone flatview and dispatch tree building till all devices
> >>     are added
> >>   memory: Prepare for shared flat views
> >>   memory: Share flat views and dispatch trees between address spaces
> >>   memory: Add flat views to HMP "info mtree"
> >>
> >>  include/exec/memory-internal.h |   6 +-
> >>  include/exec/memory.h          |  93 +++++++++----
> >>  exec.c                         | 242 +++++++++++++++++++--------------
> >>  hw/alpha/typhoon.c             |   2 +-
> >>  hw/dma/rc4030.c                |   4 +-
> >>  hw/i386/amd_iommu.c            |   2 +-
> >>  hw/i386/intel_iommu.c          |   9 +-
> >>  hw/intc/openpic_kvm.c          |   2 +-
> >>  hw/pci-host/apb.c              |   2 +-
> >>  hw/pci/pci.c                   |   3 +-
> >>  hw/ppc/spapr_iommu.c           |   4 +-
> >>  hw/s390x/s390-pci-bus.c        |   2 +-
> >>  hw/vfio/common.c               |   6 +-
> >>  hw/virtio/vhost.c              |   6 +-
> >>  memory.c                       | 299 +++++++++++++++++++++++++++--------------
> >>  monitor.c                      |   3 +-
> >>  vl.c                           |   4 +
> >>  hmp-commands-info.hx           |   7 +-
> >>  18 files changed, 448 insertions(+), 248 deletions(-)
> >>
> >> -- 
> >> 2.11.0
> >>
> >>
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> 
> 
> -- 
> Alexey
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 3/4] memory: Share flat views and dispatch trees between address spaces
  2017-09-07  9:20 ` [Qemu-devel] [RFC PATCH qemu 3/4] memory: Share flat views and dispatch trees between address spaces Alexey Kardashevskiy
@ 2017-09-07 20:53   ` Philippe Mathieu-Daudé
  2017-09-07 22:18     ` Alexey Kardashevskiy
  2017-09-11  7:40   ` Paolo Bonzini
  1 sibling, 1 reply; 28+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-09-07 20:53 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Paolo Bonzini, Stefan Hajnoczi, Peter Maydell, David Gibson

On 09/07/2017 06:20 AM, Alexey Kardashevskiy wrote:
> This allows sharing flat views between address spaces that use the same
> root memory region when a new address space is created.
> 
> This adds a global list of flat views and a list of attached address
> spaces per flat view. Each address space references a flat view.
> 
> This hardcodes the dispatch tree building instead of doing it via
> a memory listener.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> 
> This was suggested by Paolo in
> https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg05011.html
> 
> I am not putting "Suggested-by" as I want to make sure that this is doing
> what was actually suggested.
> ---
>   include/exec/memory-internal.h |   6 +-
>   include/exec/memory.h          |   9 +-
>   exec.c                         |  58 ++--------
>   hw/pci/pci.c                   |   3 +-
>   memory.c                       | 253 +++++++++++++++++++++++++++--------------
>   5 files changed, 187 insertions(+), 142 deletions(-)
> 
> diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
> index fb467acdba..8516e0b48f 100644
> --- a/include/exec/memory-internal.h
> +++ b/include/exec/memory-internal.h
> @@ -22,9 +22,11 @@
>   #ifndef CONFIG_USER_ONLY
>   typedef struct AddressSpaceDispatch AddressSpaceDispatch;
>   
> -void address_space_init_dispatch(AddressSpace *as);
>   void address_space_unregister(AddressSpace *as);
> -void address_space_destroy_dispatch(AddressSpace *as);
> +void address_space_dispatch_free(AddressSpaceDispatch *d);
> +AddressSpaceDispatch *mem_begin(void);
> +void mem_commit(AddressSpaceDispatch *d);
> +void mem_add(AddressSpaceDispatch *d, MemoryRegionSection *section);
>   
>   extern const MemoryRegionOps unassigned_mem_ops;
>   
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 83e82e90d5..41ab165302 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -27,6 +27,7 @@
>   #include "qemu/rcu.h"
>   #include "hw/qdev-core.h"
>   
> +typedef struct AddressSpaceDispatch AddressSpaceDispatch;
>   #define RAM_ADDR_INVALID (~(ram_addr_t)0)
>   
>   #define MAX_PHYS_ADDR_SPACE_BITS 62
> @@ -312,6 +313,7 @@ struct MemoryListener {
>   };
>   
>   AddressSpaceDispatch *address_space_to_dispatch(AddressSpace *as);
> +MemoryRegion *address_space_root(AddressSpace *as);
>   
>   /**
>    * AddressSpace: describes a mapping of addresses to #MemoryRegion objects
> @@ -320,20 +322,17 @@ struct AddressSpace {
>       /* All fields are private. */
>       struct rcu_head rcu;
>       char *name;
> -    MemoryRegion *root;
> -    int ref_count;
> -    bool malloced;
>   
>       /* Accessed via RCU.  */
>       struct FlatView *current_map;
>   
>       int ioeventfd_nb;
>       struct MemoryRegionIoeventfd *ioeventfds;
> -    struct AddressSpaceDispatch *dispatch;
> -    struct AddressSpaceDispatch *next_dispatch;
> +
>       MemoryListener dispatch_listener;
>       QTAILQ_HEAD(memory_listeners_as, MemoryListener) listeners;
>       QTAILQ_ENTRY(AddressSpace) address_spaces_link;
> +    QTAILQ_ENTRY(AddressSpace) flat_view_link;
>   };
>   
>   /**
> diff --git a/exec.c b/exec.c
> index 66f01f5fce..51243f57f4 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -188,15 +188,12 @@ typedef struct PhysPageMap {
>   } PhysPageMap;
>   
>   struct AddressSpaceDispatch {
> -    struct rcu_head rcu;
> -
>       MemoryRegionSection *mru_section;
>       /* This is a multi-level map on the physical address space.
>        * The bottom level has pointers to MemoryRegionSections.
>        */
>       PhysPageEntry phys_map;
>       PhysPageMap map;
> -    AddressSpace *as;
>   };
>   
>   typedef struct AddressSpaceDispatch AddressSpaceDispatch;
> @@ -240,11 +237,6 @@ struct DirtyBitmapSnapshot {
>       unsigned long dirty[];
>   };
>   
> -AddressSpaceDispatch *address_space_to_dispatch(AddressSpace *as)
> -{
> -    return atomic_rcu_read(&as->dispatch);
> -}
> -
>   #endif
>   
>   #if !defined(CONFIG_USER_ONLY)
> @@ -1354,10 +1346,8 @@ static void register_multipage(AddressSpaceDispatch *d,
>       phys_page_set(d, start_addr >> TARGET_PAGE_BITS, num_pages, section_index);
>   }
>   
> -static void mem_add(MemoryListener *listener, MemoryRegionSection *section)
> +void mem_add(AddressSpaceDispatch *d, MemoryRegionSection *section)
>   {
> -    AddressSpace *as = container_of(listener, AddressSpace, dispatch_listener);
> -    AddressSpaceDispatch *d = as->next_dispatch;
>       MemoryRegionSection now = *section, remain = *section;
>       Int128 page_size = int128_make64(TARGET_PAGE_SIZE);
>   
> @@ -2680,9 +2670,8 @@ static void io_mem_init(void)
>                             NULL, UINT64_MAX);
>   }
>   
> -static void mem_begin(MemoryListener *listener)
> +AddressSpaceDispatch *mem_begin(void)
>   {
> -    AddressSpace *as = container_of(listener, AddressSpace, dispatch_listener);
>       AddressSpaceDispatch *d = g_new0(AddressSpaceDispatch, 1);
>       uint16_t n;
>   
> @@ -2696,27 +2685,19 @@ static void mem_begin(MemoryListener *listener)
>       assert(n == PHYS_SECTION_WATCH);
>   
>       d->phys_map  = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .skip = 1 };
> -    as->next_dispatch = d;
> +
> +    return d;
>   }
>   
> -static void address_space_dispatch_free(AddressSpaceDispatch *d)
> +void address_space_dispatch_free(AddressSpaceDispatch *d)
>   {
>       phys_sections_free(&d->map);
>       g_free(d);
>   }
>   
> -static void mem_commit(MemoryListener *listener)
> +void mem_commit(AddressSpaceDispatch *d)
>   {
> -    AddressSpace *as = container_of(listener, AddressSpace, dispatch_listener);
> -    AddressSpaceDispatch *cur = as->dispatch;
> -    AddressSpaceDispatch *next = as->next_dispatch;
> -
> -    phys_page_compact_all(next, next->map.nodes_nb);
> -
> -    atomic_rcu_set(&as->dispatch, next);
> -    if (cur) {
> -        call_rcu(cur, address_space_dispatch_free, rcu);
> -    }
> +    phys_page_compact_all(d, d->map.nodes_nb);
>   }
>   
>   static void tcg_commit(MemoryListener *listener)
> @@ -2732,39 +2713,16 @@ static void tcg_commit(MemoryListener *listener)
>        * We reload the dispatch pointer now because cpu_reloading_memory_map()
>        * may have split the RCU critical section.
>        */
> -    d = atomic_rcu_read(&cpuas->as->dispatch);
> +    d = address_space_to_dispatch(cpuas->as);
>       atomic_rcu_set(&cpuas->memory_dispatch, d);
>       tlb_flush(cpuas->cpu);
>   }
>   
> -void address_space_init_dispatch(AddressSpace *as)
> -{
> -    as->dispatch = NULL;
> -    as->dispatch_listener = (MemoryListener) {
> -        .begin = mem_begin,
> -        .commit = mem_commit,
> -        .region_add = mem_add,
> -        .region_nop = mem_add,
> -        .priority = 0,
> -    };
> -    memory_listener_register(&as->dispatch_listener, as);
> -}
> -
>   void address_space_unregister(AddressSpace *as)
>   {
>       memory_listener_unregister(&as->dispatch_listener);
>   }
>   
> -void address_space_destroy_dispatch(AddressSpace *as)
> -{
> -    AddressSpaceDispatch *d = as->dispatch;
> -
> -    atomic_rcu_set(&as->dispatch, NULL);
> -    if (d) {
> -        call_rcu(d, address_space_dispatch_free, rcu);
> -    }
> -}
> -
>   static void memory_map_init(void)
>   {
>       system_memory = g_malloc(sizeof(*system_memory));
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 258fbe51e2..86b9e419fd 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -88,7 +88,8 @@ static void pci_init_bus_master(PCIDevice *pci_dev)
>   
>       memory_region_init_alias(&pci_dev->bus_master_enable_region,
>                                OBJECT(pci_dev), "bus master",
> -                             dma_as->root, 0, memory_region_size(dma_as->root));
> +                             address_space_root(dma_as), 0,
> +                             memory_region_size(address_space_root(dma_as)));
>       memory_region_set_enabled(&pci_dev->bus_master_enable_region, false);
>       memory_region_add_subregion(&pci_dev->bus_master_container_region, 0,
>                                   &pci_dev->bus_master_enable_region);
> diff --git a/memory.c b/memory.c
> index c6904a7deb..385a507511 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -47,6 +47,9 @@ static QTAILQ_HEAD(memory_listeners, MemoryListener) memory_listeners
>   static QTAILQ_HEAD(, AddressSpace) address_spaces
>       = QTAILQ_HEAD_INITIALIZER(address_spaces);
>   
> +static QTAILQ_HEAD(FlatViewList, FlatView) flat_views
> +    = QTAILQ_HEAD_INITIALIZER(flat_views);
> +
>   typedef struct AddrRange AddrRange;
>   
>   /*
> @@ -230,6 +233,11 @@ struct FlatView {
>       FlatRange *ranges;
>       unsigned nr;
>       unsigned nr_allocated;
> +    MemoryRegion *root;
> +    struct AddressSpaceDispatch *dispatch;
> +
> +    QTAILQ_ENTRY(FlatView) flat_views_link;
> +    QTAILQ_HEAD(address_spaces, AddressSpace) address_spaces;
>   };
>   
>   typedef struct AddressSpaceOps AddressSpaceOps;
> @@ -259,12 +267,19 @@ static bool flatrange_equal(FlatRange *a, FlatRange *b)
>           && a->readonly == b->readonly;
>   }
>   
> -static void flatview_init(FlatView *view)
> +static void flatview_ref(FlatView *view);
> +
> +static FlatView *flatview_alloc(MemoryRegion *mr_root)
>   {
> +    FlatView *view;
> +
> +    view = g_new0(FlatView, 1);
>       view->ref = 1;
> -    view->ranges = NULL;
> -    view->nr = 0;
> -    view->nr_allocated = 0;
> +    view->root = mr_root;
> +    memory_region_ref(mr_root);
> +    QTAILQ_INIT(&view->address_spaces);
> +
> +    return view;
>   }
>   
>   /* Insert a range into a given position.  Caller is responsible for maintaining
> @@ -292,6 +307,10 @@ static void flatview_destroy(FlatView *view)
>           memory_region_unref(view->ranges[i].mr);
>       }
>       g_free(view->ranges);
> +    if (view->dispatch) {
> +        address_space_dispatch_free(view->dispatch);
> +    }
> +    memory_region_unref(view->root);
>       g_free(view);
>   }
>   
> @@ -303,7 +322,7 @@ static void flatview_ref(FlatView *view)
>   static void flatview_unref(FlatView *view)
>   {
>       if (atomic_fetch_dec(&view->ref) == 1) {
> -        flatview_destroy(view);
> +        call_rcu(view, flatview_destroy, rcu);
>       }
>   }
>   
> @@ -608,7 +627,7 @@ static AddressSpace *memory_region_to_address_space(MemoryRegion *mr)
>           mr = mr->container;
>       }
>       QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
> -        if (mr == as->root) {
> +        if (mr == as->current_map->root) {
>               return as;
>           }
>       }
> @@ -702,23 +721,6 @@ static void render_memory_region(FlatView *view,
>       }
>   }
>   
> -/* Render a memory topology into a list of disjoint absolute ranges. */
> -static FlatView *generate_memory_topology(MemoryRegion *mr)
> -{
> -    FlatView *view;
> -
> -    view = g_new(FlatView, 1);
> -    flatview_init(view);
> -
> -    if (mr) {
> -        render_memory_region(view, mr, int128_zero(),
> -                             addrrange_make(int128_zero(), int128_2_64()), false);
> -    }
> -    flatview_simplify(view);
> -
> -    return view;
> -}
> -
>   static void address_space_add_del_ioeventfds(AddressSpace *as,
>                                                MemoryRegionIoeventfd *fds_new,
>                                                unsigned fds_new_nb,
> @@ -769,12 +771,10 @@ static void address_space_add_del_ioeventfds(AddressSpace *as,
>       }
>   }
>   
> -static FlatView *address_space_get_flatview(AddressSpace *as)
> +static FlatView *flatview_get(FlatView *view)
>   {
> -    FlatView *view;
> -
>       rcu_read_lock();
> -    view = atomic_rcu_read(&as->current_map);
> +    view = atomic_rcu_read(&view);
>       flatview_ref(view);
>       rcu_read_unlock();
>       return view;
> @@ -789,7 +789,7 @@ static void address_space_update_ioeventfds(AddressSpace *as)
>       AddrRange tmp;
>       unsigned i;
>   
> -    view = address_space_get_flatview(as);
> +    view = flatview_get(as->current_map);
>       FOR_EACH_FLAT_RANGE(fr, view) {
>           for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
>               tmp = addrrange_shift(fr->mr->ioeventfds[i].addr,
> @@ -881,28 +881,89 @@ static void address_space_update_topology_pass(AddressSpace *as,
>       }
>   }
>   
> -
> -static void address_space_update_topology(AddressSpace *as)
> +static FlatView *address_space_update_flatview(FlatView *view)
>   {
> -    FlatView *old_view = address_space_get_flatview(as);
> -    FlatView *new_view = generate_memory_topology(as->root);
> +    AddressSpace *as, *asnext;
> +    FlatView *old_view = flatview_get(view);
> +    MemoryRegion *root = old_view->root;
> +    FlatView *new_view = flatview_alloc(root);
> +    unsigned iold, inew;
> +    FlatRange *frold, *frnew;
>   
> -    address_space_update_topology_pass(as, old_view, new_view, false);
> -    address_space_update_topology_pass(as, old_view, new_view, true);
> +    if (root) {
> +        render_memory_region(new_view, root, int128_zero(),
> +                             addrrange_make(int128_zero(), int128_2_64()),
> +                             false);
> +        flatview_simplify(new_view);
> +    }
>   
> -    /* Writes are protected by the BQL.  */
> -    atomic_rcu_set(&as->current_map, new_view);
> -    call_rcu(old_view, flatview_unref, rcu);
> +    new_view->dispatch = mem_begin();
>   
> -    /* Note that all the old MemoryRegions are still alive up to this
> -     * point.  This relieves most MemoryListeners from the need to
> -     * ref/unref the MemoryRegions they get---unless they use them
> -     * outside the iothread mutex, in which case precise reference
> -     * counting is necessary.
> +    /*
> +     * FIXME: this is cut-n-paste from address_space_update_topology_pass,
> +     * simplify it
>        */
> +    iold = inew = 0;
> +    while (iold < old_view->nr || inew < new_view->nr) {
> +        if (iold < old_view->nr) {
> +            frold = &old_view->ranges[iold];
> +        } else {
> +            frold = NULL;
> +        }
> +        if (inew < new_view->nr) {
> +            frnew = &new_view->ranges[inew];
> +        } else {
> +            frnew = NULL;
> +        }
> +
> +        if (frold
> +            && (!frnew
> +                || int128_lt(frold->addr.start, frnew->addr.start)
> +                || (int128_eq(frold->addr.start, frnew->addr.start)
> +                    && !flatrange_equal(frold, frnew)))) {
> +            ++iold;
> +        } else if (frold && frnew && flatrange_equal(frold, frnew)) {
> +            /* In both and unchanged (except logging may have changed) */
> +            MemoryRegionSection mrs = section_from_flat_range(frnew,
> +                    new_view->dispatch);
> +
> +            mem_add(new_view->dispatch, &mrs);
> +
> +            ++iold;
> +            ++inew;
> +        } else {
> +            /* In new */
> +            MemoryRegionSection mrs = section_from_flat_range(frnew,
> +                    new_view->dispatch);
> +
> +            mem_add(new_view->dispatch, &mrs);
> +
> +            ++inew;
> +        }
> +    }
> +
> +    mem_commit(new_view->dispatch);
> +
> +    QTAILQ_FOREACH(as, &old_view->address_spaces, flat_view_link) {
> +        address_space_update_topology_pass(as, old_view, new_view, false);
> +        address_space_update_topology_pass(as, old_view, new_view, true);
> +    }
> +
> +    QTAILQ_FOREACH_SAFE(as, &old_view->address_spaces, flat_view_link, asnext) {
> +        QTAILQ_REMOVE(&old_view->address_spaces, as, flat_view_link);
> +        flatview_unref(old_view);
> +
> +        atomic_rcu_set(&as->current_map, new_view);
> +
> +        flatview_ref(new_view);
> +        QTAILQ_INSERT_TAIL(&new_view->address_spaces, as, flat_view_link);
> +
> +        address_space_update_ioeventfds(as);
> +    }
> +
>       flatview_unref(old_view);
>   
> -    address_space_update_ioeventfds(as);
> +    return new_view;
>   }
>   
>   void memory_region_transaction_begin(void)
> @@ -921,11 +982,31 @@ void memory_region_transaction_commit(void)
>       --memory_region_transaction_depth;
>       if (!memory_region_transaction_depth) {
>           if (memory_region_update_pending) {
> +            FlatView *view, *vnext;
> +            struct FlatViewList fwstmp = QTAILQ_HEAD_INITIALIZER(fwstmp);
> +
>               MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
>   
> -            QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
> -                address_space_update_topology(as);
> +            QTAILQ_FOREACH_SAFE(view, &flat_views, flat_views_link, vnext) {
> +                FlatView *old_view, *new_view;
> +
> +                old_view = flatview_get(view);
> +                new_view = address_space_update_flatview(old_view);
> +
> +                QTAILQ_REMOVE(&flat_views, old_view, flat_views_link);
> +                flatview_unref(old_view);
> +                flatview_unref(old_view);
> +
> +                QTAILQ_INSERT_HEAD(&fwstmp, new_view, flat_views_link);
> +
> +                flatview_unref(new_view);
>               }
> +
> +            QTAILQ_FOREACH_SAFE(view, &fwstmp, flat_views_link, vnext) {
> +                QTAILQ_REMOVE(&fwstmp, view, flat_views_link);
> +                QTAILQ_INSERT_HEAD(&flat_views, view, flat_views_link);
> +            }
> +
>               memory_region_update_pending = false;
>               MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
>           } else if (ioeventfd_update_pending) {
> @@ -1834,7 +1915,7 @@ void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
>               continue;
>           }
>           as = listener->address_space;
> -        view = address_space_get_flatview(as);
> +        view = flatview_get(as->current_map);
>           FOR_EACH_FLAT_RANGE(fr, view) {
>               if (fr->mr == mr) {
>                   MemoryRegionSection mrs = section_from_flat_range(fr,
> @@ -1938,7 +2019,7 @@ static void memory_region_update_coalesced_range_as(MemoryRegion *mr, AddressSpa
>       AddrRange tmp;
>       MemoryRegionSection section;
>   
> -    view = address_space_get_flatview(as);
> +    view = flatview_get(as->current_map);
>       FOR_EACH_FLAT_RANGE(fr, view) {
>           if (fr->mr == mr) {
>               section = (MemoryRegionSection) {
> @@ -2350,7 +2431,7 @@ void memory_global_dirty_log_sync(void)
>               continue;
>           }
>           as = listener->address_space;
> -        view = address_space_get_flatview(as);
> +        view = flatview_get(as->current_map);
>           FOR_EACH_FLAT_RANGE(fr, view) {
>               if (fr->dirty_log_mask) {
>                   MemoryRegionSection mrs = section_from_flat_range(fr,
> @@ -2436,7 +2517,7 @@ static void listener_add_address_space(MemoryListener *listener,
>           }
>       }
>   
> -    view = address_space_get_flatview(as);
> +    view = flatview_get(as->current_map);
>       FOR_EACH_FLAT_RANGE(fr, view) {
>           MemoryRegionSection section = {
>               .mr = fr->mr,
> @@ -2615,67 +2696,72 @@ void memory_region_invalidate_mmio_ptr(MemoryRegion *mr, hwaddr offset,
>   
>   void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
>   {
> -    memory_region_ref(root);
> -    memory_region_transaction_begin();
> -    as->ref_count = 1;
> -    as->root = root;
> -    as->malloced = false;
> -    as->current_map = g_new(FlatView, 1);
> -    flatview_init(as->current_map);
> +    FlatView *view;
> +
> +    as->current_map = NULL;
>       as->ioeventfd_nb = 0;
>       as->ioeventfds = NULL;
>       QTAILQ_INIT(&as->listeners);
>       QTAILQ_INSERT_TAIL(&address_spaces, as, address_spaces_link);
>       as->name = g_strdup(name ? name : "anonymous");
> -    address_space_init_dispatch(as);
> -    memory_region_update_pending |= root->enabled;
> -    memory_region_transaction_commit();
> +
> +    QTAILQ_FOREACH(view, &flat_views, flat_views_link) {
> +        assert(root);
> +        if (view->root == root) {
> +            as->current_map = flatview_get(view);
> +            break;
> +        }
> +    }
> +
> +    if (!as->current_map) {
> +        as->current_map = flatview_alloc(root);
> +
> +        QTAILQ_INSERT_TAIL(&flat_views, as->current_map, flat_views_link);
> +    }
> +
> +    QTAILQ_INSERT_TAIL(&as->current_map->address_spaces, as, flat_view_link);
> +}
> +
> +MemoryRegion *address_space_root(AddressSpace *as)
> +{
> +    return as->current_map->root;
> +}
> +
> +AddressSpaceDispatch *address_space_to_dispatch(AddressSpace *as)
> +{
> +    return atomic_rcu_read(&as->current_map)->dispatch;
>   }
>   
>   static void do_address_space_destroy(AddressSpace *as)
>   {
> -    bool do_free = as->malloced;
> +    FlatView *view = flatview_get(as->current_map);
>   
> -    address_space_destroy_dispatch(as);
>       assert(QTAILQ_EMPTY(&as->listeners));
>   
> -    flatview_unref(as->current_map);
> +    QTAILQ_REMOVE(&view->address_spaces, as, flat_view_link);
> +    flatview_unref(view);
> +
> +    flatview_unref(view);

incorrect copy/paste?

> +
>       g_free(as->name);
>       g_free(as->ioeventfds);
> -    memory_region_unref(as->root);
> -    if (do_free) {
> -        g_free(as);
> -    }
> +    g_free(as);
>   }
>   
>   AddressSpace *address_space_init_shareable(MemoryRegion *root, const char *name)
>   {
>       AddressSpace *as;
>   
> -    QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
> -        if (root == as->root && as->malloced) {
> -            as->ref_count++;
> -            return as;
> -        }
> -    }
> -
>       as = g_malloc0(sizeof *as);
>       address_space_init(as, root, name);
> -    as->malloced = true;
> +
>       return as;
>   }
>   
>   void address_space_destroy(AddressSpace *as)
>   {
> -    MemoryRegion *root = as->root;
> -
> -    as->ref_count--;
> -    if (as->ref_count) {
> -        return;
> -    }
>       /* Flush out anything from MemoryListeners listening in on this */
>       memory_region_transaction_begin();
> -    as->root = NULL;
>       memory_region_transaction_commit();
>       QTAILQ_REMOVE(&address_spaces, as, address_spaces_link);
>       address_space_unregister(as);
> @@ -2684,7 +2770,6 @@ void address_space_destroy(AddressSpace *as)
>        * entries that the guest should never use.  Wait for the old
>        * values to expire before freeing the data.
>        */
> -    as->root = root;
>       call_rcu(as, do_address_space_destroy, rcu);
>   }
>   
> @@ -2816,7 +2901,7 @@ static void mtree_print_mr(fprintf_function mon_printf, void *f,
>   static void mtree_print_flatview(fprintf_function p, void *f,
>                                    AddressSpace *as)
>   {
> -    FlatView *view = address_space_get_flatview(as);
> +    FlatView *view = flatview_get(as->current_map);
>       FlatRange *range = &view->ranges[0];
>       MemoryRegion *mr;
>       int n = view->nr;
> @@ -2873,7 +2958,7 @@ void mtree_info(fprintf_function mon_printf, void *f, bool flatview)
>   
>       QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
>           mon_printf(f, "address-space: %s\n", as->name);
> -        mtree_print_mr(mon_printf, f, as->root, 1, 0, &ml_head);
> +        mtree_print_mr(mon_printf, f, as->current_map->root, 1, 0, &ml_head);
>           mon_printf(f, "\n");
>       }
>   
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 3/4] memory: Share flat views and dispatch trees between address spaces
  2017-09-07 20:53   ` Philippe Mathieu-Daudé
@ 2017-09-07 22:18     ` Alexey Kardashevskiy
  0 siblings, 0 replies; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-09-07 22:18 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Paolo Bonzini, Stefan Hajnoczi, Peter Maydell, David Gibson

On 08/09/17 06:53, Philippe Mathieu-Daudé wrote:
> On 09/07/2017 06:20 AM, Alexey Kardashevskiy wrote:
>> This allows sharing flat views between address spaces when the same root
>> memory region is used when creating a new address space.
>>
>> This adds a global list of flat views and a list of attached address
>> spaces per a flat view. Each address space references a flat view.
>>
>> This hard codes the dispatch tree building instead of doing so via
>> a memory listener.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>
>> This was suggested by Paolo in
>> https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg05011.html
>>
>> I am not putting "Suggested-by" as I want to make sure that this is doing
>> what was actually suggested.
>> ---
>>   include/exec/memory-internal.h |   6 +-
>>   include/exec/memory.h          |   9 +-
>>   exec.c                         |  58 ++--------
>>   hw/pci/pci.c                   |   3 +-
>>   memory.c                       | 253 +++++++++++++++++++++++++++--------------
>>   5 files changed, 187 insertions(+), 142 deletions(-)
>>
>> diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
>> index fb467acdba..8516e0b48f 100644
>> --- a/include/exec/memory-internal.h
>> +++ b/include/exec/memory-internal.h
>> @@ -22,9 +22,11 @@
>>   #ifndef CONFIG_USER_ONLY
>>   typedef struct AddressSpaceDispatch AddressSpaceDispatch;
>>   -void address_space_init_dispatch(AddressSpace *as);
>>   void address_space_unregister(AddressSpace *as);
>> -void address_space_destroy_dispatch(AddressSpace *as);
>> +void address_space_dispatch_free(AddressSpaceDispatch *d);
>> +AddressSpaceDispatch *mem_begin(void);
>> +void mem_commit(AddressSpaceDispatch *d);
>> +void mem_add(AddressSpaceDispatch *d, MemoryRegionSection *section);
>>     extern const MemoryRegionOps unassigned_mem_ops;
>>   diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index 83e82e90d5..41ab165302 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -27,6 +27,7 @@
>>   #include "qemu/rcu.h"
>>   #include "hw/qdev-core.h"
>>   +typedef struct AddressSpaceDispatch AddressSpaceDispatch;
>>   #define RAM_ADDR_INVALID (~(ram_addr_t)0)
>>     #define MAX_PHYS_ADDR_SPACE_BITS 62
>> @@ -312,6 +313,7 @@ struct MemoryListener {
>>   };
>>     AddressSpaceDispatch *address_space_to_dispatch(AddressSpace *as);
>> +MemoryRegion *address_space_root(AddressSpace *as);
>>     /**
>>    * AddressSpace: describes a mapping of addresses to #MemoryRegion objects
>> @@ -320,20 +322,17 @@ struct AddressSpace {
>>       /* All fields are private. */
>>       struct rcu_head rcu;
>>       char *name;
>> -    MemoryRegion *root;
>> -    int ref_count;
>> -    bool malloced;
>>         /* Accessed via RCU.  */
>>       struct FlatView *current_map;
>>         int ioeventfd_nb;
>>       struct MemoryRegionIoeventfd *ioeventfds;
>> -    struct AddressSpaceDispatch *dispatch;
>> -    struct AddressSpaceDispatch *next_dispatch;
>> +
>>       MemoryListener dispatch_listener;
>>       QTAILQ_HEAD(memory_listeners_as, MemoryListener) listeners;
>>       QTAILQ_ENTRY(AddressSpace) address_spaces_link;
>> +    QTAILQ_ENTRY(AddressSpace) flat_view_link;
>>   };
>>     /**
>> diff --git a/exec.c b/exec.c
>> index 66f01f5fce..51243f57f4 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -188,15 +188,12 @@ typedef struct PhysPageMap {
>>   } PhysPageMap;
>>     struct AddressSpaceDispatch {
>> -    struct rcu_head rcu;
>> -
>>       MemoryRegionSection *mru_section;
>>       /* This is a multi-level map on the physical address space.
>>        * The bottom level has pointers to MemoryRegionSections.
>>        */
>>       PhysPageEntry phys_map;
>>       PhysPageMap map;
>> -    AddressSpace *as;
>>   };
>>     typedef struct AddressSpaceDispatch AddressSpaceDispatch;
>> @@ -240,11 +237,6 @@ struct DirtyBitmapSnapshot {
>>       unsigned long dirty[];
>>   };
>>   -AddressSpaceDispatch *address_space_to_dispatch(AddressSpace *as)
>> -{
>> -    return atomic_rcu_read(&as->dispatch);
>> -}
>> -
>>   #endif
>>     #if !defined(CONFIG_USER_ONLY)
>> @@ -1354,10 +1346,8 @@ static void register_multipage(AddressSpaceDispatch *d,
>>       phys_page_set(d, start_addr >> TARGET_PAGE_BITS, num_pages, section_index);
>>   }
>>   -static void mem_add(MemoryListener *listener, MemoryRegionSection *section)
>> +void mem_add(AddressSpaceDispatch *d, MemoryRegionSection *section)
>>   {
>> -    AddressSpace *as = container_of(listener, AddressSpace, dispatch_listener);
>> -    AddressSpaceDispatch *d = as->next_dispatch;
>>       MemoryRegionSection now = *section, remain = *section;
>>       Int128 page_size = int128_make64(TARGET_PAGE_SIZE);
>>   @@ -2680,9 +2670,8 @@ static void io_mem_init(void)
>>                             NULL, UINT64_MAX);
>>   }
>>   -static void mem_begin(MemoryListener *listener)
>> +AddressSpaceDispatch *mem_begin(void)
>>   {
>> -    AddressSpace *as = container_of(listener, AddressSpace, dispatch_listener);
>>       AddressSpaceDispatch *d = g_new0(AddressSpaceDispatch, 1);
>>       uint16_t n;
>>   @@ -2696,27 +2685,19 @@ static void mem_begin(MemoryListener *listener)
>>       assert(n == PHYS_SECTION_WATCH);
>>       d->phys_map  = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .skip = 1 };
>> -    as->next_dispatch = d;
>> +
>> +    return d;
>>   }
>>   -static void address_space_dispatch_free(AddressSpaceDispatch *d)
>> +void address_space_dispatch_free(AddressSpaceDispatch *d)
>>   {
>>       phys_sections_free(&d->map);
>>       g_free(d);
>>   }
>>   -static void mem_commit(MemoryListener *listener)
>> +void mem_commit(AddressSpaceDispatch *d)
>>   {
>> -    AddressSpace *as = container_of(listener, AddressSpace, dispatch_listener);
>> -    AddressSpaceDispatch *cur = as->dispatch;
>> -    AddressSpaceDispatch *next = as->next_dispatch;
>> -
>> -    phys_page_compact_all(next, next->map.nodes_nb);
>> -
>> -    atomic_rcu_set(&as->dispatch, next);
>> -    if (cur) {
>> -        call_rcu(cur, address_space_dispatch_free, rcu);
>> -    }
>> +    phys_page_compact_all(d, d->map.nodes_nb);
>>   }
>>     static void tcg_commit(MemoryListener *listener)
>> @@ -2732,39 +2713,16 @@ static void tcg_commit(MemoryListener *listener)
>>        * We reload the dispatch pointer now because cpu_reloading_memory_map()
>>        * may have split the RCU critical section.
>>        */
>> -    d = atomic_rcu_read(&cpuas->as->dispatch);
>> +    d = address_space_to_dispatch(cpuas->as);
>>       atomic_rcu_set(&cpuas->memory_dispatch, d);
>>       tlb_flush(cpuas->cpu);
>>   }
>>   -void address_space_init_dispatch(AddressSpace *as)
>> -{
>> -    as->dispatch = NULL;
>> -    as->dispatch_listener = (MemoryListener) {
>> -        .begin = mem_begin,
>> -        .commit = mem_commit,
>> -        .region_add = mem_add,
>> -        .region_nop = mem_add,
>> -        .priority = 0,
>> -    };
>> -    memory_listener_register(&as->dispatch_listener, as);
>> -}
>> -
>>   void address_space_unregister(AddressSpace *as)
>>   {
>>       memory_listener_unregister(&as->dispatch_listener);
>>   }
>>   -void address_space_destroy_dispatch(AddressSpace *as)
>> -{
>> -    AddressSpaceDispatch *d = as->dispatch;
>> -
>> -    atomic_rcu_set(&as->dispatch, NULL);
>> -    if (d) {
>> -        call_rcu(d, address_space_dispatch_free, rcu);
>> -    }
>> -}
>> -
>>   static void memory_map_init(void)
>>   {
>>       system_memory = g_malloc(sizeof(*system_memory));
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> index 258fbe51e2..86b9e419fd 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -88,7 +88,8 @@ static void pci_init_bus_master(PCIDevice *pci_dev)
>>         memory_region_init_alias(&pci_dev->bus_master_enable_region,
>>                                OBJECT(pci_dev), "bus master",
>> -                             dma_as->root, 0, memory_region_size(dma_as->root));
>> +                             address_space_root(dma_as), 0,
>> +                             memory_region_size(address_space_root(dma_as)));
>>       memory_region_set_enabled(&pci_dev->bus_master_enable_region, false);
>>       memory_region_add_subregion(&pci_dev->bus_master_container_region, 0,
>>                                   &pci_dev->bus_master_enable_region);
>> diff --git a/memory.c b/memory.c
>> index c6904a7deb..385a507511 100644
>> --- a/memory.c
>> +++ b/memory.c
>> @@ -47,6 +47,9 @@ static QTAILQ_HEAD(memory_listeners, MemoryListener) memory_listeners
>>   static QTAILQ_HEAD(, AddressSpace) address_spaces
>>       = QTAILQ_HEAD_INITIALIZER(address_spaces);
>>   +static QTAILQ_HEAD(FlatViewList, FlatView) flat_views
>> +    = QTAILQ_HEAD_INITIALIZER(flat_views);
>> +
>>   typedef struct AddrRange AddrRange;
>>     /*
>> @@ -230,6 +233,11 @@ struct FlatView {
>>       FlatRange *ranges;
>>       unsigned nr;
>>       unsigned nr_allocated;
>> +    MemoryRegion *root;
>> +    struct AddressSpaceDispatch *dispatch;
>> +
>> +    QTAILQ_ENTRY(FlatView) flat_views_link;
>> +    QTAILQ_HEAD(address_spaces, AddressSpace) address_spaces;
>>   };
>>     typedef struct AddressSpaceOps AddressSpaceOps;
>> @@ -259,12 +267,19 @@ static bool flatrange_equal(FlatRange *a, FlatRange *b)
>>           && a->readonly == b->readonly;
>>   }
>>   -static void flatview_init(FlatView *view)
>> +static void flatview_ref(FlatView *view);
>> +
>> +static FlatView *flatview_alloc(MemoryRegion *mr_root)
>>   {
>> +    FlatView *view;
>> +
>> +    view = g_new0(FlatView, 1);
>>       view->ref = 1;
>> -    view->ranges = NULL;
>> -    view->nr = 0;
>> -    view->nr_allocated = 0;
>> +    view->root = mr_root;
>> +    memory_region_ref(mr_root);
>> +    QTAILQ_INIT(&view->address_spaces);
>> +
>> +    return view;
>>   }
>>     /* Insert a range into a given position.  Caller is responsible for maintaining
>> @@ -292,6 +307,10 @@ static void flatview_destroy(FlatView *view)
>>           memory_region_unref(view->ranges[i].mr);
>>       }
>>       g_free(view->ranges);
>> +    if (view->dispatch) {
>> +        address_space_dispatch_free(view->dispatch);
>> +    }
>> +    memory_region_unref(view->root);
>>       g_free(view);
>>   }
>>   @@ -303,7 +322,7 @@ static void flatview_ref(FlatView *view)
>>   static void flatview_unref(FlatView *view)
>>   {
>>       if (atomic_fetch_dec(&view->ref) == 1) {
>> -        flatview_destroy(view);
>> +        call_rcu(view, flatview_destroy, rcu);
>>       }
>>   }
>>   @@ -608,7 +627,7 @@ static AddressSpace *memory_region_to_address_space(MemoryRegion *mr)
>>           mr = mr->container;
>>       }
>>       QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
>> -        if (mr == as->root) {
>> +        if (mr == as->current_map->root) {
>>               return as;
>>           }
>>       }
>> @@ -702,23 +721,6 @@ static void render_memory_region(FlatView *view,
>>       }
>>   }
>>   -/* Render a memory topology into a list of disjoint absolute ranges. */
>> -static FlatView *generate_memory_topology(MemoryRegion *mr)
>> -{
>> -    FlatView *view;
>> -
>> -    view = g_new(FlatView, 1);
>> -    flatview_init(view);
>> -
>> -    if (mr) {
>> -        render_memory_region(view, mr, int128_zero(),
>> -                             addrrange_make(int128_zero(), int128_2_64()), false);
>> -    }
>> -    flatview_simplify(view);
>> -
>> -    return view;
>> -}
>> -
>>   static void address_space_add_del_ioeventfds(AddressSpace *as,
>>                                                MemoryRegionIoeventfd *fds_new,
>>                                                unsigned fds_new_nb,
>> @@ -769,12 +771,10 @@ static void address_space_add_del_ioeventfds(AddressSpace *as,
>>       }
>>   }
>>   -static FlatView *address_space_get_flatview(AddressSpace *as)
>> +static FlatView *flatview_get(FlatView *view)
>>   {
>> -    FlatView *view;
>> -
>>       rcu_read_lock();
>> -    view = atomic_rcu_read(&as->current_map);
>> +    view = atomic_rcu_read(&view);
>>       flatview_ref(view);
>>       rcu_read_unlock();
>>       return view;
>> @@ -789,7 +789,7 @@ static void address_space_update_ioeventfds(AddressSpace *as)
>>       AddrRange tmp;
>>       unsigned i;
>>   -    view = address_space_get_flatview(as);
>> +    view = flatview_get(as->current_map);
>>       FOR_EACH_FLAT_RANGE(fr, view) {
>>           for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
>>               tmp = addrrange_shift(fr->mr->ioeventfds[i].addr,
>> @@ -881,28 +881,89 @@ static void address_space_update_topology_pass(AddressSpace *as,
>>       }
>>   }
>>   -
>> -static void address_space_update_topology(AddressSpace *as)
>> +static FlatView *address_space_update_flatview(FlatView *view)
>>   {
>> -    FlatView *old_view = address_space_get_flatview(as);
>> -    FlatView *new_view = generate_memory_topology(as->root);
>> +    AddressSpace *as, *asnext;
>> +    FlatView *old_view = flatview_get(view);
>> +    MemoryRegion *root = old_view->root;
>> +    FlatView *new_view = flatview_alloc(root);
>> +    unsigned iold, inew;
>> +    FlatRange *frold, *frnew;
>>   -    address_space_update_topology_pass(as, old_view, new_view, false);
>> -    address_space_update_topology_pass(as, old_view, new_view, true);
>> +    if (root) {
>> +        render_memory_region(new_view, root, int128_zero(),
>> +                             addrrange_make(int128_zero(), int128_2_64()),
>> +                             false);
>> +        flatview_simplify(new_view);
>> +    }
>>   -    /* Writes are protected by the BQL.  */
>> -    atomic_rcu_set(&as->current_map, new_view);
>> -    call_rcu(old_view, flatview_unref, rcu);
>> +    new_view->dispatch = mem_begin();
>>   -    /* Note that all the old MemoryRegions are still alive up to this
>> -     * point.  This relieves most MemoryListeners from the need to
>> -     * ref/unref the MemoryRegions they get---unless they use them
>> -     * outside the iothread mutex, in which case precise reference
>> -     * counting is necessary.
>> +    /*
>> +     * FIXME: this is cut-n-paste from address_space_update_topology_pass,
>> +     * simplify it
>>        */
>> +    iold = inew = 0;
>> +    while (iold < old_view->nr || inew < new_view->nr) {
>> +        if (iold < old_view->nr) {
>> +            frold = &old_view->ranges[iold];
>> +        } else {
>> +            frold = NULL;
>> +        }
>> +        if (inew < new_view->nr) {
>> +            frnew = &new_view->ranges[inew];
>> +        } else {
>> +            frnew = NULL;
>> +        }
>> +
>> +        if (frold
>> +            && (!frnew
>> +                || int128_lt(frold->addr.start, frnew->addr.start)
>> +                || (int128_eq(frold->addr.start, frnew->addr.start)
>> +                    && !flatrange_equal(frold, frnew)))) {
>> +            ++iold;
>> +        } else if (frold && frnew && flatrange_equal(frold, frnew)) {
>> +            /* In both and unchanged (except logging may have changed) */
>> +            MemoryRegionSection mrs = section_from_flat_range(frnew,
>> +                    new_view->dispatch);
>> +
>> +            mem_add(new_view->dispatch, &mrs);
>> +
>> +            ++iold;
>> +            ++inew;
>> +        } else {
>> +            /* In new */
>> +            MemoryRegionSection mrs = section_from_flat_range(frnew,
>> +                    new_view->dispatch);
>> +
>> +            mem_add(new_view->dispatch, &mrs);
>> +
>> +            ++inew;
>> +        }
>> +    }
>> +
>> +    mem_commit(new_view->dispatch);
>> +
>> +    QTAILQ_FOREACH(as, &old_view->address_spaces, flat_view_link) {
>> +        address_space_update_topology_pass(as, old_view, new_view, false);
>> +        address_space_update_topology_pass(as, old_view, new_view, true);
>> +    }
>> +
>> +    QTAILQ_FOREACH_SAFE(as, &old_view->address_spaces, flat_view_link, asnext) {
>> +        QTAILQ_REMOVE(&old_view->address_spaces, as, flat_view_link);
>> +        flatview_unref(old_view);
>> +
>> +        atomic_rcu_set(&as->current_map, new_view);
>> +
>> +        flatview_ref(new_view);
>> +        QTAILQ_INSERT_TAIL(&new_view->address_spaces, as, flat_view_link);
>> +
>> +        address_space_update_ioeventfds(as);
>> +    }
>> +
>>       flatview_unref(old_view);
>>   -    address_space_update_ioeventfds(as);
>> +    return new_view;
>>   }
>>     void memory_region_transaction_begin(void)
>> @@ -921,11 +982,31 @@ void memory_region_transaction_commit(void)
>>       --memory_region_transaction_depth;
>>       if (!memory_region_transaction_depth) {
>>           if (memory_region_update_pending) {
>> +            FlatView *view, *vnext;
>> +            struct FlatViewList fwstmp = QTAILQ_HEAD_INITIALIZER(fwstmp);
>> +
>>               MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
>>   -            QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
>> -                address_space_update_topology(as);
>> +            QTAILQ_FOREACH_SAFE(view, &flat_views, flat_views_link, vnext) {
>> +                FlatView *old_view, *new_view;
>> +
>> +                old_view = flatview_get(view);
>> +                new_view = address_space_update_flatview(old_view);
>> +
>> +                QTAILQ_REMOVE(&flat_views, old_view, flat_views_link);
>> +                flatview_unref(old_view);
>> +                flatview_unref(old_view);
>> +
>> +                QTAILQ_INSERT_HEAD(&fwstmp, new_view, flat_views_link);
>> +
>> +                flatview_unref(new_view);
>>               }
>> +
>> +            QTAILQ_FOREACH_SAFE(view, &fwstmp, flat_views_link, vnext) {
>> +                QTAILQ_REMOVE(&fwstmp, view, flat_views_link);
>> +                QTAILQ_INSERT_HEAD(&flat_views, view, flat_views_link);
>> +            }
>> +
>>               memory_region_update_pending = false;
>>               MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
>>           } else if (ioeventfd_update_pending) {
>> @@ -1834,7 +1915,7 @@ void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
>>               continue;
>>           }
>>           as = listener->address_space;
>> -        view = address_space_get_flatview(as);
>> +        view = flatview_get(as->current_map);
>>           FOR_EACH_FLAT_RANGE(fr, view) {
>>               if (fr->mr == mr) {
>>                   MemoryRegionSection mrs = section_from_flat_range(fr,
>> @@ -1938,7 +2019,7 @@ static void memory_region_update_coalesced_range_as(MemoryRegion *mr, AddressSpa
>>       AddrRange tmp;
>>       MemoryRegionSection section;
>>   -    view = address_space_get_flatview(as);
>> +    view = flatview_get(as->current_map);
>>       FOR_EACH_FLAT_RANGE(fr, view) {
>>           if (fr->mr == mr) {
>>               section = (MemoryRegionSection) {
>> @@ -2350,7 +2431,7 @@ void memory_global_dirty_log_sync(void)
>>               continue;
>>           }
>>           as = listener->address_space;
>> -        view = address_space_get_flatview(as);
>> +        view = flatview_get(as->current_map);
>>           FOR_EACH_FLAT_RANGE(fr, view) {
>>               if (fr->dirty_log_mask) {
>>                   MemoryRegionSection mrs = section_from_flat_range(fr,
>> @@ -2436,7 +2517,7 @@ static void listener_add_address_space(MemoryListener *listener,
>>           }
>>       }
>>   -    view = address_space_get_flatview(as);
>> +    view = flatview_get(as->current_map);
>>       FOR_EACH_FLAT_RANGE(fr, view) {
>>           MemoryRegionSection section = {
>>               .mr = fr->mr,
>> @@ -2615,67 +2696,72 @@ void memory_region_invalidate_mmio_ptr(MemoryRegion *mr, hwaddr offset,
>>     void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
>>   {
>> -    memory_region_ref(root);
>> -    memory_region_transaction_begin();
>> -    as->ref_count = 1;
>> -    as->root = root;
>> -    as->malloced = false;
>> -    as->current_map = g_new(FlatView, 1);
>> -    flatview_init(as->current_map);
>> +    FlatView *view;
>> +
>> +    as->current_map = NULL;
>>       as->ioeventfd_nb = 0;
>>       as->ioeventfds = NULL;
>>       QTAILQ_INIT(&as->listeners);
>>       QTAILQ_INSERT_TAIL(&address_spaces, as, address_spaces_link);
>>       as->name = g_strdup(name ? name : "anonymous");
>> -    address_space_init_dispatch(as);
>> -    memory_region_update_pending |= root->enabled;
>> -    memory_region_transaction_commit();
>> +
>> +    QTAILQ_FOREACH(view, &flat_views, flat_views_link) {
>> +        assert(root);
>> +        if (view->root == root) {
>> +            as->current_map = flatview_get(view);
>> +            break;
>> +        }
>> +    }
>> +
>> +    if (!as->current_map) {
>> +        as->current_map = flatview_alloc(root);
>> +
>> +        QTAILQ_INSERT_TAIL(&flat_views, as->current_map, flat_views_link);
>> +    }
>> +
>> +    QTAILQ_INSERT_TAIL(&as->current_map->address_spaces, as, flat_view_link);
>> +}
>> +
>> +MemoryRegion *address_space_root(AddressSpace *as)
>> +{
>> +    return as->current_map->root;
>> +}
>> +
>> +AddressSpaceDispatch *address_space_to_dispatch(AddressSpace *as)
>> +{
>> +    return atomic_rcu_read(&as->current_map)->dispatch;
>>   }
>>     static void do_address_space_destroy(AddressSpace *as)
>>   {
>> -    bool do_free = as->malloced;
>> +    FlatView *view = flatview_get(as->current_map);
>>   -    address_space_destroy_dispatch(as);
>>       assert(QTAILQ_EMPTY(&as->listeners));
>>   -    flatview_unref(as->current_map);
>> +    QTAILQ_REMOVE(&view->address_spaces, as, flat_view_link);
>> +    flatview_unref(view);
>> +
>> +    flatview_unref(view);


This one removes a reference after unlinking an address space from the
flatview.

> 
> incorrect copy/paste?

Nope, this one pairs with the flatview_get() a few lines above.
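
In other words, a commented reading of that destructor (annotations only,
the code is as quoted above):

static void do_address_space_destroy(AddressSpace *as)
{
    FlatView *view = flatview_get(as->current_map); /* +1, local reference */

    assert(QTAILQ_EMPTY(&as->listeners));

    QTAILQ_REMOVE(&view->address_spaces, as, flat_view_link);
    flatview_unref(view); /* -1, the reference the AS held via current_map */

    flatview_unref(view); /* -1, pairs the local flatview_get() above */

    g_free(as->name);
    g_free(as->ioeventfds);
    g_free(as);
}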


-- 
Alexey

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 0/4] memory: Reduce memory use
  2017-09-07 14:54     ` Dr. David Alan Gilbert
@ 2017-09-08  2:08       ` Alexey Kardashevskiy
  2017-09-08  4:04         ` Alexey Kardashevskiy
  2017-09-08 11:12         ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-09-08  2:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Paolo Bonzini, Stefan Hajnoczi, Peter Maydell, David Gibson

On 08/09/17 00:54, Dr. David Alan Gilbert wrote:
> * Alexey Kardashevskiy (aik@ozlabs.ru) wrote:
>> On 07/09/17 19:51, Dr. David Alan Gilbert wrote:
>>> * Alexey Kardashevskiy (aik@ozlabs.ru) wrote:
>>>> This was inspired by https://bugzilla.redhat.com/show_bug.cgi?id=1481593
>>>>
>>>> What happens there is that every virtio block device creates 2 address
>>>> spaces - for modern config space (called "virtio-pci-cfg-as") and
>>>> for busmaster (common pci thing, called after the device name,
>>>> in my case "virtio-blk-pci").
>>>>
>>>> Each address_space_init() updates topology for every address space.
>>>> Every topology update (address_space_update_topology()) creates a new
>>>> dispatch tree - AddressSpaceDispatch with nodes (1KB) and
>>>> sections (48KB) and destroys the old one.
>>>>
>>>> However the dispatch destructor is postponed via RCU which does not
>>>> get a chance to execute until the machine is initialized but before
>>>> we get there, memory is not returned to the pool, and this is a lot
>>>> of memory which grows n^2.
>>>>
>>>> These patches are trying to address the memory use and boot time
>>>> issues but tbh only the first one provides visible outcome.
>>>
>>> Do you have a feel for how much memory is saved?
>>
>>
>> The 1/4 saves ~33GB (~44GB -> 11GB) for a 2GB guest and 400 virtio-pci
>> devices. These GB figures are the peak values (but it does not matter for
>> OOM killer), memory gets released in one go when RCU kicks in, it just
>> happens too late.
> 
> Nice saving!  Still, why is it using 11GB?

Yet to be discovered :) Not clear at the moment.


> What's it like for more sane configurations, say 2-3 virtio devices - is
> there anything noticable or is it just the huge setups?
> 
> Dave
> 
> 
>> The 3/4 saves less, I'd say 50KB per VCPU (more if you count peaks but so
>> much). Strangely, I do not see the difference in valgrind output when I run
>> a guest with 1024 or just 8 CPUs, probably "massif" is not the right tool
>> to catch this.

I did some more tests.

v2.10:
1024 CPUs, no virtio:     0:47   490.8MB   38/34
1 CPU, 500 virtio-block:  5:03   59.69GB   2354438/3

1/4 applied:
1024 CPUs, no virtio:     0:49   490.8MB   38/34
1 CPU, 500 virtio-block:  1:57   17.74GB   2186/3

3/4 applied:
1024 CPUs, no virtio:     0:53   491.1MB   20/17
1 CPU, 500 virtio-block:  2:01    17.7GB   2167/0


Time is what it takes to start QEMU with -S and then Q-Ax.

Memory amount is peak use from valgrind massif.

Last 2 numbers - "38/34" for example - 38 is the number of g_new(FlatView,
1), 34 is the number of g_free(view); the numbers are printed at
https://git.qemu.org/?p=qemu.git;a=blob;f=vl.c;h=8e247cc2a239ae8fb3d3cdf6d4ee78fd723d1053;hb=1ab5eb4efb91a3d4569b0df6e824cc08ab4bd8ec#l4666
before RCU kicks in.

500 virtio-block + bridges use around 1100 address spaces.



-- 
Alexey

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 0/4] memory: Reduce memory use
  2017-09-08  2:08       ` Alexey Kardashevskiy
@ 2017-09-08  4:04         ` Alexey Kardashevskiy
  2017-09-08 11:12         ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-09-08  4:04 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Paolo Bonzini, Stefan Hajnoczi, Peter Maydell, David Gibson

On 08/09/17 12:08, Alexey Kardashevskiy wrote:
> On 08/09/17 00:54, Dr. David Alan Gilbert wrote:
>> * Alexey Kardashevskiy (aik@ozlabs.ru) wrote:
>>> On 07/09/17 19:51, Dr. David Alan Gilbert wrote:
>>>> * Alexey Kardashevskiy (aik@ozlabs.ru) wrote:
>>>>> This was inspired by https://bugzilla.redhat.com/show_bug.cgi?id=1481593
>>>>>
>>>>> What happens there is that every virtio block device creates 2 address
>>>>> spaces - for modern config space (called "virtio-pci-cfg-as") and
>>>>> for busmaster (common pci thing, called after the device name,
>>>>> in my case "virtio-blk-pci").
>>>>>
>>>>> Each address_space_init() updates topology for every address space.
>>>>> Every topology update (address_space_update_topology()) creates a new
>>>>> dispatch tree - AddressSpaceDispatch with nodes (1KB) and
>>>>> sections (48KB) and destroys the old one.
>>>>>
>>>>> However the dispatch destructor is postponed via RCU which does not
>>>>> get a chance to execute until the machine is initialized but before
>>>>> we get there, memory is not returned to the pool, and this is a lot
>>>>> of memory which grows n^2.
>>>>>
>>>>> These patches are trying to address the memory use and boot time
>>>>> issues but tbh only the first one provides visible outcome.
>>>>
>>>> Do you have a feel for how much memory is saved?
>>>
>>>
>>> The 1/4 saves ~33GB (~44GB -> 11GB) for a 2GB guest and 400 virtio-pci
>>> devices. These GB figures are the peak values (but it does not matter for
>>> OOM killer), memory gets released in one go when RCU kicks in, it just
>>> happens too late.
>>
>> Nice saving!  Still, why is it using 11GB?
> 
> Yet to be discovered :) Not clear at the moment.
> 
> 
>> What's it like for more sane configurations, say 2-3 virtio devices - is
>> there anything noticable or is it just the huge setups?
>>
>> Dave
>>
>>
>>> The 3/4 saves less, I'd say 50KB per VCPU (more if you count peaks but so
>>> much). Strangely, I do not see the difference in valgrind output when I run
>>> a guest with 1024 or just 8 CPUs, probably "massif" is not the right tool
>>> to catch this.
> 
> I did some more tests.
> 
> v2.10:
> 1024 CPUs, no virtio:     0:47   490.8MB   38/34
> 1 CPU, 500 virtio-block:  5:03   59.69GB   2354438/3
> 
> 1/4 applied:
> 1024 CPUs, no virtio:     0:49   490.8MB   38/34
> 1 CPU, 500 virtio-block:  1:57   17.74GB   2186/3
> 
> 3/4 applied:
> 1024 CPUs, no virtio:     0:53   491.1MB   20/17
> 1 CPU, 500 virtio-block:  2:01    17.7GB   2167/0
> 
> 
> Time is what it takes to start QEMU with -S and then Q-Ax.
> 
> Memory amount is peak use from valgrind massif.
> 
> Last 2 numbers - "38/34" for example - 38 is the number of g_new(FlatView,
> 1), 34 is the number of g_free(view); the numbers are printed at
> https://git.qemu.org/?p=qemu.git;a=blob;f=vl.c;h=8e247cc2a239ae8fb3d3cdf6d4ee78fd723d1053;hb=1ab5eb4efb91a3d4569b0df6e824cc08ab4bd8ec#l4666
> before RCU kicks in.
> 
> 500 virtio-block + bridges use around 1100 address spaces.


The numbers above are quite wrong for the [3/4] patch, as
qemu_run_machine_init_done_notifiers() does the same dance all over again.

I can bring peak memory use down to 3.75GB (yay!) for "1 CPU, 500
virtio-block" with the fix below. It belongs in [1/4], and we might want to
just extend that begin+commit to cover both device creation and notifier
initialization.

My guess is that the general problem is that memory_region_transaction_begin()
and memory_region_transaction_commit() update all address spaces (or flat
views once [3/4] is applied) while they do not really need to - they could
receive a hint (an AS? a flatview?) and update only the respective bits. I
just do not see an easy way to provide such a hint other than storing an AS
pointer in an MR.

In my test, I have 500 devices and 2 layers of PCI bridges. For every PCI
device probed by SLOF (our boot firmware), _all_ flatviews are rebuilt 8
times (at least; sometimes more, like 20, when the next bridge gets
activated, dunno):

- config space writes at 0x20 0x24 0x28 0x2c 0x1c - we end up in
pci_bridge_update_mappings();
- config space writes at 0x4 for a device and a bridge - pci_update_mappings();
- memory_region_set_enabled() on d->bus_master_enable_region to enable
the bridge.

Each of these operations re-renders all 1000-ish flatviews and dispatch
trees, making device probing really, really slow. But at least memory is
not sort of leaking or held by RCU :) And I cannot wrap these config writes
into memory_region_transaction_begin/commit as every config space access
comes independently from SLOF.



diff --git a/vl.c b/vl.c
index 89fb58c1de..4317ef01b4 100644
--- a/vl.c
+++ b/vl.c
@@ -4754,8 +4754,13 @@ int main(int argc, char **argv, char **envp)
     /* TODO: once all bus devices are qdevified, this should be done
      * when bus is created by qdev.c */
     qemu_register_reset(qbus_reset_all_fn, sysbus_get_default());
+
+    memory_region_transaction_begin();
+
     qemu_run_machine_init_done_notifiers();

+    memory_region_transaction_commit();
+
     if (rom_check_and_register_reset() != 0) {
         error_report("rom check and register reset failed");
         exit(1);



-- 
Alexey

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 1/4] memory: Postpone flatview and dispatch tree building till all devices are added
  2017-09-07 14:30       ` Peter Maydell
@ 2017-09-08  6:21         ` Alexey Kardashevskiy
  0 siblings, 0 replies; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-09-08  6:21 UTC (permalink / raw)
  To: Peter Maydell
  Cc: QEMU Developers, David Gibson, Paolo Bonzini, Stefan Hajnoczi

On 08/09/17 00:30, Peter Maydell wrote:
> On 7 September 2017 at 15:27, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>> On 07/09/17 19:30, Peter Maydell wrote:
>>> What happens if something in a device realize function tries
>>> to do a read from an AddressSpace?
>>
>>
>> address_space_memory is created before that loop but yes, address spaces
>> for devices are not rendered and QEMU crashes, needs fixing.
> 
> I'd be OK if we defined this to be not permitted, as long as
> we're reasonably happy that we don't currently do it anywhere
> and the error if it does happen is fairly clear about what's
> gone wrong.


QEMU actually only crashes with 2/4 + 3/4; with just 1/4, QEMU gets to
unassigned_mem_accepts() silently. I am not sure how to convert this into a
clear error message.
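
One possible shape for that error (a sketch only, not in the series; it
assumes qemu/error-report.h and that the dispatch pointer stays NULL until
the first transaction commit builds it):

AddressSpaceDispatch *address_space_to_dispatch(AddressSpace *as)
{
    AddressSpaceDispatch *d = atomic_rcu_read(&as->current_map)->dispatch;

    if (!d) {
        /* An access arrived before the flat views were rendered. */
        error_report("address space '%s' used before its flat view was built",
                     as->name);
        abort();
    }
    return d;
}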



-- 
Alexey

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 0/4] memory: Reduce memory use
  2017-09-08  2:08       ` Alexey Kardashevskiy
  2017-09-08  4:04         ` Alexey Kardashevskiy
@ 2017-09-08 11:12         ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 28+ messages in thread
From: Dr. David Alan Gilbert @ 2017-09-08 11:12 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: qemu-devel, Paolo Bonzini, Stefan Hajnoczi, Peter Maydell, David Gibson

* Alexey Kardashevskiy (aik@ozlabs.ru) wrote:
> On 08/09/17 00:54, Dr. David Alan Gilbert wrote:
> > * Alexey Kardashevskiy (aik@ozlabs.ru) wrote:
> >> On 07/09/17 19:51, Dr. David Alan Gilbert wrote:
> >>> * Alexey Kardashevskiy (aik@ozlabs.ru) wrote:
> >>>> This was inspired by https://bugzilla.redhat.com/show_bug.cgi?id=1481593
> >>>>
> >>>> What happens there is that every virtio block device creates 2 address
> >>>> spaces - for modern config space (called "virtio-pci-cfg-as") and
> >>>> for busmaster (common pci thing, called after the device name,
> >>>> in my case "virtio-blk-pci").
> >>>>
> >>>> Each address_space_init() updates topology for every address space.
> >>>> Every topology update (address_space_update_topology()) creates a new
> >>>> dispatch tree - AddressSpaceDispatch with nodes (1KB) and
> >>>> sections (48KB) and destroys the old one.
> >>>>
> >>>> However the dispatch destructor is postponed via RCU which does not
> >>>> get a chance to execute until the machine is initialized but before
> >>>> we get there, memory is not returned to the pool, and this is a lot
> >>>> of memory which grows n^2.
> >>>>
> >>>> These patches are trying to address the memory use and boot time
> >>>> issues but tbh only the first one provides visible outcome.
> >>>
> >>> Do you have a feel for how much memory is saved?
> >>
> >>
> >> The 1/4 saves ~33GB (~44GB -> 11GB) for a 2GB guest and 400 virtio-pci
> >> devices. These GB figures are the peak values (but it does not matter for
> >> OOM killer), memory gets released in one go when RCU kicks in, it just
> >> happens too late.
> > 
> > Nice saving!  Still, why is it using 11GB?
> 
> Yet to be discovered :) Not clear at the moment.
> 
> 
> > What's it like for more sane configurations, say 2-3 virtio devices - is
> > there anything noticable or is it just the huge setups?
> > 
> > Dave
> > 
> > 
> >> The 3/4 saves less, I'd say 50KB per VCPU (more if you count peaks but so
> >> much). Strangely, I do not see the difference in valgrind output when I run
> >> a guest with 1024 or just 8 CPUs, probably "massif" is not the right tool
> >> to catch this.
> 
> I did some more tests.
> 
> v2.10:
> 1024 CPUs, no virtio:     0:47   490.8MB   38/34
> 1 CPU, 500 virtio-block:  5:03   59.69GB   2354438/3
> 
> 1/4 applied:
> 1024 CPUs, no virtio:     0:49   490.8MB   38/34
> 1 CPU, 500 virtio-block:  1:57   17.74GB   2186/3
> 
> 3/4 applied:
> 1024 CPUs, no virtio:     0:53   491.1MB   20/17
> 1 CPU, 500 virtio-block:  2:01    17.7GB   2167/0
> 
> 
> Time is what it takes to start QEMU with -S and then Q-Ax.
> 
> Memory amount is peak use from valgrind massif.
> 
> Last 2 numbers - "38/34" for example - 38 is the number of g_new(FlatView,
> 1), 34 is the number of g_free(view); the numbers are printed at
> https://git.qemu.org/?p=qemu.git;a=blob;f=vl.c;h=8e247cc2a239ae8fb3d3cdf6d4ee78fd723d1053;hb=1ab5eb4efb91a3d4569b0df6e824cc08ab4bd8ec#l4666
> before RCU kicks in.
> 
> 500 virtio-block + bridges use around 1100 address spaces.

What I find interesting is the effect even on small VMs. I'm using
valgrind --tool=exp-dhat as per your bz comment, on a qemu close to
head:

valgrind --tool=exp-dhat ~/try/x86_64-softmmu/qemu-system-x86_64 -nographic -device sga -m 1G -M pc,accel=kvm -drive file=/home/vmimages/littlefed20.img,id=d1,if=none -device virtio-blk,drive=d1 -drive file=/home/vmimages/dummy1,id=d2,if=none -device virtio-blk,drive=d2 -drive file=/home/vmimages/dummy2,id=d3,if=none -device virtio-blk,drive=d3 -device virtio-serial -device virtio-serial -device virtio-serial -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0

==5945== guest_insns:  2,845,498,404
==5945== max_live:     73,745,261 in 45,395 blocks
==5945== tot_alloc:    615,696,752 in 515,110 blocks

with your 1-4 patches:
==14661== guest_insns:  2,626,826,254
==14661== max_live:     27,825,659 in 28,950 blocks
==14661== tot_alloc:    529,978,686 in 444,043 blocks

so that's a 45MB saving on a simple VM - numbers like that add up
for people running lots of small VMs; they notice when the total
qemu RAM overhead for their box goes up by a few GB.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 2/4] memory: Prepare for shared flat views
  2017-09-07  9:20 ` [Qemu-devel] [RFC PATCH qemu 2/4] memory: Prepare for shared flat views Alexey Kardashevskiy
@ 2017-09-09  7:18   ` David Gibson
  2017-09-10  9:17     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 28+ messages in thread
From: David Gibson @ 2017-09-09  7:18 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: qemu-devel, Paolo Bonzini, Stefan Hajnoczi, Peter Maydell

[-- Attachment #1: Type: text/plain, Size: 1035 bytes --]

On Thu, Sep 07, 2017 at 07:20:08PM +1000, Alexey Kardashevskiy wrote:
> We are going to share flat views and dispatch trees between address
> spaces. This moves bits around but no change in behaviour is expected
> here. The following patch will implement sharing.
> 
> This switches from AddressSpace to AddressSpaceDispatch as in many
> places this is what is used actually. Also, since multiple address
> spaces will be sharing a flat view, MemoryRegionSection cannot store
> the address space pointer, this changes it to AddressSpaceDispatch
> as well. As a not very obvious result, IOMMUTLBEntry also switched
> from AddressSpace to AddressSpaceDispatch to make
> address_space_get_iotlb_entry() work.

So, mostly this seems to be exposing AddressSpaceDispatch in more
places.  It's not obvious to me why that's useful.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 2/4] memory: Prepare for shared flat views
  2017-09-09  7:18   ` David Gibson
@ 2017-09-10  9:17     ` Alexey Kardashevskiy
  0 siblings, 0 replies; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-09-10  9:17 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel, Paolo Bonzini, Stefan Hajnoczi, Peter Maydell

[-- Attachment #1: Type: text/plain, Size: 1150 bytes --]

On 09/09/17 17:18, David Gibson wrote:
> On Thu, Sep 07, 2017 at 07:20:08PM +1000, Alexey Kardashevskiy wrote:
>> We are going to share flat views and dispatch trees between address
>> spaces. This moves bits around but no change in behaviour is expected
>> here. The following patch will implement sharing.
>>
>> This switches from AddressSpace to AddressSpaceDispatch as in many
>> places this is what is used actually. Also, since multiple address
>> spaces will be sharing a flat view, MemoryRegionSection cannot store
>> the address space pointer, this changes it to AddressSpaceDispatch
>> as well. As a not very obvious result, IOMMUTLBEntry also switched
>> from AddressSpace to AddressSpaceDispatch to make
>> address_space_get_iotlb_entry() work.
> 
> So, mostly this seems to be exposing AddressSpaceDispatch in more
> places.  It's not obvious to me why that's useful.


It is mostly for IOMMUs - since the dispatch tree and flat view are shared
among address spaces, I cannot use an address space in iommu_ops::translate -
too ambiguous - "use any attached to this flatview/dispatch" is just
confusing.


-- 
Alexey


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 839 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 3/4] memory: Share flat views and dispatch trees between address spaces
  2017-09-07  9:20 ` [Qemu-devel] [RFC PATCH qemu 3/4] memory: Share flat views and dispatch trees between address spaces Alexey Kardashevskiy
  2017-09-07 20:53   ` Philippe Mathieu-Daudé
@ 2017-09-11  7:40   ` Paolo Bonzini
  2017-09-11  9:06     ` Alexey Kardashevskiy
  1 sibling, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2017-09-11  7:40 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Stefan Hajnoczi, Peter Maydell, David Gibson

On 07/09/2017 11:20, Alexey Kardashevskiy wrote:
>  
>      /* Accessed via RCU.  */
>      struct FlatView *current_map;
>  
>      int ioeventfd_nb;
>      struct MemoryRegionIoeventfd *ioeventfds;
> -    struct AddressSpaceDispatch *dispatch;
> -    struct AddressSpaceDispatch *next_dispatch;
> +

The rough idea of the patch matches my suggestion indeed.  However, I am
not sure why all of the changes in patch 2 are needed.

Once you have built the FlatView and the dispatch within it, you can
still cache its dispatch tree in as->dispatch, and free it with RCU from
flatview_destroy.  This removes the need to use call_rcu from
flatview_unref.
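
A rough sketch of that suggestion (illustrative only; it assumes
AddressSpaceDispatch keeps its rcu_head and that readers go through an
RCU-published as->dispatch cache rather than as->current_map->dispatch):

static void flatview_destroy(FlatView *view)
{
    int i;

    if (view->dispatch) {
        /* Readers may still hold the old dispatch tree via as->dispatch,
         * so only this part needs RCU-deferred reclamation.
         */
        call_rcu(view->dispatch, address_space_dispatch_free, rcu);
    }
    for (i = 0; i < view->nr; i++) {
        memory_region_unref(view->ranges[i].mr);
    }
    g_free(view->ranges);
    memory_region_unref(view->root);
    g_free(view);
}

static void flatview_unref(FlatView *view)
{
    if (atomic_fetch_dec(&view->ref) == 1) {
        flatview_destroy(view); /* a direct call again, no call_rcu here */
    }
}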

In addition, you could change the computation of FlatView's root to
resolve 2^64-sized aliases; also set it to NULL if the AddressSpace's
root is disabled or the alias it resolves to is disabled (and so on
recursively until a non-alias is found).  This should remove the need
for address_space_root() and the change to pci_init_bus_master.

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 3/4] memory: Share flat views and dispatch trees between address spaces
  2017-09-11  7:40   ` Paolo Bonzini
@ 2017-09-11  9:06     ` Alexey Kardashevskiy
  2017-09-11  9:37       ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-09-11  9:06 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: Stefan Hajnoczi, Peter Maydell, David Gibson

On 11/09/17 17:40, Paolo Bonzini wrote:
> On 07/09/2017 11:20, Alexey Kardashevskiy wrote:
>>  
>>      /* Accessed via RCU.  */
>>      struct FlatView *current_map;
>>  
>>      int ioeventfd_nb;
>>      struct MemoryRegionIoeventfd *ioeventfds;
>> -    struct AddressSpaceDispatch *dispatch;
>> -    struct AddressSpaceDispatch *next_dispatch;
>> +
> 
> The rough idea of the patch matches my suggestion indeed.  However, I am
> not sure why all of the changes in patch 2 are needed.

For this:

 struct MemoryRegionSection {
     MemoryRegion *mr;
-    AddressSpace *address_space;
+    AddressSpaceDispatch *dispatch;

as there are many ASes attached to the same flatview/dispatch.

And because of that, there is also:

 struct IOMMUTLBEntry {
-    AddressSpace    *target_as;
+    AddressSpaceDispatch *target_dispatch;

as the "section" in address_space_get_iotlb_entry() does not have
address_space any more, even though the only user of it -
vhost_device_iotlb_miss() - only checks if (iotlb.target_dispatch != NULL).


> Once you have built the FlatView and the dispatch within it, you can
> still cache its dispatch tree in as->dispatch, and free it with RCU from
> flatview_destroy.  This removes the need to use call_rcu from
> flatview_unref.

Ok, I will do that.

> In addition, you could change the computation of FlatView's root to
> resolve 2^64-sized aliases;

Here we reached the boundary of my English :)

Roots are given when AS/Flatview is created, and aliases are resolved already.

> also set it to NULL if the AddressSpace's
> root is disabled or the alias it resolves to is disabled (and so on
> recursively until a non-alias is found).  This should remove the need
> for address_space_root() and the change to pci_init_bus_master.




-- 
Alexey

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 3/4] memory: Share flat views and dispatch trees between address spaces
  2017-09-11  9:06     ` Alexey Kardashevskiy
@ 2017-09-11  9:37       ` Paolo Bonzini
  2017-09-11 12:08         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2017-09-11  9:37 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Stefan Hajnoczi, Peter Maydell, David Gibson

On 11/09/2017 11:06, Alexey Kardashevskiy wrote:
> On 11/09/17 17:40, Paolo Bonzini wrote:
>> On 07/09/2017 11:20, Alexey Kardashevskiy wrote:
>>>  
>>>      /* Accessed via RCU.  */
>>>      struct FlatView *current_map;
>>>  
>>>      int ioeventfd_nb;
>>>      struct MemoryRegionIoeventfd *ioeventfds;
>>> -    struct AddressSpaceDispatch *dispatch;
>>> -    struct AddressSpaceDispatch *next_dispatch;
>>> +
>>
>> The rough idea of the patch matches my suggestion indeed.  However, I am
>> not sure why all of the changes in patch 2 are needed.
> 
> For this:
> 
>  struct MemoryRegionSection {
>      MemoryRegion *mr;
> -    AddressSpace *address_space;
> +    AddressSpaceDispatch *dispatch;
> 
> as there are many ASes attached to the same flatview/dispatch.

Ok, this makes sense.  Maybe it should be a flatview rather than an
AddressSpaceDispatch (a FlatView is essentially a list of
MemoryRegionSections; attaching the ASD to the FlatView is more or less
an implementation detail).


> And because of that, there is also:
> 
>  struct IOMMUTLBEntry {
> -    AddressSpace    *target_as;
> +    AddressSpaceDispatch *target_dispatch;
> 
> as the "section" in address_space_get_iotlb_entry() does not have
> address_space any more, even though the only user of it -
> vhost_device_iotlb_miss() - only checks if (iotlb.target_dispatch != NULL).

You could change address_space_do_translate's "as" to AddressSpace **as.
 Then, address_space_get_iotlb_entry can populate the .target_as from
the output value of the argument.
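
Something like this, I suppose (sketch only; signatures based on 2.10,
the double pointer is the only change):

    static MemoryRegionSection address_space_do_translate(AddressSpace **as,
                                                          hwaddr addr,
                                                          hwaddr *xlat,
                                                          hwaddr *plen,
                                                          bool is_write,
                                                          bool is_mmio);
    /* ...body as today, except each hop through an IOMMU also records
     *
     *     *as = iotlb.target_as;
     *
     * so the caller learns where the translation ended up.  */

    IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
                                                bool is_write)
    {
        hwaddr xlat, plen;
        MemoryRegionSection section;

        section = address_space_do_translate(&as, addr, &xlat, &plen,
                                             is_write, false);
        /* "as" now names the AS the walk terminated in, so
         * (other fields filled in as before): */
        return (IOMMUTLBEntry) { .target_as = as };
    }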

>> In addition, you could change the computation of FlatView's root to
>> resolve 2^64-sized aliases;
> 
> Here we reached the boundary of my English :)
> 
> Roots are given when AS/Flatview is created, and aliases are resolved already.
> 

Currently, you're doing

+        if (view->root == root) {
+            as->current_map = flatview_get(view);
+            break;
+        }

(and by the way the above doesn't resolve aliases; aliases are only
resolved by render_memory_region).

Instead, the FlatViews should be shared at transaction commit time.  At
the beginning of commit, the list of flat views is empty.  As you
process each AddressSpace, you resolve the root alias(*) and search for
the resulting MemoryRegion in the list of FlatViews.  If you find it,
you do flatview_ref and point as->current_map to the FlatView you just
found.  Otherwise, you do generate_memory_topology etc. as in the old code.

(*) something like

	MemoryRegion *mr = as->root;
	MemoryRegion *root_mr;
	while (mr->alias && !mr->alias_offset &&
	       int128_ge(mr->size, mr->alias->size)) {
	    /* The alias is included in its entirety.  Use it as
	     * the "real" root, so that we can share more FlatViews.
	     */
	    mr = mr->alias;
	}	

	/* We found the "real" root, but maybe we can use a shared empty
	 * FlatView instead?
	 */
        for (root_mr = mr; root_mr; root_mr = root_mr->alias) {
	    if (!root_mr->enabled) {
		/* Use a shared empty FlatView.  */
		return NULL;
	    }
        }

        return mr;
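
With that, commit time could share views along these lines (a sketch
only: flatview_ref/generate_memory_topology are from the series, the
hash table and the resolve_root_alias() name are made up):

        static void address_space_update_flatview(AddressSpace *as,
                                                  GHashTable *views)
        {
            /* The (*) helper above; a NULL key maps to one shared,
             * empty FlatView inserted at the start of the commit.  */
            MemoryRegion *physmr = resolve_root_alias(as->root);
            FlatView *old_view = as->current_map;
            FlatView *new_view = g_hash_table_lookup(views, physmr);

            if (new_view) {
                /* Same effective root as an earlier AS: share it.  */
                flatview_ref(new_view);
            } else {
                /* First AS with this root: render it once.  */
                new_view = generate_memory_topology(physmr);
                g_hash_table_insert(views, physmr, new_view);
                /* (per-AS vs. hash-table refcounting elided here)  */
            }
            atomic_rcu_set(&as->current_map, new_view);
            flatview_unref(old_view);
        }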

Also:

> +static FlatView *flatview_get(FlatView *view)
>  {
> -    FlatView *view;
> -
>      rcu_read_lock();
> -    view = atomic_rcu_read(&as->current_map);
> +    view = atomic_rcu_read(&view);

This is "view = view" so it makes little sense, and then in the caller

+    view = flatview_get(as->current_map);

as->current_map is accessed without atomic_rcu_read.  So
address_space_get_flatview must stay.  Probably this will solve itself
when you do the rest.

Paolo

>> also set it to NULL if the AddressSpace's
>> root is disabled or the alias it resolves to is disabled (and so on
>> recursively until a non-alias is found).  This should remove the need
>> for address_space_root() and the change to pci_init_bus_master.
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 3/4] memory: Share flat views and dispatch trees between address spaces
  2017-09-11  9:37       ` Paolo Bonzini
@ 2017-09-11 12:08         ` Alexey Kardashevskiy
  2017-09-11 15:30           ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-09-11 12:08 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: Stefan Hajnoczi, Peter Maydell, David Gibson

On 11/09/17 19:37, Paolo Bonzini wrote:
> On 11/09/2017 11:06, Alexey Kardashevskiy wrote:
>> On 11/09/17 17:40, Paolo Bonzini wrote:
>>> On 07/09/2017 11:20, Alexey Kardashevskiy wrote:
>>>>  
>>>>      /* Accessed via RCU.  */
>>>>      struct FlatView *current_map;
>>>>  
>>>>      int ioeventfd_nb;
>>>>      struct MemoryRegionIoeventfd *ioeventfds;
>>>> -    struct AddressSpaceDispatch *dispatch;
>>>> -    struct AddressSpaceDispatch *next_dispatch;
>>>> +
>>>
>>> The rough idea of the patch matches my suggestion indeed.  However, I am
>>> not sure why all of the changes in patch 2 are needed.
>>
>> For this:
>>
>>  struct MemoryRegionSection {
>>      MemoryRegion *mr;
>> -    AddressSpace *address_space;
>> +    AddressSpaceDispatch *dispatch;
>>
>> as there are many ASes attached to the same flatview/dispatch.
> 
> Ok, this makes sense.  Maybe it should be a flatview rather than an
> AddressSpaceDispatch (a FlatView is essentially a list of
> MemoryRegionSections; attaching the ASD to the FlatView is more or less
> an implementation detail).

The helpers I converted from AddressSpace to AddressSpaceDispatch use the
dispatch structure only and do not use FlatView, so it seemed logical.

btw this address_space in MemoryRegionSection - it does not seem to make
much sense in the PhysPageMap::sections array; it only makes sense when
MemoryRegionSection is used as a temporary object when calling listeners.
Would it make sense if we enforce MemoryRegionSection::address_space to be
NULL in the array and non-NULL when used temporarily?


> 
> 
>> And because of that, there is also:
>>
>>  struct IOMMUTLBEntry {
>> -    AddressSpace    *target_as;
>> +    AddressSpaceDispatch *target_dispatch;
>>
>> as the "section" in address_space_get_iotlb_entry() does not have
>> address_space any more, even though the only user of it -
>> vhost_device_iotlb_miss() - only checks if (iotlb.target_dispatch != NULL).
> 
> You could change address_space_do_translate's "as" to AddressSpace **as.
>  Then, address_space_get_iotlb_entry can populate the .target_as from
> the output value of the argument.

Ok, I will do that, since this seems better.


> 
>>> In addition, you could change the computation of FlatView's root to
>>> resolve 2^64-sized aliases;
>>
>> Here we reached the boundary of my English :)
>>
>> Roots are given when AS/Flatview is created, and aliases are resolved already.
>>
> 
> Currently, you're doing
> 
> +        if (view->root == root) {
> +            as->current_map = flatview_get(view);
> +            break;
> +        }
> 
> (and by the way the above doesn't resolve aliases; aliases are only
> resolved by render_memory_region).

I am learning as we speak :)


> 
> Instead, the FlatViews should be shared at transaction commit time.  At
> the beginning of commit, the list of flat views is empty.  As you
> process each AddressSpace, you resolve the root alias(*) and search for
> the resulting MemoryRegion in the list of FlatViews.  If you find it,
> you do flatview_ref and point as->current_map to the FlatView you just
> found.  Otherwise, you do generate_memory_topology etc. as in the old code.
> 
> (*) something like
> 
> 	MemoryRegion *mr = as->root;
> 	MemoryRegion *root_mr;
> 	while (mr->alias && !mr->alias_offset &&
> 	       int128_ge(mr->size, mr->alias->size)) {
> 	    /* The alias is included in its entirety.  Use it as
> 	     * the "real" root, so that we can share more FlatViews.
> 	     */
> 	    mr = mr->alias;
> 	}	
> 
> 	/* We found the "real" root, but maybe we can use a shared empty
> 	 * FlatView instead?
> 	 */
>         for (root_mr = mr; root_mr; root_mr = root_mr->alias) {
> 	    if (!root_mr->enabled) {
> 		/* Use a shared empty FlatView.  */
> 		return NULL;
> 	    }
>         }
> 
>         return mr;


Ah, makes sense now, thanks.

> 
> Also:
> 
>> +static FlatView *flatview_get(FlatView *view)
>>  {
>> -    FlatView *view;
>> -
>>      rcu_read_lock();
>> -    view = atomic_rcu_read(&as->current_map);
>> +    view = atomic_rcu_read(&view);
> 
> This is "view = view" so it makes little sense, and then in the caller
> 
> +    view = flatview_get(as->current_map);
> 
> as->current_map is accessed without atomic_rcu_read.  So
> address_space_get_flatview must stay.  Probably this will solve itself
> when you do the rest.
> 
> Paolo
> 
>>> also set it to NULL if the AddressSpace's
>>> root is disabled or the alias it resolves to is disabled (and so on
>>> recursively until a non-alias is found).  This should remove the need
>>> for address_space_root() and the change to pci_init_bus_master.
>>
>>
>>
>>
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 3/4] memory: Share flat views and dispatch trees between address spaces
  2017-09-11 12:08         ` Alexey Kardashevskiy
@ 2017-09-11 15:30           ` Paolo Bonzini
  2017-09-12  5:55             ` Alexey Kardashevskiy
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2017-09-11 15:30 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Stefan Hajnoczi, Peter Maydell, David Gibson

On 11/09/2017 14:08, Alexey Kardashevskiy wrote:
>> Ok, this makes sense.  Maybe it should be a flatview rather than an
>> AddressSpaceDispatch (a FlatView is essentially a list of
>> MemoryRegionSections; attaching the ASD to the FlatView is more or less
>> an implementation detail).
> The helpers I converted from AddressSpace to AddressSpaceDispatch use the
> dispatch structure only and do not use FlatView, so it seemed logical.

Understood, but from a design POV FlatView makes more sense.

> btw this address_space in MemoryRegionSection - it does not seem to make
> much sense in the PhysPageMap::sections array; it only makes sense when
> MemoryRegionSection is used as a temporary object when calling listeners.
> Would it make sense if we enforce MemoryRegionSection::address_space to be
> NULL in the array and non-NULL when used temporarily?

memory_region_section_get_iotlb needs to access the AddressSpaceDispatch
for sections stored in the PhysPageMap array, because
memory_region_section_get_iotlb uses the ASD to compute the section index.
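
(For reference, the bit in question in exec.c is roughly:

    AddressSpaceDispatch *d;

    d = atomic_rcu_read(&section->address_space->dispatch);
    iotlb = section - d->map.sections;    /* index into PhysPageMap */
    iotlb += xlat;

i.e. the index is pointer arithmetic against the containing array, and
address_space is the only way back to that array from a stored section.)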

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 3/4] memory: Share flat views and dispatch trees between address spaces
  2017-09-11 15:30           ` Paolo Bonzini
@ 2017-09-12  5:55             ` Alexey Kardashevskiy
  2017-09-12  7:12               ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-09-12  5:55 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: Stefan Hajnoczi, Peter Maydell, David Gibson

On 12/09/17 01:30, Paolo Bonzini wrote:
> On 11/09/2017 14:08, Alexey Kardashevskiy wrote:
>>> Ok, this makes sense.  Maybe it should be a flatview rather than an
>>> AddressSpaceDispatch (a FlatView is essentially a list of
>>> MemoryRegionSections; attaching the ASD to the FlatView is more or less
>>> an implementation detail).
>> The helpers I converted from AddressSpace to AddressSpaceDispatch use the
>> dispatch structure only and do not use FlatView, so it seemed logical.
> 
> Understood, but from a design POV FlatView makes more sense.
> 
>> btw this address_space in MemoryRegionSection - it does not seem to make
>> much sense in the PhysPageMap::sections array; it only makes sense when
>> MemoryRegionSection is used as a temporary object when calling listeners.
>> Would it make sense if we enforce MemoryRegionSection::address_space to be
>> NULL in the array and non-NULL when used temporarily?
> 
> memory_region_section_get_iotlb needs to access the AddressSpaceDispatch
> for sections stored in the PhysPageMap array, because
> memory_region_section_get_iotlb uses the ASD to compute the section index.

Ohhh, that is not exactly trivial. Out of curiosity - is that iotlb
encoding described anywhere?

Anyway, this can be simplified (or rather made more straightforward?) -
tlb_set_page_with_attrs() could calculate the section index and pass it to
memory_region_section_get_iotlb(). Does that still not make much sense? It
just seems quite useless to keep that address_space pointer alive for the
one case which could easily avoid using this pointer.


-- 
Alexey

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 3/4] memory: Share flat views and dispatch trees between address spaces
  2017-09-12  5:55             ` Alexey Kardashevskiy
@ 2017-09-12  7:12               ` Paolo Bonzini
  2017-09-12  9:47                 ` Alexey Kardashevskiy
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2017-09-12  7:12 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Stefan Hajnoczi, Peter Maydell, David Gibson

On 12/09/2017 07:55, Alexey Kardashevskiy wrote:
> On 12/09/17 01:30, Paolo Bonzini wrote:
>> On 11/09/2017 14:08, Alexey Kardashevskiy wrote:
>>>> Ok, this makes sense.  Maybe it should be a flatview rather than an
>>>> AddressSpaceDispatch (a FlatView is essentially a list of
>>>> MemoryRegionSections; attaching the ASD to the FlatView is more or less
>>>> an implementation detail).
>>> The helpers I converted from AddressSpace to AddressSpaceDispatch use the
>>> dispatch structure only and do not use FlatView, so it seemed logical.
>>
>> Understood, but from a design POV FlatView makes more sense.
>>
>>> btw this address_space in MemoryRegionSection - it does not seem to make
>>> much sense in the PhysPageMap::sections array; it only makes sense when
>>> MemoryRegionSection is used as a temporary object when calling listeners.
>>> Would it make sense if we enforce MemoryRegionSection::address_space to be
>>> NULL in the array and non-NULL when used temporarily?
>>
>> memory_region_section_get_iotlb needs to access the AddressSpaceDispatch
>> for sections stored in the PhysPageMap array, because
>> memory_region_section_get_iotlb uses the ASD to compute the section index.
> 
> Ohhh, that is not exactly trivial. Out of curiosity - is that iotlb
> encoding described anywhere?

No, I don't think so.

> Anyway, this can be simplified (or rather made more straightforward?) -
> tlb_set_page_with_attrs() could calculate the section index and pass it to
> memory_region_section_get_iotlb(). Does that still not make much sense? It
> just seems quite useless to keep that address_space pointer alive for the
> one case which could easily avoid using this pointer.

Hmm I suppose address_space_translate_for_iotlb knows the ASD and could
also return the index, basically combining it and
memory_region_section_get_iotlb() into one function.
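
Roughly (a sketch on top of the 2.10 function; the out parameter is the
only addition):

    MemoryRegionSection *
    address_space_translate_for_iotlb(CPUState *cpu, int asidx, hwaddr addr,
                                      hwaddr *xlat, hwaddr *plen,
                                      hwaddr *section_index)
    {
        MemoryRegionSection *section;
        AddressSpaceDispatch *d =
            atomic_rcu_read(&cpu->cpu_ases[asidx].memory_dispatch);

        section = address_space_translate_internal(d, addr, xlat, plen, false);
        /* The ASD is at hand here, so hand the index back as well and
         * memory_region_section_get_iotlb() can stop deriving it.  */
        *section_index = section - d->map.sections;

        assert(!section->mr->iommu_ops);
        return section;
    }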

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 3/4] memory: Share flat views and dispatch trees between address spaces
  2017-09-12  7:12               ` Paolo Bonzini
@ 2017-09-12  9:47                 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-09-12  9:47 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: Stefan Hajnoczi, Peter Maydell, David Gibson

On 12/09/17 17:12, Paolo Bonzini wrote:
> On 12/09/2017 07:55, Alexey Kardashevskiy wrote:
>> On 12/09/17 01:30, Paolo Bonzini wrote:
>>> On 11/09/2017 14:08, Alexey Kardashevskiy wrote:
>>>>> Ok, this makes sense.  Maybe it should be a flatview rather than an
>>>>> AddressSpaceDispatch (a FlatView is essentially a list of
>>>>> MemoryRegionSections; attaching the ASD to the FlatView is more or less
>>>>> an implementation detail).
>>>> The helpers I converted from AddressSpace to AddressSpaceDispatch use the
>>>> dispatch structure only and do not use FlatView, so it seemed logical.
>>>
>>> Understood, but from a design POV FlatView makes more sense.
>>>
>>>> btw this address_space in MemoryRegionSection - it does not seem to make
>>>> much sense in the PhysPageMap::sections array; it only makes sense when
>>>> MemoryRegionSection is used as a temporary object when calling listeners.
>>>> Would it make sense if we enforce MemoryRegionSection::address_space to be
>>>> NULL in the array and non-NULL when used temporarily?
>>>
>>> memory_region_section_get_iotlb needs to access the AddressSpaceDispatch
>>> for sections stored in the PhysPageMap array, because
>>> memory_region_section_get_iotlb uses the ASD to compute the section index.
>>
>> Ohhh, that is not exactly trivial. Out of curiosity - is that iotlb
>> encoding described anywhere?
> 
> No, I don't think so.
> 
>> Anyway, this can be simplified (or rather made more straightforward?) -
>> tlb_set_page_with_attrs() could calculate the section index and pass it to
>> memory_region_section_get_iotlb(). Does that still not make much sense? It
>> just seems quite useless to keep that address_space pointer alive for the
>> one case which could easily avoid using this pointer.
> 
> Hmm I suppose address_space_translate_for_iotlb knows the ASD and could
> also return the index, basically combining it and
> memory_region_section_get_iotlb() into one function.

Ok, good.

One more question - how do we decide what goes to exec.c and what goes to
memory.c? I am tempted to simply embed AddressSpaceDispatch into FlatView
instead of allocating it and storing a pointer (I'll probably avoid doing
that for now anyway, but I am still curious).

The header comment for exec.c says it is "Virtual page mapping" but
AddressSpaceDispatch is a physical address space map, so it seems to fit
memory.c's "Physical memory management" header comment better.
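
(Embedded, FlatView would become something like

    struct FlatView {
        struct rcu_head rcu;
        unsigned ref;
        FlatRange *ranges;
        unsigned nr;
        unsigned nr_allocated;
        AddressSpaceDispatch dispatch;    /* embedded instead of a pointer */
    };

- one allocation and one pointer chase fewer, but the full
AddressSpaceDispatch definition, today private to exec.c, would have to
be visible wherever FlatView is defined. Field list as in 2.10 plus the
dispatch; purely illustrative.)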


-- 
Alexey

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2017-09-12  9:47 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-07  9:20 [Qemu-devel] [RFC PATCH qemu 0/4] memory: Reduce memory use Alexey Kardashevskiy
2017-09-07  9:20 ` [Qemu-devel] [RFC PATCH qemu 1/4] memory: Postpone flatview and dispatch tree building till all devices are added Alexey Kardashevskiy
2017-09-07  9:30   ` Peter Maydell
2017-09-07 14:27     ` Alexey Kardashevskiy
2017-09-07 14:30       ` Peter Maydell
2017-09-08  6:21         ` Alexey Kardashevskiy
2017-09-07  9:20 ` [Qemu-devel] [RFC PATCH qemu 2/4] memory: Prepare for shared flat views Alexey Kardashevskiy
2017-09-09  7:18   ` David Gibson
2017-09-10  9:17     ` Alexey Kardashevskiy
2017-09-07  9:20 ` [Qemu-devel] [RFC PATCH qemu 3/4] memory: Share flat views and dispatch trees between address spaces Alexey Kardashevskiy
2017-09-07 20:53   ` Philippe Mathieu-Daudé
2017-09-07 22:18     ` Alexey Kardashevskiy
2017-09-11  7:40   ` Paolo Bonzini
2017-09-11  9:06     ` Alexey Kardashevskiy
2017-09-11  9:37       ` Paolo Bonzini
2017-09-11 12:08         ` Alexey Kardashevskiy
2017-09-11 15:30           ` Paolo Bonzini
2017-09-12  5:55             ` Alexey Kardashevskiy
2017-09-12  7:12               ` Paolo Bonzini
2017-09-12  9:47                 ` Alexey Kardashevskiy
2017-09-07  9:20 ` [Qemu-devel] [RFC PATCH qemu 4/4] memory: Add flat views to HMP "info mtree" Alexey Kardashevskiy
2017-09-07  9:51 ` [Qemu-devel] [RFC PATCH qemu 0/4] memory: Reduce memory use Dr. David Alan Gilbert
2017-09-07 10:08   ` David Gibson
2017-09-07 14:44   ` Alexey Kardashevskiy
2017-09-07 14:54     ` Dr. David Alan Gilbert
2017-09-08  2:08       ` Alexey Kardashevskiy
2017-09-08  4:04         ` Alexey Kardashevskiy
2017-09-08 11:12         ` Dr. David Alan Gilbert

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.