* [Qemu-devel] [PATCH v1 0/6] make memory listener prepared for rcu style
From: Liu Ping Fan @ 2013-05-29  2:11 UTC
  To: qemu-devel; +Cc: Paolo Bonzini, Anthony Liguori

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

This series aims to make mmio dispatch lockless.

Based on Paolo's tree git://github.com/bonzini/qemu.git, branch iommu.
As for seqlock and rcu, I think Paolo will post them later.

rfcv1->v1:
  use a seqlock to protect readers against the writer (thanks to Paolo's
  suggestion; a toy sketch of the pattern follows)
  concenter the root of each AddressSpaceDispatch to ease switching to the next
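
A toy sketch of the seqlock read/retry pattern the series relies on
(standalone C11; QEMU's QemuSeqLock in Paolo's tree exposes the same
seqlock_read_begin()/seqlock_read_check() shape, but this is not its
actual implementation):

#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_uint sequence;            /* odd while a write is in progress */
} ToySeqLock;

static unsigned toy_seqlock_read_begin(ToySeqLock *sl)
{
    unsigned seq;
    do {                             /* wait out any in-progress writer */
        seq = atomic_load_explicit(&sl->sequence, memory_order_acquire);
    } while (seq & 1);
    return seq;
}

static bool toy_seqlock_read_check(ToySeqLock *sl, unsigned start)
{
    atomic_thread_fence(memory_order_acquire);
    /* true means a writer ran meanwhile: the caller must retry */
    return atomic_load_explicit(&sl->sequence, memory_order_relaxed) != start;
}

/* Writers are assumed to be serialized externally (e.g. by a mutex). */
static void toy_seqlock_write_lock(ToySeqLock *sl)
{
    atomic_fetch_add_explicit(&sl->sequence, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
}

static void toy_seqlock_write_unlock(ToySeqLock *sl)
{
    atomic_fetch_add_explicit(&sl->sequence, 1, memory_order_release);
}

/* Reader: snapshot a pointer published by the writer, retry on a race. */
static void *toy_read_snapshot(ToySeqLock *sl, void *const *published)
{
    unsigned start;
    void *snap;

    do {
        start = toy_seqlock_read_begin(sl);
        snap = *published;
    } while (toy_seqlock_read_check(sl, start));
    return snap;
}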

Liu Ping Fan (6):
  mem: change variable to macro
  mem: make global dispatch table ready for rcu
  mem: fold tcg listener's logic into core memory listener
  mem: concenter the root of each AddressSpaceDispatch
  mem: make dispatch path satisfy rcu style
  mem: change tcg code to rcu style

 cputlb.c                        |   25 ++-
 exec.c                          |  462 ++++++++++++++++++++++++---------------
 include/exec/memory-internal.h  |    2 +-
 include/exec/softmmu_template.h |   53 ++++-
 4 files changed, 358 insertions(+), 184 deletions(-)

-- 
1.7.4.4


* [Qemu-devel] [PATCH v1 1/6] mem: change variable to macro
From: Liu Ping Fan @ 2013-05-29  2:11 UTC
  To: qemu-devel; +Cc: Paolo Bonzini, Anthony Liguori

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Sections like phys_section_unassigned always have fixed indexes in
phys_sections.  Declare them as macros, so that we can still refer to
them when there is more than one phys_sections array, i.e. more than
one dispatch context.
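
A minimal illustration of why the fixed indexes help (toy names; the
real code keeps the lower-case macros from the patch): once the
well-known sections are pinned to fixed slots, the same index is valid
in any generation of the sections array, so the dispatch code never has
to track per-array values.

#define TOY_SECTION_UNASSIGNED 0

struct ToySection { int dummy; };

static struct ToySection *toy_unassigned(struct ToySection *any_generation)
{
    /* valid no matter which phys_sections-like array we are handed */
    return &any_generation[TOY_SECTION_UNASSIGNED];
}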

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 exec.c |   22 ++++++++++++++--------
 1 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/exec.c b/exec.c
index 281c007..6934c2d 100644
--- a/exec.c
+++ b/exec.c
@@ -82,10 +82,10 @@ int use_icount;
 
 static MemoryRegionSection *phys_sections;
 static unsigned phys_sections_nb, phys_sections_nb_alloc;
-static uint16_t phys_section_unassigned;
-static uint16_t phys_section_notdirty;
-static uint16_t phys_section_rom;
-static uint16_t phys_section_watch;
+#define phys_section_unassigned 0
+#define phys_section_notdirty 1
+#define phys_section_rom 2
+#define phys_section_watch 3
 
 /* Simple allocator for PhysPageEntry nodes */
 static PhysPageEntry (*phys_map_nodes)[L2_SIZE];
@@ -1747,11 +1747,17 @@ static void mem_begin(MemoryListener *listener)
 
 static void core_begin(MemoryListener *listener)
 {
+    uint16_t n;
+
     phys_sections_clear();
-    phys_section_unassigned = dummy_section(&io_mem_unassigned);
-    phys_section_notdirty = dummy_section(&io_mem_notdirty);
-    phys_section_rom = dummy_section(&io_mem_rom);
-    phys_section_watch = dummy_section(&io_mem_watch);
+    n = dummy_section(&io_mem_unassigned);
+    assert(phys_section_unassigned == n);
+    n = dummy_section(&io_mem_notdirty);
+    assert(phys_section_notdirty == n);
+    n = dummy_section(&io_mem_rom);
+    assert(phys_section_rom == n);
+    n = dummy_section(&io_mem_watch);
+    assert(phys_section_watch == n);
 }
 
 static void tcg_commit(MemoryListener *listener)
-- 
1.7.4.4


* [Qemu-devel] [PATCH v1 2/6] mem: make global dispatch table ready for rcu
From: Liu Ping Fan @ 2013-05-29  2:11 UTC
  To: qemu-devel; +Cc: Paolo Bonzini, Anthony Liguori

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Currently, phys_map_nodes and phys_sections are shared by all of the
AddressSpaceDispatch instances.  When the memory topology is updated,
every AddressSpaceDispatch rebuilds its dispatch tables on them
sequentially.  To prepare for RCU style, introduce next_map_nodes and
next_phys_sections, so that while rebuilding, the new dispatch tables
live in next_*.  After all AddressSpaceDispatch instances have
finished, we can switch to next_* and drop the previous tables.

This patch still works in the scenario where readers and the writer
cannot run in parallel (i.e. protected by the big lock), so no extra
locking is needed yet; the rcu-style consistency issue is left to the
following patches.
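
A minimal sketch of the next_*/cur_* double-buffer protocol this patch
sets up (hypothetical ToyTables type; in the patch the "tables" are the
map nodes, the sections array and the AllocInfo bookkeeping):

#include <stdlib.h>

typedef struct ToyTables {
    void *map_nodes;
    void *sections;
} ToyTables;

static ToyTables *cur_tables;        /* what dispatch reads */
static ToyTables *next_tables;       /* what the listener builds */

static void toy_begin(void)          /* cf. core_begin() */
{
    next_tables = calloc(1, sizeof(*next_tables));
    /* ... region_add callbacks fill next_tables ... */
}

static void toy_commit(void)         /* cf. core_commit() */
{
    ToyTables *old = cur_tables;

    cur_tables = next_tables;        /* publish the new generation */
    next_tables = NULL;
    free(old);                       /* deferred via call_rcu() later */
}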

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 exec.c |  202 ++++++++++++++++++++++++++++++++++-----------------------------
 1 files changed, 109 insertions(+), 93 deletions(-)

diff --git a/exec.c b/exec.c
index 6934c2d..e5335f5 100644
--- a/exec.c
+++ b/exec.c
@@ -80,16 +80,33 @@ int use_icount;
 
 #if !defined(CONFIG_USER_ONLY)
 
-static MemoryRegionSection *phys_sections;
-static unsigned phys_sections_nb, phys_sections_nb_alloc;
 #define phys_section_unassigned 0
 #define phys_section_notdirty 1
 #define phys_section_rom 2
 #define phys_section_watch 3
 
-/* Simple allocator for PhysPageEntry nodes */
-static PhysPageEntry (*phys_map_nodes)[L2_SIZE];
-static unsigned phys_map_nodes_nb, phys_map_nodes_nb_alloc;
+typedef PhysPageEntry Node[L2_SIZE];
+
+typedef struct AllocInfo {
+    unsigned phys_sections_nb;
+    unsigned phys_sections_nb_alloc;
+    unsigned phys_map_nodes_nb;
+    unsigned phys_map_nodes_nb_alloc;
+    /* Only used for release purpose */
+    Node *map;
+    MemoryRegionSection *sections;
+} AllocInfo;
+
+/* For performance, fetch the page map related pointers directly rather
+ * than hiding them inside AllocInfo.
+ */
+static MemoryRegionSection *cur_phys_sections;
+static Node *cur_map_nodes;
+static AllocInfo *cur_alloc_info;
+
+static MemoryRegionSection *next_phys_sections;
+static Node *next_map_nodes;
+static AllocInfo *next_alloc_info;
 
 #define PHYS_MAP_NODE_NIL (((uint16_t)~0) >> 1)
 
@@ -104,13 +121,15 @@ static MemoryRegion io_mem_watch;
 
 static void phys_map_node_reserve(unsigned nodes)
 {
-    if (phys_map_nodes_nb + nodes > phys_map_nodes_nb_alloc) {
-        typedef PhysPageEntry Node[L2_SIZE];
-        phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc * 2, 16);
-        phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc,
-                                      phys_map_nodes_nb + nodes);
-        phys_map_nodes = g_renew(Node, phys_map_nodes,
-                                 phys_map_nodes_nb_alloc);
+    AllocInfo *info = next_alloc_info;
+
+    if (info->phys_map_nodes_nb + nodes > info->phys_map_nodes_nb_alloc) {
+        info->phys_map_nodes_nb_alloc = MAX(info->phys_map_nodes_nb_alloc * 2,
+                                            16);
+        info->phys_map_nodes_nb_alloc = MAX(info->phys_map_nodes_nb_alloc,
+                                      info->phys_map_nodes_nb + nodes);
+        next_map_nodes = g_renew(Node, next_map_nodes,
+                                 info->phys_map_nodes_nb_alloc);
     }
 }
 
@@ -118,23 +137,18 @@ static uint16_t phys_map_node_alloc(void)
 {
     unsigned i;
     uint16_t ret;
+    AllocInfo *info = next_alloc_info;
 
-    ret = phys_map_nodes_nb++;
+    ret = info->phys_map_nodes_nb++;
     assert(ret != PHYS_MAP_NODE_NIL);
-    assert(ret != phys_map_nodes_nb_alloc);
+    assert(ret != info->phys_map_nodes_nb_alloc);
     for (i = 0; i < L2_SIZE; ++i) {
-        phys_map_nodes[ret][i].is_leaf = 0;
-        phys_map_nodes[ret][i].ptr = PHYS_MAP_NODE_NIL;
+        next_map_nodes[ret][i].is_leaf = 0;
+        next_map_nodes[ret][i].ptr = PHYS_MAP_NODE_NIL;
     }
     return ret;
 }
 
-static void phys_map_nodes_reset(void)
-{
-    phys_map_nodes_nb = 0;
-}
-
-
 static void phys_page_set_level(PhysPageEntry *lp, hwaddr *index,
                                 hwaddr *nb, uint16_t leaf,
                                 int level)
@@ -145,7 +159,7 @@ static void phys_page_set_level(PhysPageEntry *lp, hwaddr *index,
 
     if (!lp->is_leaf && lp->ptr == PHYS_MAP_NODE_NIL) {
         lp->ptr = phys_map_node_alloc();
-        p = phys_map_nodes[lp->ptr];
+        p = next_map_nodes[lp->ptr];
         if (level == 0) {
             for (i = 0; i < L2_SIZE; i++) {
                 p[i].is_leaf = 1;
@@ -153,7 +167,7 @@ static void phys_page_set_level(PhysPageEntry *lp, hwaddr *index,
             }
         }
     } else {
-        p = phys_map_nodes[lp->ptr];
+        p = next_map_nodes[lp->ptr];
     }
     lp = &p[(*index >> (level * L2_BITS)) & (L2_SIZE - 1)];
 
@@ -180,20 +194,29 @@ static void phys_page_set(AddressSpaceDispatch *d,
     phys_page_set_level(&d->phys_map, &index, &nb, leaf, P_L2_LEVELS - 1);
 }
 
-static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr index)
+static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr index, bool cur)
 {
     PhysPageEntry lp = d->phys_map;
     PhysPageEntry *p;
     int i;
+    Node *map_nodes;
+    MemoryRegionSection *sections;
 
+    if (likely(cur)) {
+        map_nodes = cur_map_nodes;
+        sections = cur_phys_sections;
+    } else {
+        map_nodes = next_map_nodes;
+        sections = next_phys_sections;
+    }
     for (i = P_L2_LEVELS - 1; i >= 0 && !lp.is_leaf; i--) {
         if (lp.ptr == PHYS_MAP_NODE_NIL) {
-            return &phys_sections[phys_section_unassigned];
+            return &sections[phys_section_unassigned];
         }
-        p = phys_map_nodes[lp.ptr];
+        p = map_nodes[lp.ptr];
         lp = p[(index >> (i * L2_BITS)) & (L2_SIZE - 1)];
     }
-    return &phys_sections[lp.ptr];
+    return &sections[lp.ptr];
 }
 
 bool memory_region_is_unassigned(MemoryRegion *mr)
@@ -205,7 +228,7 @@ bool memory_region_is_unassigned(MemoryRegion *mr)
 static MemoryRegionSection *address_space_lookup_region(AddressSpace *as,
                                                         hwaddr addr)
 {
-    return phys_page_find(as->dispatch, addr >> TARGET_PAGE_BITS);
+    return phys_page_find(as->dispatch, addr >> TARGET_PAGE_BITS, true);
 }
 
 MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
@@ -213,7 +236,7 @@ MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
                                              bool is_write)
 {
     IOMMUTLBEntry iotlb;
-    MemoryRegionSection *section;
+    MemoryRegionSection *section, *base = cur_phys_sections;
     hwaddr len = *plen;
 
     for (;;) {
@@ -235,7 +258,7 @@ MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
                 | (addr & iotlb.addr_mask));
         len = MIN(len, (addr | iotlb.addr_mask) - addr + 1);
         if (!(iotlb.perm & (1 << is_write))) {
-            section = &phys_sections[phys_section_unassigned];
+            section = &base[phys_section_unassigned];
             break;
         }
 
@@ -677,7 +700,7 @@ hwaddr memory_region_section_get_iotlb(CPUArchState *env,
             iotlb |= phys_section_rom;
         }
     } else {
-        iotlb = section - phys_sections;
+        iotlb = section - cur_phys_sections;
         iotlb += xlat;
     }
 
@@ -710,67 +733,31 @@ typedef struct subpage_t {
 static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
                              uint16_t section);
 static subpage_t *subpage_init(hwaddr base);
-static void destroy_page_desc(uint16_t section_index)
-{
-    MemoryRegionSection *section = &phys_sections[section_index];
-    MemoryRegion *mr = section->mr;
-
-    if (mr->subpage) {
-        subpage_t *subpage = container_of(mr, subpage_t, iomem);
-        memory_region_destroy(&subpage->iomem);
-        g_free(subpage);
-    }
-}
-
-static void destroy_l2_mapping(PhysPageEntry *lp, unsigned level)
-{
-    unsigned i;
-    PhysPageEntry *p;
-
-    if (lp->ptr == PHYS_MAP_NODE_NIL) {
-        return;
-    }
-
-    p = phys_map_nodes[lp->ptr];
-    for (i = 0; i < L2_SIZE; ++i) {
-        if (!p[i].is_leaf) {
-            destroy_l2_mapping(&p[i], level - 1);
-        } else {
-            destroy_page_desc(p[i].ptr);
-        }
-    }
-    lp->is_leaf = 0;
-    lp->ptr = PHYS_MAP_NODE_NIL;
-}
-
-static void destroy_all_mappings(AddressSpaceDispatch *d)
-{
-    destroy_l2_mapping(&d->phys_map, P_L2_LEVELS - 1);
-    phys_map_nodes_reset();
-}
 
 static uint16_t phys_section_add(MemoryRegionSection *section)
 {
+    AllocInfo *info = next_alloc_info;
     /* The physical section number is ORed with a page-aligned
      * pointer to produce the iotlb entries.  Thus it should
      * never overflow into the page-aligned value.
      */
-    assert(phys_sections_nb < TARGET_PAGE_SIZE);
+    assert(info->phys_sections_nb < TARGET_PAGE_SIZE);
 
-    if (phys_sections_nb == phys_sections_nb_alloc) {
-        phys_sections_nb_alloc = MAX(phys_sections_nb_alloc * 2, 16);
-        phys_sections = g_renew(MemoryRegionSection, phys_sections,
-                                phys_sections_nb_alloc);
+    if (info->phys_sections_nb == info->phys_sections_nb_alloc) {
+        info->phys_sections_nb_alloc = MAX(info->phys_sections_nb_alloc * 2,
+                                            16);
+        next_phys_sections = g_renew(MemoryRegionSection, next_phys_sections,
+                                info->phys_sections_nb_alloc);
     }
-    phys_sections[phys_sections_nb] = *section;
+    next_phys_sections[info->phys_sections_nb] = *section;
     memory_region_ref(section->mr);
-    return phys_sections_nb++;
+    return info->phys_sections_nb++;
 }
 
-static void phys_sections_clear(void)
+static void phys_sections_clear(unsigned cnt, MemoryRegionSection *mrs)
 {
-    while (phys_sections_nb > 0) {
-        MemoryRegionSection *section = &phys_sections[--phys_sections_nb];
+    while (cnt > 0) {
+        MemoryRegionSection *section = &mrs[--cnt];
         memory_region_unref(section->mr);
     }
 }
@@ -780,7 +767,8 @@ static void register_subpage(AddressSpaceDispatch *d, MemoryRegionSection *secti
     subpage_t *subpage;
     hwaddr base = section->offset_within_address_space
         & TARGET_PAGE_MASK;
-    MemoryRegionSection *existing = phys_page_find(d, base >> TARGET_PAGE_BITS);
+    MemoryRegionSection *existing = phys_page_find(d,
+                                    base >> TARGET_PAGE_BITS, false);
     MemoryRegionSection subsection = {
         .offset_within_address_space = base,
         .size = TARGET_PAGE_SIZE,
@@ -1572,13 +1560,13 @@ static uint64_t subpage_read(void *opaque, hwaddr addr,
     unsigned int idx = SUBPAGE_IDX(addr);
     uint64_t val;
 
-    MemoryRegionSection *section;
+    MemoryRegionSection *section, *base = cur_phys_sections;
 #if defined(DEBUG_SUBPAGE)
     printf("%s: subpage %p len %d addr " TARGET_FMT_plx " idx %d\n", __func__,
            mmio, len, addr, idx);
 #endif
 
-    section = &phys_sections[mmio->sub_section[idx]];
+    section = &base[mmio->sub_section[idx]];
     addr += mmio->base;
     addr -= section->offset_within_address_space;
     addr += section->offset_within_region;
@@ -1591,14 +1579,14 @@ static void subpage_write(void *opaque, hwaddr addr,
 {
     subpage_t *mmio = opaque;
     unsigned int idx = SUBPAGE_IDX(addr);
-    MemoryRegionSection *section;
+    MemoryRegionSection *section, *base = cur_phys_sections;
 #if defined(DEBUG_SUBPAGE)
     printf("%s: subpage %p len %d addr " TARGET_FMT_plx
            " idx %d value %"PRIx64"\n",
            __func__, mmio, len, addr, idx, value);
 #endif
 
-    section = &phys_sections[mmio->sub_section[idx]];
+    section = &base[mmio->sub_section[idx]];
     addr += mmio->base;
     addr -= section->offset_within_address_space;
     addr += section->offset_within_region;
@@ -1610,14 +1598,14 @@ static bool subpage_accepts(void *opaque, hwaddr addr,
 {
     subpage_t *mmio = opaque;
     unsigned int idx = SUBPAGE_IDX(addr);
-    MemoryRegionSection *section;
+    MemoryRegionSection *section, *base = cur_phys_sections;
 #if defined(DEBUG_SUBPAGE)
     printf("%s: subpage %p %c len %d addr " TARGET_FMT_plx
            " idx %d\n", __func__, mmio,
            is_write ? 'w' : 'r', len, addr, idx);
 #endif
 
-    section = &phys_sections[mmio->sub_section[idx]];
+    section = &base[mmio->sub_section[idx]];
     addr += mmio->base;
     addr -= section->offset_within_address_space;
     addr += section->offset_within_region;
@@ -1676,8 +1664,8 @@ static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
     printf("%s: %p start %08x end %08x idx %08x eidx %08x mem %ld\n", __func__,
            mmio, start, end, idx, eidx, memory);
 #endif
-    if (memory_region_is_ram(phys_sections[section].mr)) {
-        MemoryRegionSection new_section = phys_sections[section];
+    if (memory_region_is_ram(next_phys_sections[section].mr)) {
+        MemoryRegionSection new_section = next_phys_sections[section];
         new_section.mr = &io_mem_subpage_ram;
         section = phys_section_add(&new_section);
     }
@@ -1721,7 +1709,7 @@ static uint16_t dummy_section(MemoryRegion *mr)
 
 MemoryRegion *iotlb_to_region(hwaddr index)
 {
-    return phys_sections[index & ~TARGET_PAGE_MASK].mr;
+    return cur_phys_sections[index & ~TARGET_PAGE_MASK].mr;
 }
 
 static void io_mem_init(void)
@@ -1741,7 +1729,6 @@ static void mem_begin(MemoryListener *listener)
 {
     AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch, listener);
 
-    destroy_all_mappings(d);
     d->phys_map.ptr = PHYS_MAP_NODE_NIL;
 }
 
@@ -1749,7 +1736,7 @@ static void core_begin(MemoryListener *listener)
 {
     uint16_t n;
 
-    phys_sections_clear();
+    next_alloc_info = g_malloc0(sizeof(AllocInfo));
     n = dummy_section(&io_mem_unassigned);
     assert(phys_section_unassigned == n);
     n = dummy_section(&io_mem_notdirty);
@@ -1760,6 +1747,35 @@ static void core_begin(MemoryListener *listener)
     assert(phys_section_watch == n);
 }
 
+static void release_dispatch_map(AllocInfo *info)
+{
+    phys_sections_clear(info->phys_sections_nb, info->sections);
+    g_free(info->map);
+    g_free(info->sections);
+    g_free(info);
+}
+
+/* This listener's commit runs after the other AddressSpaceDispatch
+ * listeners' commits.  That means every AddressSpaceDispatch's deleter
+ * has finished, so this is the place for call_rcu().
+ */
+static void core_commit(MemoryListener *listener)
+{
+    AllocInfo *info = cur_alloc_info;
+    info->map = cur_map_nodes;
+    info->sections = cur_phys_sections;
+
+    /* FIXME: in the future this will be protected by the write side of a
+     * seqlock when running in parallel with readers.
+     */
+    cur_map_nodes = next_map_nodes;
+    cur_phys_sections = next_phys_sections;
+    cur_alloc_info = next_alloc_info;
+
+    /* FIXME: will be changed to call_rcu() */
+    release_dispatch_map(info);
+}
+
 static void tcg_commit(MemoryListener *listener)
 {
     CPUArchState *env;
@@ -1802,6 +1818,7 @@ static void io_region_del(MemoryListener *listener,
 
 static MemoryListener core_memory_listener = {
     .begin = core_begin,
+    .commit = core_commit,
     .log_global_start = core_log_global_start,
     .log_global_stop = core_log_global_stop,
     .priority = 1,
@@ -1837,7 +1854,6 @@ void address_space_destroy_dispatch(AddressSpace *as)
     AddressSpaceDispatch *d = as->dispatch;
 
     memory_listener_unregister(&d->listener);
-    destroy_l2_mapping(&d->phys_map, P_L2_LEVELS - 1);
     g_free(d);
     as->dispatch = NULL;
 }
-- 
1.7.4.4


* [Qemu-devel] [PATCH v1 3/6] mem: fold tcg listener's logic into core memory listener
From: Liu Ping Fan @ 2013-05-29  2:11 UTC
  To: qemu-devel; +Cc: Paolo Bonzini, Anthony Liguori

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

We can do the tcg listener's logic in the core memory listener.  This
will help us concentrate the rcu update in one place.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 exec.c |   19 +++++--------------
 1 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/exec.c b/exec.c
index e5335f5..eb69a98 100644
--- a/exec.c
+++ b/exec.c
@@ -1761,6 +1761,7 @@ static void release_dispatch_map(AllocInfo *info)
  */
 static void core_commit(MemoryListener *listener)
 {
+    CPUArchState *env;
     AllocInfo *info = cur_alloc_info;
     info->map = cur_map_nodes;
     info->sections = cur_phys_sections;
@@ -1772,20 +1773,15 @@ static void core_commit(MemoryListener *listener)
     cur_phys_sections = next_phys_sections;
     cur_alloc_info = next_alloc_info;
 
-    /* FIXME: will be changed to call_rcu() */
-    release_dispatch_map(info);
-}
-
-static void tcg_commit(MemoryListener *listener)
-{
-    CPUArchState *env;
-
     /* since each CPU stores ram addresses in its TLB cache, we must
        reset the modified entries */
     /* XXX: slow ! */
-    for(env = first_cpu; env != NULL; env = env->next_cpu) {
+    for (env = first_cpu; env != NULL; env = env->next_cpu) {
         tlb_flush(env, 1);
     }
+
+    /* FIXME: will be changed to call_rcu() */
+    release_dispatch_map(info);
 }
 
 static void core_log_global_start(MemoryListener *listener)
@@ -1830,10 +1826,6 @@ static MemoryListener io_memory_listener = {
     .priority = 0,
 };
 
-static MemoryListener tcg_memory_listener = {
-    .commit = tcg_commit,
-};
-
 void address_space_init_dispatch(AddressSpace *as)
 {
     AddressSpaceDispatch *d = g_new(AddressSpaceDispatch, 1);
@@ -1870,7 +1862,6 @@ static void memory_map_init(void)
 
     memory_listener_register(&core_memory_listener, &address_space_memory);
     memory_listener_register(&io_memory_listener, &address_space_io);
-    memory_listener_register(&tcg_memory_listener, &address_space_memory);
 }
 
 MemoryRegion *get_system_memory(void)
-- 
1.7.4.4


* [Qemu-devel] [PATCH v1 4/6] mem: concenter the root of each AddressSpaceDispatch
From: Liu Ping Fan @ 2013-05-29  2:11 UTC
  To: qemu-devel; +Cc: Paolo Bonzini, Anthony Liguori

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

All of the AddressSpaceDispatch roots are part of the dispatch context,
along with cur_map_nodes and cur_phys_sections, and we should walk
through all AddressSpaceDispatches in the same dispatch context, i.e.
the same memory topology.  Concenter the roots in one array, so that we
can switch to the next context more easily.
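
A standalone sketch of the bitmap id allocator this patch adds (toy
helpers; the patch itself uses QEMU's find_first_zero_bit()/set_bit()
under a QemuMutex):

#include <limits.h>
#include <pthread.h>

#define TOY_MAX_IDX (1 << 15)
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

static unsigned long toy_id_map[TOY_MAX_IDX / BITS_PER_WORD];
static pthread_mutex_t toy_id_lock = PTHREAD_MUTEX_INITIALIZER;

static int toy_alloc_id(void)
{
    pthread_mutex_lock(&toy_id_lock);
    for (int i = 0; i < TOY_MAX_IDX; i++) {
        unsigned long *word = &toy_id_map[i / BITS_PER_WORD];
        unsigned long bit = 1UL << (i % BITS_PER_WORD);

        if (!(*word & bit)) {
            *word |= bit;
            pthread_mutex_unlock(&toy_id_lock);
            return i;
        }
    }
    pthread_mutex_unlock(&toy_id_lock);
    return -1;                       /* all ids in use */
}

static void toy_release_id(int idx)
{
    pthread_mutex_lock(&toy_id_lock);
    toy_id_map[idx / BITS_PER_WORD] &= ~(1UL << (idx % BITS_PER_WORD));
    pthread_mutex_unlock(&toy_id_lock);
}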

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 exec.c                         |   48 ++++++++++++++++++++++++++++++++++++---
 include/exec/memory-internal.h |    2 +-
 2 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/exec.c b/exec.c
index eb69a98..315960d 100644
--- a/exec.c
+++ b/exec.c
@@ -49,6 +49,7 @@
 #include "translate-all.h"
 
 #include "exec/memory-internal.h"
+#include "qemu/bitops.h"
 
 //#define DEBUG_SUBPAGE
 
@@ -95,6 +96,7 @@ typedef struct AllocInfo {
     /* Only used for release purpose */
     Node *map;
     MemoryRegionSection *sections;
+    PhysPageEntry *roots;
 } AllocInfo;
 
 /* For performance, fetch page map related pointers directly, other than
@@ -102,10 +104,12 @@ typedef struct AllocInfo {
  */
 static MemoryRegionSection *cur_phys_sections;
 static Node *cur_map_nodes;
+static PhysPageEntry *cur_roots;
 static AllocInfo *cur_alloc_info;
 
 static MemoryRegionSection *next_phys_sections;
 static Node *next_map_nodes;
+static PhysPageEntry *next_roots;
 static AllocInfo *next_alloc_info;
 
 #define PHYS_MAP_NODE_NIL (((uint16_t)~0) >> 1)
@@ -191,12 +195,13 @@ static void phys_page_set(AddressSpaceDispatch *d,
     /* Wildly overreserve - it doesn't matter much. */
     phys_map_node_reserve(3 * P_L2_LEVELS);
 
-    phys_page_set_level(&d->phys_map, &index, &nb, leaf, P_L2_LEVELS - 1);
+    phys_page_set_level(&next_roots[d->idx], &index, &nb, leaf,
+                        P_L2_LEVELS - 1);
 }
 
 static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr index, bool cur)
 {
-    PhysPageEntry lp = d->phys_map;
+    PhysPageEntry lp;
     PhysPageEntry *p;
     int i;
     Node *map_nodes;
@@ -205,9 +210,11 @@ static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr index
     if (likely(cur)) {
         map_nodes = cur_map_nodes;
         sections = cur_phys_sections;
+        lp = cur_roots[d->idx];
     } else {
         map_nodes = next_map_nodes;
         sections = next_phys_sections;
+        lp = next_roots[d->idx];
     }
     for (i = P_L2_LEVELS - 1; i >= 0 && !lp.is_leaf; i--) {
         if (lp.ptr == PHYS_MAP_NODE_NIL) {
@@ -1729,7 +1736,8 @@ static void mem_begin(MemoryListener *listener)
 {
     AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch, listener);
 
-    d->phys_map.ptr = PHYS_MAP_NODE_NIL;
+    next_roots[d->idx] = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL,
+                                           .is_leaf = 0 };
 }
 
 static void core_begin(MemoryListener *listener)
@@ -1737,6 +1745,7 @@ static void core_begin(MemoryListener *listener)
     uint16_t n;
 
     next_alloc_info = g_malloc0(sizeof(AllocInfo));
+    next_roots = g_malloc0(2048*sizeof(PhysPageEntry));
     n = dummy_section(&io_mem_unassigned);
     assert(phys_section_unassigned == n);
     n = dummy_section(&io_mem_notdirty);
@@ -1752,6 +1761,7 @@ static void release_dispatch_map(AllocInfo *info)
     phys_sections_clear(info->phys_sections_nb, info->sections);
     g_free(info->map);
     g_free(info->sections);
+    g_free(info->roots);
     g_free(info);
 }
 
@@ -1765,6 +1775,7 @@ static void core_commit(MemoryListener *listener)
     AllocInfo *info = cur_alloc_info;
     info->map = cur_map_nodes;
     info->sections = cur_phys_sections;
+    info->roots = cur_roots;
 
     /* FIXME: in the future this will be protected by the write side of a
      * seqlock when running in parallel with readers.
@@ -1772,6 +1783,7 @@ static void core_commit(MemoryListener *listener)
     cur_map_nodes = next_map_nodes;
     cur_phys_sections = next_phys_sections;
     cur_alloc_info = next_alloc_info;
+    cur_roots = next_roots;
 
     /* since each CPU stores ram addresses in its TLB cache, we must
        reset the modified entries */
@@ -1826,11 +1838,36 @@ static MemoryListener io_memory_listener = {
     .priority = 0,
 };
 
+#define MAX_IDX (1<<15)
+static unsigned long *address_space_id_map;
+static QemuMutex id_lock;
+
+static void address_space_release_id(int16_t idx)
+{
+    qemu_mutex_lock(&id_lock);
+    clear_bit(idx, address_space_id_map);
+    qemu_mutex_unlock(&id_lock);
+}
+
+static int16_t address_space_alloc_id(void)
+{
+    unsigned long idx;
+
+    qemu_mutex_lock(&id_lock);
+    idx = find_first_zero_bit(address_space_id_map, MAX_IDX);
+    set_bit(idx, address_space_id_map);
+    qemu_mutex_unlock(&id_lock);
+    return idx;
+}
+
 void address_space_init_dispatch(AddressSpace *as)
 {
     AddressSpaceDispatch *d = g_new(AddressSpaceDispatch, 1);
 
-    d->phys_map  = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
+    d->idx = address_space_alloc_id();
     d->listener = (MemoryListener) {
         .begin = mem_begin,
         .region_add = mem_add,
@@ -1845,6 +1882,7 @@ void address_space_destroy_dispatch(AddressSpace *as)
 {
     AddressSpaceDispatch *d = as->dispatch;
 
+    address_space_release_id(d->idx);
     memory_listener_unregister(&d->listener);
     g_free(d);
     as->dispatch = NULL;
@@ -1852,6 +1890,8 @@ void address_space_destroy_dispatch(AddressSpace *as)
 
 static void memory_map_init(void)
 {
+    qemu_mutex_init(&id_lock);
+    address_space_id_map = g_malloc0(MAX_IDX/BITS_PER_BYTE);
     system_memory = g_malloc(sizeof(*system_memory));
     memory_region_init(system_memory, "system", INT64_MAX);
     address_space_init(&address_space_memory, system_memory, "memory");
diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
index 799c02a..54a3635 100644
--- a/include/exec/memory-internal.h
+++ b/include/exec/memory-internal.h
@@ -36,7 +36,7 @@ struct AddressSpaceDispatch {
     /* This is a multi-level map on the physical address space.
      * The bottom level has pointers to MemoryRegionSections.
      */
-    PhysPageEntry phys_map;
+    int16_t idx;
     MemoryListener listener;
 };
 
-- 
1.7.4.4


* [Qemu-devel] [PATCH v1 5/6] mem: make dispatch path satisfy rcu style
From: Liu Ping Fan @ 2013-05-29  2:11 UTC
  To: qemu-devel; +Cc: Paolo Bonzini, Anthony Liguori

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Use a seqlock to load the dispatch context atomically.  The dispatch
context consists of cur_map_nodes, cur_phys_sections and cur_roots.

Also, during the dispatch we resolve the terminal section and duplicate
the MemoryRegionSection by value, so that after the rcu read unlock the
current dispatch context can be dropped safely.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

--------
The tcg code related to tlb_set_page() should be fixed too, but it is
a little complicated, so it is separated out to ease review.
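
A toy caller showing the discipline this patch establishes (types and
helpers as used elsewhere in the patch; not a function the patch adds):
resolve and copy the section under the rcu read lock, pin the
MemoryRegion by refcount across the actual MMIO, then drop the
reference.

static bool toy_mmio_write_long(AddressSpace *as, hwaddr addr, uint64_t val)
{
    hwaddr xlat, len = 4;
    MemoryRegionSection section;     /* copied by value */
    bool error;

    rcu_read_lock();
    section = address_space_get_terminal(as, addr, &xlat, &len, true);
    memory_region_ref(section.mr);   /* keeps mr alive after the unlock */
    rcu_read_unlock();

    error = io_mem_write(section.mr, xlat, val, 4);
    memory_region_unref(section.mr);
    return error;
}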
---
 exec.c |  239 ++++++++++++++++++++++++++++++++++++++++-----------------------
 1 files changed, 152 insertions(+), 87 deletions(-)

diff --git a/exec.c b/exec.c
index 315960d..9a5c67f 100644
--- a/exec.c
+++ b/exec.c
@@ -112,6 +112,8 @@ static Node *next_map_nodes;
 static PhysPageEntry *next_roots;
 static AllocInfo *next_alloc_info;
 
+QemuSeqLock dispatch_table_lock;
+
 #define PHYS_MAP_NODE_NIL (((uint16_t)~0) >> 1)
 
 static void io_mem_init(void);
@@ -199,31 +201,24 @@ static void phys_page_set(AddressSpaceDispatch *d,
                         P_L2_LEVELS - 1);
 }
 
-static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr index, bool cur)
+static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d,
+            hwaddr index,
+            Node *map_base,
+            MemoryRegionSection *sections_base,
+            PhysPageEntry *root_base)
 {
-    PhysPageEntry lp;
+    PhysPageEntry lp = root_base[d->idx];
     PhysPageEntry *p;
     int i;
-    Node *map_nodes;
-    MemoryRegionSection *sections;
 
-    if (likely(cur)) {
-        map_nodes = cur_map_nodes;
-        sections = cur_phys_sections;
-        lp = cur_roots[d->idx];
-    } else {
-        map_nodes = next_map_nodes;
-        sections = next_phys_sections;
-        lp = next_roots[d->idx];
-    }
     for (i = P_L2_LEVELS - 1; i >= 0 && !lp.is_leaf; i--) {
         if (lp.ptr == PHYS_MAP_NODE_NIL) {
-            return &sections[phys_section_unassigned];
+            return &sections_base[phys_section_unassigned];
         }
-        p = map_nodes[lp.ptr];
+        p = map_base[lp.ptr];
         lp = p[(index >> (i * L2_BITS)) & (L2_SIZE - 1)];
     }
-    return &sections[lp.ptr];
+    return &sections_base[lp.ptr];
 }
 
 bool memory_region_is_unassigned(MemoryRegion *mr)
@@ -233,21 +228,28 @@ bool memory_region_is_unassigned(MemoryRegion *mr)
 }
 
 static MemoryRegionSection *address_space_lookup_region(AddressSpace *as,
-                                                        hwaddr addr)
+        hwaddr addr, Node *map_base, MemoryRegionSection *sections_base,
+        PhysPageEntry *root_base)
 {
-    return phys_page_find(as->dispatch, addr >> TARGET_PAGE_BITS, true);
+    return phys_page_find(as->dispatch, addr >> TARGET_PAGE_BITS,
+                                    map_base, sections_base, root_base);
 }
 
 MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
                                              hwaddr *xlat, hwaddr *plen,
-                                             bool is_write)
+                                             bool is_write,
+                                             Node *map_base,
+                                             MemoryRegionSection *sections_base,
+                                             PhysPageEntry *root_base)
 {
     IOMMUTLBEntry iotlb;
-    MemoryRegionSection *section, *base = cur_phys_sections;
+    MemoryRegionSection *section;
     hwaddr len = *plen;
 
     for (;;) {
-        section = address_space_lookup_region(as, addr);
+        section = address_space_lookup_region(as, addr, map_base,
+                        sections_base, root_base);
 
         /* Compute offset within MemoryRegionSection */
         addr -= section->offset_within_address_space;
@@ -265,7 +267,7 @@ MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
                 | (addr & iotlb.addr_mask));
         len = MIN(len, (addr | iotlb.addr_mask) - addr + 1);
         if (!(iotlb.perm & (1 << is_write))) {
-            section = &base[phys_section_unassigned];
+            section = &sections_base[phys_section_unassigned];
             break;
         }
 
@@ -276,6 +278,42 @@ MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
     *xlat = addr;
     return section;
 }
+
+/* caller should hold rcu read lock */
+MemoryRegionSection address_space_get_terminal(AddressSpace *as, hwaddr addr,
+                                             hwaddr *xlat, hwaddr *plen,
+                                             bool is_write)
+{
+    MemoryRegionSection *mrs, *ret, *sections_base;
+    Node *map_base;
+    subpage_t *mmio;
+    unsigned int idx, start;
+    PhysPageEntry *root_base;
+
+    /* Walk through all AddressSpaces in a consistent dispatch context */
+    do {
+        start = seqlock_read_begin(&dispatch_table_lock);
+        map_base = cur_map_nodes;
+        sections_base = cur_phys_sections;
+        /* get all AddressSpaceDispatch root */
+        root_base = cur_roots;
+    } while (seqlock_read_check(&dispatch_table_lock, start));
+
+    mrs = address_space_translate(as, addr, xlat, plen, is_write,
+                    map_base, sections_base, root_base);
+    if (!mrs->mr->terminates) {
+        mmio = container_of(mrs->mr, subpage_t, iomem);
+        idx = SUBPAGE_IDX(addr);
+        ret = &sections_base[mmio->sub_section[idx]];
+        *xlat += mmio->base;
+        *xlat -= mrs->offset_within_address_space;
+        *xlat += mrs->offset_within_region;
+    } else {
+        ret = mrs;
+    }
+
+    return *ret;
+}
 #endif
 
 void cpu_exec_init_all(void)
@@ -1777,13 +1815,12 @@ static void core_commit(MemoryListener *listener)
     info->sections = cur_phys_sections;
     info->roots = cur_roots;
 
-    /* FIXME: in the future this will be protected by the write side of a
-     * seqlock when running in parallel with readers.
-     */
+    seqlock_write_lock(&dispatch_table_lock);
     cur_map_nodes = next_map_nodes;
     cur_phys_sections = next_phys_sections;
     cur_alloc_info = next_alloc_info;
     cur_roots = next_roots;
+    seqlock_write_unlock(&dispatch_table_lock);
 
     /* since each CPU stores ram addresses in its TLB cache, we must
        reset the modified entries */
@@ -1890,8 +1927,13 @@ void address_space_destroy_dispatch(AddressSpace *as)
 
 static void memory_map_init(void)
 {
+    QemuMutex *mutex;
+
     qemu_mutex_init(&id_lock);
     address_space_id_map = g_malloc0(MAX_IDX/BITS_PER_BYTE);
+    mutex = g_malloc0(sizeof(QemuMutex));
+    qemu_mutex_init(mutex);
+    seqlock_init(&dispatch_table_lock, mutex);
     system_memory = g_malloc(sizeof(*system_memory));
     memory_region_init(system_memory, "system", INT64_MAX);
     address_space_init(&address_space_memory, system_memory, "memory");
@@ -2001,59 +2043,68 @@ bool address_space_rw(AddressSpace *as, hwaddr addr, uint8_t *buf,
     uint8_t *ptr;
     uint64_t val;
     hwaddr addr1;
-    MemoryRegionSection *section;
+    MemoryRegionSection section;
     bool error = false;
 
     while (len > 0) {
         l = len;
-        section = address_space_translate(as, addr, &addr1, &l, is_write);
+        rcu_read_lock();
+        section = address_space_get_terminal(as, addr, &addr1, &l, is_write);
 
         if (is_write) {
-            if (!memory_access_is_direct(section->mr, is_write)) {
+            if (!memory_access_is_direct(section.mr, is_write)) {
                 l = memory_access_size(l, addr1);
                 /* XXX: could force cpu_single_env to NULL to avoid
                    potential bugs */
+                memory_region_ref(section.mr);
+                rcu_read_unlock();
                 if (l == 4) {
                     /* 32 bit write access */
                     val = ldl_p(buf);
-                    error |= io_mem_write(section->mr, addr1, val, 4);
+                    error |= io_mem_write(section.mr, addr1, val, 4);
                 } else if (l == 2) {
                     /* 16 bit write access */
                     val = lduw_p(buf);
-                    error |= io_mem_write(section->mr, addr1, val, 2);
+                    error |= io_mem_write(section.mr, addr1, val, 2);
                 } else {
                     /* 8 bit write access */
                     val = ldub_p(buf);
-                    error |= io_mem_write(section->mr, addr1, val, 1);
+                    error |= io_mem_write(section.mr, addr1, val, 1);
                 }
+                memory_region_unref(section.mr);
             } else {
-                addr1 += memory_region_get_ram_addr(section->mr);
+                addr1 += memory_region_get_ram_addr(section.mr);
                 /* RAM case */
                 ptr = qemu_get_ram_ptr(addr1);
                 memcpy(ptr, buf, l);
                 invalidate_and_set_dirty(addr1, l);
+                rcu_read_unlock();
             }
         } else {
-            if (!memory_access_is_direct(section->mr, is_write)) {
+            if (!memory_access_is_direct(section.mr, is_write)) {
                 /* I/O case */
+                memory_region_ref(section.mr);
+                rcu_read_unlock();
                 l = memory_access_size(l, addr1);
                 if (l == 4) {
                     /* 32 bit read access */
-                    error |= io_mem_read(section->mr, addr1, &val, 4);
+                    error |= io_mem_read(section.mr, addr1, &val, 4);
                     stl_p(buf, val);
                 } else if (l == 2) {
                     /* 16 bit read access */
-                    error |= io_mem_read(section->mr, addr1, &val, 2);
+                    error |= io_mem_read(section.mr, addr1, &val, 2);
                     stw_p(buf, val);
                 } else {
                     /* 8 bit read access */
-                    error |= io_mem_read(section->mr, addr1, &val, 1);
+                    error |= io_mem_read(section.mr, addr1, &val, 1);
                     stb_p(buf, val);
                 }
+                memory_region_unref(section.mr);
             } else {
                 /* RAM case */
-                ptr = qemu_get_ram_ptr(section->mr->ram_addr + addr1);
+                ptr = qemu_get_ram_ptr(section.mr->ram_addr + addr1);
                 memcpy(buf, ptr, l);
+                rcu_read_unlock();
             }
         }
         len -= l;
@@ -2089,18 +2140,19 @@ void cpu_physical_memory_write_rom(hwaddr addr,
     hwaddr l;
     uint8_t *ptr;
     hwaddr addr1;
-    MemoryRegionSection *section;
+    MemoryRegionSection section;
 
+    rcu_read_lock();
     while (len > 0) {
         l = len;
-        section = address_space_translate(&address_space_memory,
+        section = address_space_get_terminal(&address_space_memory,
                                           addr, &addr1, &l, true);
 
-        if (!(memory_region_is_ram(section->mr) ||
-              memory_region_is_romd(section->mr))) {
+        if (!(memory_region_is_ram(section.mr) ||
+              memory_region_is_romd(section.mr))) {
             /* do nothing */
         } else {
-            addr1 += memory_region_get_ram_addr(section->mr);
+            addr1 += memory_region_get_ram_addr(section.mr);
             /* ROM/RAM case */
             ptr = qemu_get_ram_ptr(addr1);
             memcpy(ptr, buf, l);
@@ -2110,6 +2162,7 @@ void cpu_physical_memory_write_rom(hwaddr addr,
         buf += l;
         addr += l;
     }
+    rcu_read_unlock();
 }
 
 typedef struct {
@@ -2161,15 +2214,15 @@ static void cpu_notify_map_clients(void)
 
 bool address_space_access_valid(AddressSpace *as, hwaddr addr, int len, bool is_write)
 {
-    MemoryRegionSection *section;
+    MemoryRegionSection section;
     hwaddr l, xlat;
 
     while (len > 0) {
         l = len;
-        section = address_space_translate(as, addr, &xlat, &l, is_write);
-        if (!memory_access_is_direct(section->mr, is_write)) {
+        section = address_space_get_terminal(as, addr, &xlat, &l, is_write);
+        if (!memory_access_is_direct(section.mr, is_write)) {
             l = memory_access_size(l, addr);
-            if (!memory_region_access_valid(section->mr, xlat, l, is_write)) {
+            if (!memory_region_access_valid(section.mr, xlat, l, is_write)) {
                 return false;
             }
         }
@@ -2195,14 +2248,14 @@ void *address_space_map(AddressSpace *as,
     hwaddr len = *plen;
     hwaddr todo = 0;
     hwaddr l, xlat;
-    MemoryRegionSection *section;
+    MemoryRegionSection section;
     ram_addr_t raddr = RAM_ADDR_MAX;
     ram_addr_t rlen;
     void *ret;
 
     while (len > 0) {
         l = len;
-        section = address_space_translate(as, addr, &xlat, &l, is_write);
+        section = address_space_get_terminal(as, addr, &xlat, &l, is_write);
 
-        if (!memory_access_is_direct(section->mr, is_write)) {
+        if (!memory_access_is_direct(section.mr, is_write)) {
             if (todo || bounce.buffer) {
@@ -2211,20 +2264,20 @@ void *address_space_map(AddressSpace *as,
             bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, TARGET_PAGE_SIZE);
             bounce.addr = addr;
             bounce.len = l;
-            bounce.mr = section->mr;
+            bounce.mr = section.mr;
             if (!is_write) {
                 address_space_read(as, addr, bounce.buffer, l);
             }
 
             *plen = l;
-            memory_region_ref(section->mr);
+            memory_region_ref(section.mr);
             return bounce.buffer;
         }
         if (!todo) {
-            raddr = memory_region_get_ram_addr(section->mr) + xlat;
-            memory_region_ref(section->mr);
+            raddr = memory_region_get_ram_addr(section.mr) + xlat;
+            memory_region_ref(section.mr);
         } else {
-            if (memory_region_get_ram_addr(section->mr) + xlat != raddr + todo) {
+            if (memory_region_get_ram_addr(section.mr) + xlat != raddr + todo) {
                 break;
             }
         }
@@ -2297,15 +2350,17 @@ static inline uint32_t ldl_phys_internal(hwaddr addr,
 {
     uint8_t *ptr;
     uint64_t val;
-    MemoryRegionSection *section;
+    MemoryRegionSection section;
     hwaddr l = 4;
     hwaddr addr1;
 
-    section = address_space_translate(&address_space_memory, addr, &addr1, &l,
-                                      false);
+    rcu_read_lock();
+    section = address_space_get_terminal(&address_space_memory, addr, &addr1,
+                                    &l, false);
+    rcu_read_unlock();
-    if (l < 4 || !memory_access_is_direct(section->mr, false)) {
+    if (l < 4 || !memory_access_is_direct(section.mr, false)) {
         /* I/O case */
-        io_mem_read(section->mr, addr1, &val, 4);
+        io_mem_read(section.mr, addr1, &val, 4);
 #if defined(TARGET_WORDS_BIGENDIAN)
         if (endian == DEVICE_LITTLE_ENDIAN) {
             val = bswap32(val);
@@ -2360,8 +2415,10 @@ static inline uint64_t ldq_phys_internal(hwaddr addr,
     hwaddr l = 8;
     hwaddr addr1;
 
-    section = address_space_translate(&address_space_memory, addr, &addr1, &l,
-                                      false);
+    rcu_read_lock();
+    section = address_space_get_terminal(&address_space_memory, addr, &addr1,
+                                    &l, false);
+    rcu_read_unlock();
-    if (l < 8 || !memory_access_is_direct(section->mr, false)) {
+    if (l < 8 || !memory_access_is_direct(section.mr, false)) {
         /* I/O case */
-        io_mem_read(section->mr, addr1, &val, 8);
+        io_mem_read(section.mr, addr1, &val, 8);
@@ -2423,15 +2480,17 @@ static inline uint32_t lduw_phys_internal(hwaddr addr,
 {
     uint8_t *ptr;
     uint64_t val;
-    MemoryRegionSection *section;
+    MemoryRegionSection section;
     hwaddr l = 2;
     hwaddr addr1;
 
-    section = address_space_translate(&address_space_memory, addr, &addr1, &l,
-                                      false);
-    if (l < 2 || !memory_access_is_direct(section->mr, false)) {
+    rcu_read_lock();
+    section = address_space_get_terminal(&address_space_memory, addr, &addr1,
+                                    &l, false);
+    rcu_read_unlock();
+    if (l < 2 || !memory_access_is_direct(section.mr, false)) {
         /* I/O case */
-        io_mem_read(section->mr, addr1, &val, 2);
+        io_mem_read(section.mr, addr1, &val, 2);
 #if defined(TARGET_WORDS_BIGENDIAN)
         if (endian == DEVICE_LITTLE_ENDIAN) {
             val = bswap16(val);
@@ -2443,7 +2502,7 @@ static inline uint32_t lduw_phys_internal(hwaddr addr,
 #endif
     } else {
         /* RAM case */
-        ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(section->mr)
+        ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(section.mr)
                                 & TARGET_PAGE_MASK)
                                + addr1);
         switch (endian) {
@@ -2482,16 +2541,18 @@ uint32_t lduw_be_phys(hwaddr addr)
 void stl_phys_notdirty(hwaddr addr, uint32_t val)
 {
     uint8_t *ptr;
-    MemoryRegionSection *section;
+    MemoryRegionSection section;
     hwaddr l = 4;
     hwaddr addr1;
 
-    section = address_space_translate(&address_space_memory, addr, &addr1, &l,
-                                      true);
-    if (l < 4 || !memory_access_is_direct(section->mr, true)) {
-        io_mem_write(section->mr, addr1, val, 4);
+    rcu_read_lock();
+    section = address_space_get_terminal(&address_space_memory, addr, &addr1,
+                                    &l, true);
+    rcu_read_unlock();
+    if (l < 4 || !memory_access_is_direct(section.mr, true)) {
+        io_mem_write(section.mr, addr1, val, 4);
     } else {
-        addr1 += memory_region_get_ram_addr(section->mr) & TARGET_PAGE_MASK;
+        addr1 += memory_region_get_ram_addr(section.mr) & TARGET_PAGE_MASK;
         ptr = qemu_get_ram_ptr(addr1);
         stl_p(ptr, val);
 
@@ -2512,13 +2573,15 @@ static inline void stl_phys_internal(hwaddr addr, uint32_t val,
                                      enum device_endian endian)
 {
     uint8_t *ptr;
-    MemoryRegionSection *section;
+    MemoryRegionSection section;
     hwaddr l = 4;
     hwaddr addr1;
 
-    section = address_space_translate(&address_space_memory, addr, &addr1, &l,
-                                      true);
-    if (l < 4 || !memory_access_is_direct(section->mr, true)) {
+    rcu_read_lock();
+    section = address_space_get_terminal(&address_space_memory, addr, &addr1,
+                                    &l, true);
+    rcu_read_unlock();
+    if (l < 4 || !memory_access_is_direct(section.mr, true)) {
 #if defined(TARGET_WORDS_BIGENDIAN)
         if (endian == DEVICE_LITTLE_ENDIAN) {
             val = bswap32(val);
@@ -2528,10 +2591,10 @@ static inline void stl_phys_internal(hwaddr addr, uint32_t val,
             val = bswap32(val);
         }
 #endif
-        io_mem_write(section->mr, addr1, val, 4);
+        io_mem_write(section.mr, addr1, val, 4);
     } else {
         /* RAM case */
-        addr1 += memory_region_get_ram_addr(section->mr) & TARGET_PAGE_MASK;
+        addr1 += memory_region_get_ram_addr(section.mr) & TARGET_PAGE_MASK;
         ptr = qemu_get_ram_ptr(addr1);
         switch (endian) {
         case DEVICE_LITTLE_ENDIAN:
@@ -2575,13 +2638,13 @@ static inline void stw_phys_internal(hwaddr addr, uint32_t val,
                                      enum device_endian endian)
 {
     uint8_t *ptr;
-    MemoryRegionSection *section;
+    MemoryRegionSection section;
     hwaddr l = 2;
     hwaddr addr1;
 
-    section = address_space_translate(&address_space_memory, addr, &addr1, &l,
-                                      true);
-    if (l < 2 || !memory_access_is_direct(section->mr, true)) {
+    rcu_read_lock();
+    section = address_space_get_terminal(&address_space_memory, addr, &addr1,
+                                    &l, true);
+    rcu_read_unlock();
+    if (l < 2 || !memory_access_is_direct(section.mr, true)) {
 #if defined(TARGET_WORDS_BIGENDIAN)
         if (endian == DEVICE_LITTLE_ENDIAN) {
             val = bswap16(val);
@@ -2591,10 +2654,10 @@ static inline void stw_phys_internal(hwaddr addr, uint32_t val,
             val = bswap16(val);
         }
 #endif
-        io_mem_write(section->mr, addr1, val, 2);
+        io_mem_write(section.mr, addr1, val, 2);
     } else {
         /* RAM case */
-        addr1 += memory_region_get_ram_addr(section->mr) & TARGET_PAGE_MASK;
+        addr1 += memory_region_get_ram_addr(section.mr) & TARGET_PAGE_MASK;
         ptr = qemu_get_ram_ptr(addr1);
         switch (endian) {
         case DEVICE_LITTLE_ENDIAN:
@@ -2696,13 +2759,15 @@ bool virtio_is_big_endian(void)
 #ifndef CONFIG_USER_ONLY
 bool cpu_physical_memory_is_io(hwaddr phys_addr)
 {
-    MemoryRegionSection *section;
+    MemoryRegionSection section;
     hwaddr l = 1;
 
-    section = address_space_translate(&address_space_memory,
+    rcu_read_lock();
+    section = address_space_get_terminal(&address_space_memory,
                                       phys_addr, &phys_addr, &l, false);
+    rcu_read_unlock();
 
-    return !(memory_region_is_ram(section->mr) ||
-             memory_region_is_romd(section->mr));
+    return !(memory_region_is_ram(section.mr) ||
+             memory_region_is_romd(section.mr));
 }
 #endif
-- 
1.7.4.4


* [Qemu-devel] [PATCH v1 6/6] mem: change tcg code to rcu style
From: Liu Ping Fan @ 2013-05-29  2:11 UTC
  To: qemu-devel; +Cc: Paolo Bonzini, Anthony Liguori

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

When adopting rcu style, two kinds of paths in the tcg code need to be
fixed (see the sketch after the notes below):
 -tlb_set_page(), which caches translation info
 -the instruction emulation path

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
----
I am not sure about the tcg code, so I took helper_st as the example.
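
A sketch of that reload discipline (toy function; primitives and
globals as used in this patch): if the dispatch context is republished
while we fill a TLB entry, the entry may mix old and new tables, so the
whole fill is redone.

static void toy_tlb_fill(CPUArchState *env, target_ulong vaddr, hwaddr paddr)
{
    unsigned start;

reload:
    rcu_read_lock();
    start = seqlock_read_begin(&dispatch_table_lock);
    /* ... translate paddr against cur_* and fill env->tlb_table[],
     *     env->iotlb[] for vaddr ... */
    if (seqlock_read_check(&dispatch_table_lock, start)) {
        rcu_read_unlock();
        goto reload;                 /* raced with core_commit(): retry */
    }
    rcu_read_unlock();
}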
---
 cputlb.c                        |   25 +++++++++++++++++-
 exec.c                          |    2 +-
 include/exec/softmmu_template.h |   53 +++++++++++++++++++++++++++++++-------
 3 files changed, 68 insertions(+), 12 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 86666c8..6b55c70 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -249,6 +249,10 @@ void tlb_set_page(CPUArchState *env, target_ulong vaddr,
     uintptr_t addend;
     CPUTLBEntry *te;
     hwaddr iotlb, xlat, sz;
+    unsigned long start;
+    Node *map_base;
+    MemoryRegionSection *sections_base;
+    PhysPageEntry *roots_base;
 
     assert(size >= TARGET_PAGE_SIZE);
     if (size != TARGET_PAGE_SIZE) {
@@ -256,8 +260,18 @@ void tlb_set_page(CPUArchState *env, target_ulong vaddr,
     }
 
     sz = size;
+
+reload:
+    rcu_read_lock();
+    start = seqlock_read_begin(&dispatch_table_lock);
+    map_base = cur_map_nodes;
+    sections_base = cur_phys_sections;
+    roots_base = cur_roots;
     section = address_space_translate(&address_space_memory, paddr, &xlat, &sz,
-                                      false);
+                                      false,
+                                      map_base,
+                                      sections_base,
+                                      roots_base);
     assert(sz >= TARGET_PAGE_SIZE);
 
 #if defined(DEBUG_TLB)
@@ -282,6 +296,9 @@ void tlb_set_page(CPUArchState *env, target_ulong vaddr,
 
     index = (vaddr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     env->iotlb[mmu_idx][index] = iotlb - vaddr;
+    /* Serialized with the reader, so we only need to worry about a
+     * tlb_flush running in parallel.  If there is a conflict, just
+     * reload the TLB entry until it is right.
+     */
     te = &env->tlb_table[mmu_idx][index];
     te->addend = addend - vaddr;
     if (prot & PAGE_READ) {
@@ -309,6 +326,12 @@ void tlb_set_page(CPUArchState *env, target_ulong vaddr,
     } else {
         te->addr_write = -1;
     }
+
+    if (unlikely(seqlock_read_check(&dispatch_table_lock, start))) {
+        rcu_read_unlock();
+        goto reload;
+    }
+    rcu_read_unlock();
 }
 
 /* NOTE: this function can trigger an exception */
diff --git a/exec.c b/exec.c
index 9a5c67f..d4ca101 100644
--- a/exec.c
+++ b/exec.c
@@ -1820,7 +1820,6 @@ static void core_commit(MemoryListener *listener)
     cur_phys_sections = next_phys_sections;
     cur_alloc_info = next_alloc_info;
     cur_roots = next_roots;
-    seqlock_write_unlock(&dispatch_table_lock);
 
     /* since each CPU stores ram addresses in its TLB cache, we must
        reset the modified entries */
@@ -1828,6 +1827,7 @@ static void core_commit(MemoryListener *listener)
     for (env = first_cpu; env != NULL; env = env->next_cpu) {
         tlb_flush(env, 1);
     }
+    seqlock_write_unlock(&dispatch_table_lock);
 
     /* FIXME: will be changed to call_rcu() */
     release_dispatch_map(info);
diff --git a/include/exec/softmmu_template.h b/include/exec/softmmu_template.h
index 8584902..7fa631e 100644
--- a/include/exec/softmmu_template.h
+++ b/include/exec/softmmu_template.h
@@ -196,13 +196,16 @@ static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
                                                    int mmu_idx,
                                                    uintptr_t retaddr);
 
+/* Caller holds the rcu read lock; this function releases it. */
 static inline void glue(io_write, SUFFIX)(CPUArchState *env,
                                           hwaddr physaddr,
                                           DATA_TYPE val,
                                           target_ulong addr,
-                                          uintptr_t retaddr)
+                                          uintptr_t retaddr,
+                                          MemoryRegionSection *mrs_base)
 {
-    MemoryRegion *mr = iotlb_to_region(physaddr);
+    subpage_t *subpg;
+    unsigned int idx;
+    MemoryRegion *mr = mrs_base[physaddr & ~TARGET_PAGE_MASK].mr;
 
     physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
     if (mr != &io_mem_rom && mr != &io_mem_notdirty && !can_do_io(env)) {
@@ -211,6 +214,13 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
 
     env->mem_io_vaddr = addr;
     env->mem_io_pc = retaddr;
+    if (!mr->terminates) {
+        subpg = container_of(mr, subpage_t, mmio);
+        idx = SUBPAGE_IDX(addr);
+        mr = mrs_base[subpg->sub_section[idx]].mr;
+    }
+    memory_region_ref(mr);
+    rcu_read_unlock();
     io_mem_write(mr, physaddr, val, 1 << SHIFT);
+    memory_region_unref(mr);
 }
 
@@ -222,17 +232,28 @@ void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
     target_ulong tlb_addr;
     uintptr_t retaddr;
     int index;
+    unsigned int start;
+    MemoryRegionSection *mrs_base;
+    uintptr_t addend;
 
     index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
  redo:
-    tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
+    rcu_read_lock();
+    do {
+        start = seqlock_read_begin(&dispatch_table_lock);
+        tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
+        ioaddr = env->iotlb[mmu_idx][index];
+        addend = env->tlb_table[mmu_idx][index].addend;
+        mrs_base = cur_phys_sections;
+    } while (seqlock_read_check(&dispatch_table_lock, start));
+
     if ((addr & TARGET_PAGE_MASK) == (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
         if (tlb_addr & ~TARGET_PAGE_MASK) {
             /* IO access */
             if ((addr & (DATA_SIZE - 1)) != 0)
                 goto do_unaligned_access;
             retaddr = GETPC_EXT();
-            ioaddr = env->iotlb[mmu_idx][index];
+            /* will rcu_read_unlock() inside */
-            glue(io_write, SUFFIX)(env, ioaddr, val, addr, retaddr);
+            glue(io_write, SUFFIX)(env, ioaddr, val, addr, retaddr, mrs_base);
         } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
         do_unaligned_access:
@@ -240,22 +261,23 @@ void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
 #ifdef ALIGNED_ONLY
             do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
 #endif
+            rcu_read_unlock();
             glue(glue(slow_st, SUFFIX), MMUSUFFIX)(env, addr, val,
                                                    mmu_idx, retaddr);
         } else {
             /* aligned/unaligned access in the same page */
-            uintptr_t addend;
 #ifdef ALIGNED_ONLY
             if ((addr & (DATA_SIZE - 1)) != 0) {
                 retaddr = GETPC_EXT();
                 do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
             }
 #endif
-            addend = env->tlb_table[mmu_idx][index].addend;
             glue(glue(st, SUFFIX), _raw)((uint8_t *)(intptr_t)
                                          (addr + addend), val);
+            rcu_read_unlock();
         }
     } else {
+        rcu_read_unlock();
         /* the page is not in the TLB : fill it */
         retaddr = GETPC_EXT();
 #ifdef ALIGNED_ONLY
@@ -277,17 +299,25 @@ static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
     hwaddr ioaddr;
     target_ulong tlb_addr;
     int index, i;
+    uintptr_t addend;
+    unsigned int start;
+    MemoryRegionSection *mrs_base;
 
     index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
  redo:
-    tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
+    rcu_read_lock();
+    do {
+        start = seqlock_read_begin(&dispatch_table_lock);
+        tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
+        ioaddr = env->iotlb[mmu_idx][index];
+        addend = env->tlb_table[mmu_idx][index].addend;
+        mrs_base = cur_phys_sections;
+    } while (seqlock_read_check(&dispatch_table_lock, start));
+
     if ((addr & TARGET_PAGE_MASK) == (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
         if (tlb_addr & ~TARGET_PAGE_MASK) {
             /* IO access */
             if ((addr & (DATA_SIZE - 1)) != 0)
                 goto do_unaligned_access;
-            ioaddr = env->iotlb[mmu_idx][index];
-            glue(io_write, SUFFIX)(env, ioaddr, val, addr, retaddr);
+            glue(io_write, SUFFIX)(env, ioaddr, val, addr, retaddr, mrs_base);
         } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
         do_unaligned_access:
             /* XXX: not efficient, but simple */
@@ -295,10 +325,12 @@ static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
              * previous page from the TLB cache.  */
+            rcu_read_unlock();
             for(i = DATA_SIZE - 1; i >= 0; i--) {
 #ifdef TARGET_WORDS_BIGENDIAN
                 glue(slow_stb, MMUSUFFIX)(env, addr + i,
                                           val >> (((DATA_SIZE - 1) * 8) - (i * 8)),
                                           mmu_idx, retaddr);
 #else
                 glue(slow_stb, MMUSUFFIX)(env, addr + i,
                                           val >> (i * 8),
                                           mmu_idx, retaddr);
@@ -306,11 +338,12 @@ static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
             }
         } else {
             /* aligned/unaligned access in the same page */
-            uintptr_t addend = env->tlb_table[mmu_idx][index].addend;
             glue(glue(st, SUFFIX), _raw)((uint8_t *)(intptr_t)
                                          (addr + addend), val);
+            rcu_read_unlock();
         }
     } else {
+        rcu_read_unlock();
         /* the page is not in the TLB : fill it */
         tlb_fill(env, addr, 1, mmu_idx, retaddr);
         goto redo;
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v1 4/6] mem: concenter the root of each AddressSpaceDispatch
  2013-05-29  2:11 ` [Qemu-devel] [PATCH v1 4/6] mem: concenter the root of each AddressSpaceDispatch Liu Ping Fan
@ 2013-05-29  7:03   ` Paolo Bonzini
  2013-05-29  7:48     ` liu ping fan
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2013-05-29  7:03 UTC (permalink / raw)
  To: Liu Ping Fan; +Cc: qemu-devel, Anthony Liguori

On 29/05/2013 04:11, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> All of the AddressSpaceDispatch roots are part of the dispatch
> context, along with cur_map_nodes and cur_phys_sections, and we
> should walk through the AddressSpaceDispatches in the same dispatch
> context, i.e. the same memory topology.  Concenter the roots, so we
> can switch to the next context more easily.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  exec.c                         |   48 ++++++++++++++++++++++++++++++++++++---
>  include/exec/memory-internal.h |    2 +-
>  2 files changed, 45 insertions(+), 5 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index eb69a98..315960d 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -49,6 +49,7 @@
>  #include "translate-all.h"
>  
>  #include "exec/memory-internal.h"
> +#include "qemu/bitops.h"
>  
>  //#define DEBUG_SUBPAGE
>  
> @@ -95,6 +96,7 @@ typedef struct AllocInfo {
>      /* Only used for release purpose */
>      Node *map;
>      MemoryRegionSection *sections;
> +    PhysPageEntry *roots;
>  } AllocInfo;

I wouldn't put it here.  I would put it in AddressSpaceDispatch (as
next_phys_map/next_sections/next_nodes: initialize
next_sections/next_nodes in the "begin" hook, switch under seqlock in
the "commit" hook).

This requires refcounting AllocInfo, but it removes the need for the
->idx indirection and the id management.
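
Roughly like this, just to illustrate (untested sketch;
alloc_info_new()/alloc_info_unref() are made-up names for the
refcounting):

    struct AddressSpaceDispatch {
        PhysPageEntry cur_phys_map;   /* published root, switched under seqlock */
        PhysPageEntry next_phys_map;  /* built by the "begin"/"region_add" hooks */
        AllocInfo *cur_info;          /* refcounted nodes + sections */
        AllocInfo *next_info;
        MemoryListener listener;
    };

    static void mem_begin(MemoryListener *listener)
    {
        AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch,
                                               listener);

        d->next_phys_map = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL,
                                             .is_leaf = 0 };
        d->next_info = alloc_info_new();        /* starts with refcount 1 */
    }

    static void mem_commit(MemoryListener *listener)
    {
        AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch,
                                               listener);
        AllocInfo *old = d->cur_info;

        seqlock_write_lock(&dispatch_table_lock);
        d->cur_phys_map = d->next_phys_map;
        d->cur_info = d->next_info;
        seqlock_write_unlock(&dispatch_table_lock);

        alloc_info_unref(old);   /* freed once the last reference is dropped */
    }

No ->idx, no id bitmap; each AddressSpace carries its own root.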

Paolo

>  /* For performance, fetch page map related pointers directly, rather than
> @@ -102,10 +104,12 @@ typedef struct AllocInfo {
>   */
>  static MemoryRegionSection *cur_phys_sections;
>  static Node *cur_map_nodes;
> +static PhysPageEntry *cur_roots;
>  static AllocInfo *cur_alloc_info;
>  
>  static MemoryRegionSection *next_phys_sections;
>  static Node *next_map_nodes;
> +static PhysPageEntry *next_roots;
>  static AllocInfo *next_alloc_info;
>  
>  #define PHYS_MAP_NODE_NIL (((uint16_t)~0) >> 1)
> @@ -191,12 +195,13 @@ static void phys_page_set(AddressSpaceDispatch *d,
>      /* Wildly overreserve - it doesn't matter much. */
>      phys_map_node_reserve(3 * P_L2_LEVELS);
>  
> -    phys_page_set_level(&d->phys_map, &index, &nb, leaf, P_L2_LEVELS - 1);
> +    phys_page_set_level(&next_roots[d->idx], &index, &nb, leaf,
> +                        P_L2_LEVELS - 1);
>  }
>  
>  static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr index, bool cur)
>  {
> -    PhysPageEntry lp = d->phys_map;
> +    PhysPageEntry lp;
>      PhysPageEntry *p;
>      int i;
>      Node *map_nodes;
> @@ -205,9 +210,11 @@ static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr index
>      if (likely(cur)) {
>          map_nodes = cur_map_nodes;
>          sections = cur_phys_sections;
> +        lp = cur_roots[d->idx];
>      } else {
>          map_nodes = next_map_nodes;
>          sections = next_phys_sections;
> +        lp = next_roots[d->idx];
>      }
>      for (i = P_L2_LEVELS - 1; i >= 0 && !lp.is_leaf; i--) {
>          if (lp.ptr == PHYS_MAP_NODE_NIL) {
> @@ -1729,7 +1736,8 @@ static void mem_begin(MemoryListener *listener)
>  {
>      AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch, listener);
>  
> -    d->phys_map.ptr = PHYS_MAP_NODE_NIL;
> +    next_roots[d->idx] = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL,
> +                                    .is_leaf = 0 };
>  }
>  
>  static void core_begin(MemoryListener *listener)
> @@ -1737,6 +1745,7 @@ static void core_begin(MemoryListener *listener)
>      uint16_t n;
>  
>      next_alloc_info = g_malloc0(sizeof(AllocInfo));
> +    next_roots = g_malloc0(2048*sizeof(PhysPageEntry));
>      n = dummy_section(&io_mem_unassigned);
>      assert(phys_section_unassigned == n);
>      n = dummy_section(&io_mem_notdirty);
> @@ -1752,6 +1761,7 @@ static void release_dispatch_map(AllocInfo *info)
>      phys_sections_clear(info->phys_sections_nb, info->sections);
>      g_free(info->map);
>      g_free(info->sections);
> +    g_free(info->roots);
>      g_free(info);
>  }
>  
> @@ -1765,6 +1775,7 @@ static void core_commit(MemoryListener *listener)
>      AllocInfo *info = cur_alloc_info;
>      info->map = cur_map_nodes;
>      info->sections = cur_phys_sections;
> +    info->roots = cur_roots;
>  
> +    /* Fix me, in future, will be protected by wr seqlock when in parallel
>       * with reader
> @@ -1772,6 +1783,7 @@ static void core_commit(MemoryListener *listener)
>      cur_map_nodes = next_map_nodes;
>      cur_phys_sections = next_phys_sections;
>      cur_alloc_info = next_alloc_info;
> +    cur_roots = next_roots;
>  
>      /* since each CPU stores ram addresses in its TLB cache, we must
>         reset the modified entries */
> @@ -1826,11 +1838,36 @@ static MemoryListener io_memory_listener = {
>      .priority = 0,
>  };
>  
> +static MemoryListener tcg_memory_listener = {
> +    .commit = tcg_commit,
> +};

Rebase artifact?

> +#define MAX_IDX (1<<15)
> +static unsigned long *address_space_id_map;
> +static QemuMutex id_lock;
> +
> +static void address_space_release_id(int16_t idx)
> +{
> +    qemu_mutex_lock(&id_lock);
> +    clear_bit(idx, address_space_id_map);
> +    qemu_mutex_unlock(&id_lock);
> +}
> +
> +static int16_t address_space_alloc_id()
> +{
> +    unsigned long idx;
> +
> +    qemu_mutex_lock(&id_lock);
> +    idx = find_first_zero_bit(address_space_id_map, MAX_IDX);
> +    set_bit(idx, address_space_id_map);
> +    qemu_mutex_unlock(&id_lock);
> +
> +    return idx;
> +}
> +
>  void address_space_init_dispatch(AddressSpace *as)
>  {
>      AddressSpaceDispatch *d = g_new(AddressSpaceDispatch, 1);
>  
> -    d->phys_map  = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
> +    d->idx = address_space_alloc_id();
>      d->listener = (MemoryListener) {
>          .begin = mem_begin,
>          .region_add = mem_add,
> @@ -1845,6 +1882,7 @@ void address_space_destroy_dispatch(AddressSpace *as)
>  {
>      AddressSpaceDispatch *d = as->dispatch;
>  
> +    address_space_release_id(d->idx);
>      memory_listener_unregister(&d->listener);
>      g_free(d);
>      as->dispatch = NULL;
> @@ -1852,6 +1890,8 @@ void address_space_destroy_dispatch(AddressSpace *as)
>  
>  static void memory_map_init(void)
>  {
> +    qemu_mutex_init(&id_lock);
> +    address_space_id_map = g_malloc0(MAX_IDX/BITS_PER_BYTE);
>      system_memory = g_malloc(sizeof(*system_memory));
>      memory_region_init(system_memory, "system", INT64_MAX);
>      address_space_init(&address_space_memory, system_memory, "memory");
> diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
> index 799c02a..54a3635 100644
> --- a/include/exec/memory-internal.h
> +++ b/include/exec/memory-internal.h
> @@ -36,7 +36,7 @@ struct AddressSpaceDispatch {
>      /* This is a multi-level map on the physical address space.
>       * The bottom level has pointers to MemoryRegionSections.
>       */
> -    PhysPageEntry phys_map;
> +    int16_t idx;
>      MemoryListener listener;
>  };
>  
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v1 5/6] mem: make dispatch path satify rcu style
  2013-05-29  2:11 ` [Qemu-devel] [PATCH v1 5/6] mem: make dispatch path satify rcu style Liu Ping Fan
@ 2013-05-29  7:06   ` Paolo Bonzini
  2013-05-29  7:15   ` Paolo Bonzini
  1 sibling, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2013-05-29  7:06 UTC (permalink / raw)
  To: Liu Ping Fan; +Cc: qemu-devel, Anthony Liguori

On 29/05/2013 04:11, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> Use a seqlock to load the dispatch context atomically. The dispatch
> context consists of: cur_map_nodes, cur_phys_sections, cur_roots.
> 
> Also, during the dispatch, we should get the terminal section and dup
> the MemoryRegionSection. So after rcu unlock, the cur dispatch
> context can be dropped safely.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> --------
> The tcg code related to tlb_set_page() should be fixed too,
> but it is a little complicated, so it is separated out to ease review.
> ---
>  exec.c |  239 ++++++++++++++++++++++++++++++++++++++++-----------------------
>  1 files changed, 152 insertions(+), 87 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 315960d..9a5c67f 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -112,6 +112,8 @@ static Node *next_map_nodes;
>  static PhysPageEntry *next_roots;
>  static AllocInfo *next_alloc_info;
>  
> +QemuSeqLock dispatch_table_lock;
> +
>  #define PHYS_MAP_NODE_NIL (((uint16_t)~0) >> 1)
>  
>  static void io_mem_init(void);
> @@ -199,31 +201,24 @@ static void phys_page_set(AddressSpaceDispatch *d,
>                          P_L2_LEVELS - 1);
>  }
>  
> -static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr index, bool cur)
> +static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d,
> +            hwaddr index,
> +            Node *map_base,
> +            MemoryRegionSection *sections_base,
> +            PhysPageEntry *root_base)
>  {
> -    PhysPageEntry lp;
> +    PhysPageEntry lp = root_base[d->idx];
>      PhysPageEntry *p;
>      int i;
> -    Node *map_nodes;
> -    MemoryRegionSection *sections;
>  
> -    if (likely(cur)) {
> -        map_nodes = cur_map_nodes;
> -        sections = cur_phys_sections;
> -        lp = cur_roots[d->idx];
> -    } else {
> -        map_nodes = next_map_nodes;
> -        sections = next_phys_sections;
> -        lp = next_roots[d->idx];
> -    }
>      for (i = P_L2_LEVELS - 1; i >= 0 && !lp.is_leaf; i--) {
>          if (lp.ptr == PHYS_MAP_NODE_NIL) {
> -            return &sections[phys_section_unassigned];
> +            return &sections_base[phys_section_unassigned];
>          }
> -        p = map_nodes[lp.ptr];
> +        p = map_base[lp.ptr];
>          lp = p[(index >> (i * L2_BITS)) & (L2_SIZE - 1)];
>      }
> -    return &sections[lp.ptr];
> +    return &sections_base[lp.ptr];
>  }
>  
>  bool memory_region_is_unassigned(MemoryRegion *mr)
> @@ -233,21 +228,28 @@ bool memory_region_is_unassigned(MemoryRegion *mr)
>  }
>  
>  static MemoryRegionSection *address_space_lookup_region(AddressSpace *as,
> -                                                        hwaddr addr)
> +        hwaddr addr, Node *map_base, MemoryRegionSection *sections_base,
> +        PhysPageEntry *root_base)
> +
>  {
> -    return phys_page_find(as->dispatch, addr >> TARGET_PAGE_BITS, true);
> +    return phys_page_find(as->dispatch, addr >> TARGET_PAGE_BITS,
> +                                    map_base, sections_base, root_base);
>  }
>  
>  MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
>                                               hwaddr *xlat, hwaddr *plen,
> -                                             bool is_write)
> +                                             bool is_write,
> +                                             Node *map_base,
> +                                             MemoryRegionSection *sections_base,
> +                                             PhysPageEntry *root_base)

These two functions are always called on the "cur" map.  Only
phys_page_find can be called (from register_subpage) on the "next" map.
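
That is, the cur/next choice could stay out of the translate path
entirely; something like this (sketch, using the names from this
series):

    static MemoryRegionSection *address_space_lookup_region(AddressSpace *as,
                                                            hwaddr addr)
    {
        /* the translate path only ever sees the published map */
        return phys_page_find(as->dispatch, addr >> TARGET_PAGE_BITS,
                              cur_map_nodes, cur_phys_sections, cur_roots);
    }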

Paolo

>  {
>      IOMMUTLBEntry iotlb;
> -    MemoryRegionSection *section, *base = cur_phys_sections;
> +    MemoryRegionSection *section;
>      hwaddr len = *plen;
>  
>      for (;;) {
> -        section = address_space_lookup_region(as, addr);
> +        section = address_space_lookup_region(as, addr, map_base,
> +                        sections_base, root_base);
>  
>          /* Compute offset within MemoryRegionSection */
>          addr -= section->offset_within_address_space;
> @@ -265,7 +267,7 @@ MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
>                  | (addr & iotlb.addr_mask));
>          len = MIN(len, (addr | iotlb.addr_mask) - addr + 1);
>          if (!(iotlb.perm & (1 << is_write))) {
> -            section = &base[phys_section_unassigned];
> +            section = &sections_base[phys_section_unassigned];
>              break;
>          }
>  
> @@ -276,6 +278,42 @@ MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
>      *xlat = addr;
>      return section;
>  }
> +
> +/* caller should hold rcu read lock */
> +MemoryRegionSection address_space_get_terminal(AddressSpace *as, hwaddr addr,
> +                                             hwaddr *xlat, hwaddr *plen,
> +                                             bool is_write)
> +{
> +    MemoryRegionSection *mrs, *ret, *sections_base;
> +    Node *map_base;
> +    subpage_t *mmio;
> +    unsigned int idx, start;
> +    PhysPageEntry *root_base;
> +
> +    /* Walk through all of the AddressSpaces in a consistent view */
> +    do {
> +        start = seqlock_read_begin(&dispatch_table_lock);
> +        map_base = cur_map_nodes;
> +        sections_base = cur_phys_sections;
> +        /* get the root of each AddressSpaceDispatch */
> +        root_base = cur_roots;
> +    } while (seqlock_read_check(&dispatch_table_lock, start));
> +
> +    mrs = address_space_translate(as, addr, xlat, plen, is_write,
> +                    map_base, sections_base, root_base);
> +    if (!mrs->mr->terminates) {
> +        mmio = container_of(mrs->mr, subpage_t, mmio);
> +        idx = SUBPAGE_IDX(addr);
> +        ret = &sections_base[mmio->sub_section[idx]];
> +        *xlat += mmio->base;
> +        *xlat -= mrs->offset_within_address_space;
> +        *xlat += mrs->offset_within_region;
> +    } else {
> +        ret = mrs;
> +    }
> +
> +    return *ret;
> +}
>  #endif
>  
>  void cpu_exec_init_all(void)
> @@ -1777,13 +1815,12 @@ static void core_commit(MemoryListener *listener)
>      info->sections = cur_phys_sections;
>      info->roots = cur_roots;
>  
> -    /* Fix me, in future, will be protected by wr seqlock when in parallel
> -     * with reader
> -     */
> +    seqlock_write_lock(&dispatch_table_lock);
>      cur_map_nodes = next_map_nodes;
>      cur_phys_sections = next_phys_sections;
>      cur_alloc_info = next_alloc_info;
>      cur_roots = next_roots;
> +    seqlock_write_unlock(&dispatch_table_lock);
>  
>      /* since each CPU stores ram addresses in its TLB cache, we must
>         reset the modified entries */
> @@ -1890,8 +1927,13 @@ void address_space_destroy_dispatch(AddressSpace *as)
>  
>  static void memory_map_init(void)
>  {
> +    QemuMutex *mutex;
> +
>      qemu_mutex_init(&id_lock);
>      address_space_id_map = g_malloc0(MAX_IDX/BITS_PER_BYTE);
> +    mutex = g_malloc0(sizeof(QemuMutex));
> +    qemu_mutex_init(mutex);
> +    seqlock_init(&dispatch_table_lock, mutex);
>      system_memory = g_malloc(sizeof(*system_memory));
>      memory_region_init(system_memory, "system", INT64_MAX);
>      address_space_init(&address_space_memory, system_memory, "memory");
> @@ -2001,59 +2043,68 @@ bool address_space_rw(AddressSpace *as, hwaddr addr, uint8_t *buf,
>      uint8_t *ptr;
>      uint64_t val;
>      hwaddr addr1;
> -    MemoryRegionSection *section;
> +    MemoryRegionSection section;
>      bool error = false;
>  
>      while (len > 0) {
>          l = len;
> -        section = address_space_translate(as, addr, &addr1, &l, is_write);
> +        rcu_read_lock();
> +        section = address_space_get_terminal(as, addr, &addr1, &l, is_write);
>  
>          if (is_write) {
> -            if (!memory_access_is_direct(section->mr, is_write)) {
> +            if (!memory_access_is_direct(section.mr, is_write)) {
>                  l = memory_access_size(l, addr1);
>                  /* XXX: could force cpu_single_env to NULL to avoid
>                     potential bugs */
> +                memory_region_ref(section.mr);
> +                rcu_read_unlock();
>                  if (l == 4) {
>                      /* 32 bit write access */
>                      val = ldl_p(buf);
> -                    error |= io_mem_write(section->mr, addr1, val, 4);
> +                    error |= io_mem_write(section.mr, addr1, val, 4);
>                  } else if (l == 2) {
>                      /* 16 bit write access */
>                      val = lduw_p(buf);
> -                    error |= io_mem_write(section->mr, addr1, val, 2);
> +                    error |= io_mem_write(section.mr, addr1, val, 2);
>                  } else {
>                      /* 8 bit write access */
>                      val = ldub_p(buf);
> -                    error |= io_mem_write(section->mr, addr1, val, 1);
> +                    error |= io_mem_write(section.mr, addr1, val, 1);
>                  }
> +                memory_region_unref(section.mr);
>              } else {
> -                addr1 += memory_region_get_ram_addr(section->mr);
> +                addr1 += memory_region_get_ram_addr(section.mr);
>                  /* RAM case */
>                  ptr = qemu_get_ram_ptr(addr1);
>                  memcpy(ptr, buf, l);
>                  invalidate_and_set_dirty(addr1, l);
> +                rcu_read_unlock();
>              }
>          } else {
> -            if (!memory_access_is_direct(section->mr, is_write)) {
> +            if (!memory_access_is_direct(section.mr, is_write)) {
>                  /* I/O case */
> +                memory_region_ref(section.mr);
> +                rcu_read_unlock();
>                  l = memory_access_size(l, addr1);
>                  if (l == 4) {
>                      /* 32 bit read access */
> -                    error |= io_mem_read(section->mr, addr1, &val, 4);
> +                    error |= io_mem_read(section.mr, addr1, &val, 4);
>                      stl_p(buf, val);
>                  } else if (l == 2) {
>                      /* 16 bit read access */
> -                    error |= io_mem_read(section->mr, addr1, &val, 2);
> +                    error |= io_mem_read(section.mr, addr1, &val, 2);
>                      stw_p(buf, val);
>                  } else {
>                      /* 8 bit read access */
> -                    error |= io_mem_read(section->mr, addr1, &val, 1);
> +                    error |= io_mem_read(section.mr, addr1, &val, 1);
>                      stb_p(buf, val);
>                  }
> +                memory_region_unref(section.mr);
>              } else {
>                  /* RAM case */
> -                ptr = qemu_get_ram_ptr(section->mr->ram_addr + addr1);
> +                ptr = qemu_get_ram_ptr(section.mr->ram_addr + addr1);
>                  memcpy(buf, ptr, l);
> +                rcu_read_unlock();
>              }
>          }
>          len -= l;
> @@ -2089,18 +2140,19 @@ void cpu_physical_memory_write_rom(hwaddr addr,
>      hwaddr l;
>      uint8_t *ptr;
>      hwaddr addr1;
> -    MemoryRegionSection *section;
> +    MemoryRegionSection section;
>  
> +    rcu_read_lock();
>      while (len > 0) {
>          l = len;
> -        section = address_space_translate(&address_space_memory,
> +        section = address_space_get_terminal(&address_space_memory,
>                                            addr, &addr1, &l, true);
>  
> -        if (!(memory_region_is_ram(section->mr) ||
> -              memory_region_is_romd(section->mr))) {
> +        if (!(memory_region_is_ram(section.mr) ||
> +              memory_region_is_romd(section.mr))) {
>              /* do nothing */
>          } else {
> -            addr1 += memory_region_get_ram_addr(section->mr);
> +            addr1 += memory_region_get_ram_addr(section.mr);
>              /* ROM/RAM case */
>              ptr = qemu_get_ram_ptr(addr1);
>              memcpy(ptr, buf, l);
> @@ -2110,6 +2162,7 @@ void cpu_physical_memory_write_rom(hwaddr addr,
>          buf += l;
>          addr += l;
>      }
> +    rcu_read_unlock();
>  }
>  
>  typedef struct {
> @@ -2161,15 +2214,15 @@ static void cpu_notify_map_clients(void)
>  
>  bool address_space_access_valid(AddressSpace *as, hwaddr addr, int len, bool is_write)
>  {
> -    MemoryRegionSection *section;
> +    MemoryRegionSection section;
>      hwaddr l, xlat;
>  
>      while (len > 0) {
>          l = len;
> -        section = address_space_translate(as, addr, &xlat, &l, is_write);
> -        if (!memory_access_is_direct(section->mr, is_write)) {
> +        section = address_space_get_terminal(as, addr, &xlat, &l, is_write);
> +        if (!memory_access_is_direct(section.mr, is_write)) {
>              l = memory_access_size(l, addr);
> -            if (!memory_region_access_valid(section->mr, xlat, l, is_write)) {
> +            if (!memory_region_access_valid(section.mr, xlat, l, is_write)) {
>                  return false;
>              }
>          }
> @@ -2195,14 +2248,14 @@ void *address_space_map(AddressSpace *as,
>      hwaddr len = *plen;
>      hwaddr todo = 0;
>      hwaddr l, xlat;
> -    MemoryRegionSection *section;
> +    MemoryRegionSection section;
>      ram_addr_t raddr = RAM_ADDR_MAX;
>      ram_addr_t rlen;
>      void *ret;
>  
>      while (len > 0) {
>          l = len;
> -        section = address_space_translate(as, addr, &xlat, &l, is_write);
> +        section = address_space_get_terminal(as, addr, &xlat, &l, is_write);
>  
> -        if (!memory_access_is_direct(section->mr, is_write)) {
> +        if (!memory_access_is_direct(section.mr, is_write)) {
>              if (todo || bounce.buffer) {
> @@ -2211,20 +2264,20 @@ void *address_space_map(AddressSpace *as,
>              bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, TARGET_PAGE_SIZE);
>              bounce.addr = addr;
>              bounce.len = l;
> -            bounce.mr = section->mr;
> +            bounce.mr = section.mr;
>              if (!is_write) {
>                  address_space_read(as, addr, bounce.buffer, l);
>              }
>  
>              *plen = l;
> -            memory_region_ref(section->mr);
> +            memory_region_ref(section.mr);
>              return bounce.buffer;
>          }
>          if (!todo) {
> -            raddr = memory_region_get_ram_addr(section->mr) + xlat;
> -            memory_region_ref(section->mr);
> +            raddr = memory_region_get_ram_addr(section.mr) + xlat;
> +            memory_region_ref(section.mr);
>          } else {
> -            if (memory_region_get_ram_addr(section->mr) + xlat != raddr + todo) {
> +            if (memory_region_get_ram_addr(section.mr) + xlat != raddr + todo) {
>                  break;
>              }
>          }
> @@ -2297,15 +2350,17 @@ static inline uint32_t ldl_phys_internal(hwaddr addr,
>  {
>      uint8_t *ptr;
>      uint64_t val;
> -    MemoryRegionSection *section;
> +    MemoryRegionSection section;
>      hwaddr l = 4;
>      hwaddr addr1;
>  
> -    section = address_space_translate(&address_space_memory, addr, &addr1, &l,
> -                                      false);
> +    rcu_read_lock();
> +    section = address_space_get_terminal(&address_space_memory, addr, &addr1,
> +                                    &l, false);
> +    rcu_read_unlock();
> -    if (l < 4 || !memory_access_is_direct(section->mr, false)) {
> +    if (l < 4 || !memory_access_is_direct(section.mr, false)) {
>          /* I/O case */
> -        io_mem_read(section->mr, addr1, &val, 4);
> +        io_mem_read(section.mr, addr1, &val, 4);
>  #if defined(TARGET_WORDS_BIGENDIAN)
>          if (endian == DEVICE_LITTLE_ENDIAN) {
>              val = bswap32(val);
> @@ -2360,8 +2415,10 @@ static inline uint64_t ldq_phys_internal(hwaddr addr,
>      hwaddr l = 8;
>      hwaddr addr1;
>  
> -    section = address_space_translate(&address_space_memory, addr, &addr1, &l,
> -                                      false);
> +    rcu_read_lock();
> +    section = address_space_get_terminal(&address_space_memory, addr, &addr1,
> +                                    &l, false);
> +    rcu_read_unlock();
>      if (l < 8 || !memory_access_is_direct(section->mr, false)) {
>          /* I/O case */
>          io_mem_read(section->mr, addr1, &val, 8);
> @@ -2423,15 +2480,17 @@ static inline uint32_t lduw_phys_internal(hwaddr addr,
>  {
>      uint8_t *ptr;
>      uint64_t val;
> -    MemoryRegionSection *section;
> +    MemoryRegionSection section;
>      hwaddr l = 2;
>      hwaddr addr1;
>  
> -    section = address_space_translate(&address_space_memory, addr, &addr1, &l,
> -                                      false);
> -    if (l < 2 || !memory_access_is_direct(section->mr, false)) {
> +    rcu_read_lock();
> +    section = address_space_get_terminal(&address_space_memory, addr, &addr1,
> +                                    &l, false);
> +    rcu_read_unlock();
> +    if (l < 2 || !memory_access_is_direct(section.mr, false)) {
>          /* I/O case */
> -        io_mem_read(section->mr, addr1, &val, 2);
> +        io_mem_read(section.mr, addr1, &val, 2);
>  #if defined(TARGET_WORDS_BIGENDIAN)
>          if (endian == DEVICE_LITTLE_ENDIAN) {
>              val = bswap16(val);
> @@ -2443,7 +2502,7 @@ static inline uint32_t lduw_phys_internal(hwaddr addr,
>  #endif
>      } else {
>          /* RAM case */
> -        ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(section->mr)
> +        ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(section.mr)
>                                  & TARGET_PAGE_MASK)
>                                 + addr1);
>          switch (endian) {
> @@ -2482,16 +2541,18 @@ uint32_t lduw_be_phys(hwaddr addr)
>  void stl_phys_notdirty(hwaddr addr, uint32_t val)
>  {
>      uint8_t *ptr;
> -    MemoryRegionSection *section;
> +    MemoryRegionSection section;
>      hwaddr l = 4;
>      hwaddr addr1;
>  
> -    section = address_space_translate(&address_space_memory, addr, &addr1, &l,
> -                                      true);
> -    if (l < 4 || !memory_access_is_direct(section->mr, true)) {
> -        io_mem_write(section->mr, addr1, val, 4);
> +    rcu_read_lock();
> +    section = address_space_get_terminal(&address_space_memory, addr, &addr1,
> +                                    &l, true);
> +    rcu_read_unlock();
> +    if (l < 4 || !memory_access_is_direct(section.mr, true)) {
> +        io_mem_write(section.mr, addr1, val, 4);
>      } else {
> -        addr1 += memory_region_get_ram_addr(section->mr) & TARGET_PAGE_MASK;
> +        addr1 += memory_region_get_ram_addr(section.mr) & TARGET_PAGE_MASK;
>          ptr = qemu_get_ram_ptr(addr1);
>          stl_p(ptr, val);
>  
> @@ -2512,13 +2573,15 @@ static inline void stl_phys_internal(hwaddr addr, uint32_t val,
>                                       enum device_endian endian)
>  {
>      uint8_t *ptr;
> -    MemoryRegionSection *section;
> +    MemoryRegionSection section;
>      hwaddr l = 4;
>      hwaddr addr1;
>  
> -    section = address_space_translate(&address_space_memory, addr, &addr1, &l,
> -                                      true);
> -    if (l < 4 || !memory_access_is_direct(section->mr, true)) {
> +    rcu_read_lock();
> +    section = address_space_get_terminal(&address_space_memory, addr, &addr1,
> +                                    &l, true);
> +    rcu_read_unlock();
> +    if (l < 4 || !memory_access_is_direct(section.mr, true)) {
>  #if defined(TARGET_WORDS_BIGENDIAN)
>          if (endian == DEVICE_LITTLE_ENDIAN) {
>              val = bswap32(val);
> @@ -2528,10 +2591,10 @@ static inline void stl_phys_internal(hwaddr addr, uint32_t val,
>              val = bswap32(val);
>          }
>  #endif
> -        io_mem_write(section->mr, addr1, val, 4);
> +        io_mem_write(section.mr, addr1, val, 4);
>      } else {
>          /* RAM case */
> -        addr1 += memory_region_get_ram_addr(section->mr) & TARGET_PAGE_MASK;
> +        addr1 += memory_region_get_ram_addr(section.mr) & TARGET_PAGE_MASK;
>          ptr = qemu_get_ram_ptr(addr1);
>          switch (endian) {
>          case DEVICE_LITTLE_ENDIAN:
> @@ -2575,13 +2638,13 @@ static inline void stw_phys_internal(hwaddr addr, uint32_t val,
>                                       enum device_endian endian)
>  {
>      uint8_t *ptr;
> -    MemoryRegionSection *section;
> +    MemoryRegionSection section;
>      hwaddr l = 2;
>      hwaddr addr1;
>  
> -    section = address_space_translate(&address_space_memory, addr, &addr1, &l,
> -                                      true);
> -    if (l < 2 || !memory_access_is_direct(section->mr, true)) {
> +    rcu_read_lock();
> +    section = address_space_get_terminal(&address_space_memory, addr, &addr1,
> +                                    &l, true);
> +    rcu_read_unlock();
> +    if (l < 2 || !memory_access_is_direct(section.mr, true)) {
>  #if defined(TARGET_WORDS_BIGENDIAN)
>          if (endian == DEVICE_LITTLE_ENDIAN) {
>              val = bswap16(val);
> @@ -2591,10 +2654,10 @@ static inline void stw_phys_internal(hwaddr addr, uint32_t val,
>              val = bswap16(val);
>          }
>  #endif
> -        io_mem_write(section->mr, addr1, val, 2);
> +        io_mem_write(section.mr, addr1, val, 2);
>      } else {
>          /* RAM case */
> -        addr1 += memory_region_get_ram_addr(section->mr) & TARGET_PAGE_MASK;
> +        addr1 += memory_region_get_ram_addr(section.mr) & TARGET_PAGE_MASK;
>          ptr = qemu_get_ram_ptr(addr1);
>          switch (endian) {
>          case DEVICE_LITTLE_ENDIAN:
> @@ -2696,13 +2759,15 @@ bool virtio_is_big_endian(void)
>  #ifndef CONFIG_USER_ONLY
>  bool cpu_physical_memory_is_io(hwaddr phys_addr)
>  {
> -    MemoryRegionSection *section;
> +    MemoryRegionSection section;
>      hwaddr l = 1;
>  
> -    section = address_space_translate(&address_space_memory,
> +    rcu_read_lock();
> +    section = address_space_get_terminal(&address_space_memory,
>                                        phys_addr, &phys_addr, &l, false);
> +    rcu_read_unlock();
>  
> -    return !(memory_region_is_ram(section->mr) ||
> -             memory_region_is_romd(section->mr));
> +    return !(memory_region_is_ram(section.mr) ||
> +             memory_region_is_romd(section.mr));
>  }
>  #endif
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v1 2/6] mem: make global dispatch table ready for rcu
  2013-05-29  2:11 ` [Qemu-devel] [PATCH v1 2/6] mem: make global dispatch table ready for rcu Liu Ping Fan
@ 2013-05-29  7:07   ` Paolo Bonzini
  0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2013-05-29  7:07 UTC (permalink / raw)
  To: Liu Ping Fan; +Cc: qemu-devel, Anthony Liguori

On 29/05/2013 04:11, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> Currently, phys_node_map and phys_sections are shared by all
> of the AddressSpaceDispatches. When updating the memory topology,
> all AddressSpaceDispatches rebuild their dispatch tables on them
> sequentially. In order to use rcu style, introduce next_node_map
> and next_phys_sections, so that when rebuilding, the new dispatch
> tables are placed in next_*. After all AddressSpaceDispatches have
> finished, we can switch to next_* and drop the previous data.
> 
> This patch still works in the scenario where readers and the writer
> cannot run in parallel (i.e. protected by the big lock), so no extra
> locking is needed for now; the rcu-style consistency issue is
> left to the following patches.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  exec.c |  202 ++++++++++++++++++++++++++++++++++-----------------------------
>  1 files changed, 109 insertions(+), 93 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 6934c2d..e5335f5 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -80,16 +80,33 @@ int use_icount;
>  
>  #if !defined(CONFIG_USER_ONLY)
>  
> -static MemoryRegionSection *phys_sections;
> -static unsigned phys_sections_nb, phys_sections_nb_alloc;
>  #define phys_section_unassigned 0
>  #define phys_section_notdirty 1
>  #define phys_section_rom 2
>  #define phys_section_watch 3
>  
> -/* Simple allocator for PhysPageEntry nodes */
> -static PhysPageEntry (*phys_map_nodes)[L2_SIZE];
> -static unsigned phys_map_nodes_nb, phys_map_nodes_nb_alloc;
> +typedef PhysPageEntry Node[L2_SIZE];
> +
> +typedef struct AllocInfo {
> +    unsigned phys_sections_nb;
> +    unsigned phys_sections_nb_alloc;
> +    unsigned phys_map_nodes_nb;
> +    unsigned phys_map_nodes_nb_alloc;
> +    /* Only used for release purpose */
> +    Node *map;

Node *nodes;

> +    MemoryRegionSection *sections;
> +} AllocInfo;
> +
> +/* For performance, fetch page map related pointers directly, rather than
> + * hiding them inside AllocInfo
> + */
> +static MemoryRegionSection *cur_phys_sections;
> +static Node *cur_map_nodes;
> +static AllocInfo *cur_alloc_info;
> +
> +static MemoryRegionSection *next_phys_sections;
> +static Node *next_map_nodes;
> +static AllocInfo *next_alloc_info;

No need for the "next" variables, just make it "static AllocInfo *next_map".
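
In other words (sketch, untested):

    typedef struct AllocInfo {
        unsigned sections_nb;
        unsigned sections_nb_alloc;
        unsigned nodes_nb;
        unsigned nodes_nb_alloc;
        Node *nodes;
        MemoryRegionSection *sections;
        /* plus a refcount once the commit hook hands it to readers */
    } AllocInfo;

    static AllocInfo *cur_map;    /* what the dispatch path reads */
    static AllocInfo *next_map;   /* built between begin and commit */

Then phys_page_find() and friends take the AllocInfo * directly
instead of a "bool cur" or separate base pointers, and core_commit()
just swaps cur_map/next_map.
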
>  
>  #define PHYS_MAP_NODE_NIL (((uint16_t)~0) >> 1)
>  
> @@ -104,13 +121,15 @@ static MemoryRegion io_mem_watch;
>  
>  static void phys_map_node_reserve(unsigned nodes)
>  {
> -    if (phys_map_nodes_nb + nodes > phys_map_nodes_nb_alloc) {
> -        typedef PhysPageEntry Node[L2_SIZE];
> -        phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc * 2, 16);
> -        phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc,
> -                                      phys_map_nodes_nb + nodes);
> -        phys_map_nodes = g_renew(Node, phys_map_nodes,
> -                                 phys_map_nodes_nb_alloc);
> +    AllocInfo *info = next_alloc_info;
> +
> +    if (info->phys_map_nodes_nb + nodes > info->phys_map_nodes_nb_alloc) {
> +        info->phys_map_nodes_nb_alloc = MAX(info->phys_map_nodes_nb_alloc * 2,
> +                                            16);
> +        info->phys_map_nodes_nb_alloc = MAX(info->phys_map_nodes_nb_alloc,
> +                                      info->phys_map_nodes_nb + nodes);
> +        next_map_nodes = g_renew(Node, next_map_nodes,
> +                                 info->phys_map_nodes_nb_alloc);
>      }
>  }
>  
> @@ -118,23 +137,18 @@ static uint16_t phys_map_node_alloc(void)
>  {
>      unsigned i;
>      uint16_t ret;
> +    AllocInfo *info = next_alloc_info;
>  
> -    ret = phys_map_nodes_nb++;
> +    ret = info->phys_map_nodes_nb++;
>      assert(ret != PHYS_MAP_NODE_NIL);
> -    assert(ret != phys_map_nodes_nb_alloc);
> +    assert(ret != info->phys_map_nodes_nb_alloc);
>      for (i = 0; i < L2_SIZE; ++i) {
> -        phys_map_nodes[ret][i].is_leaf = 0;
> -        phys_map_nodes[ret][i].ptr = PHYS_MAP_NODE_NIL;
> +        next_map_nodes[ret][i].is_leaf = 0;
> +        next_map_nodes[ret][i].ptr = PHYS_MAP_NODE_NIL;
>      }
>      return ret;
>  }
>  
> -static void phys_map_nodes_reset(void)
> -{
> -    phys_map_nodes_nb = 0;
> -}
> -
> -
>  static void phys_page_set_level(PhysPageEntry *lp, hwaddr *index,
>                                  hwaddr *nb, uint16_t leaf,
>                                  int level)
> @@ -145,7 +159,7 @@ static void phys_page_set_level(PhysPageEntry *lp, hwaddr *index,
>  
>      if (!lp->is_leaf && lp->ptr == PHYS_MAP_NODE_NIL) {
>          lp->ptr = phys_map_node_alloc();
> -        p = phys_map_nodes[lp->ptr];
> +        p = next_map_nodes[lp->ptr];
>          if (level == 0) {
>              for (i = 0; i < L2_SIZE; i++) {
>                  p[i].is_leaf = 1;
> @@ -153,7 +167,7 @@ static void phys_page_set_level(PhysPageEntry *lp, hwaddr *index,
>              }
>          }
>      } else {
> -        p = phys_map_nodes[lp->ptr];
> +        p = next_map_nodes[lp->ptr];
>      }
>      lp = &p[(*index >> (level * L2_BITS)) & (L2_SIZE - 1)];
>  
> @@ -180,20 +194,29 @@ static void phys_page_set(AddressSpaceDispatch *d,
>      phys_page_set_level(&d->phys_map, &index, &nb, leaf, P_L2_LEVELS - 1);
>  }
>  
> -static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr index)
> +static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr index, bool cur)
>  {
>      PhysPageEntry lp = d->phys_map;
>      PhysPageEntry *p;
>      int i;
> +    Node *map_nodes;
> +    MemoryRegionSection *sections;
>  
> +    if (likely(cur)) {
> +        map_nodes = cur_map_nodes;
> +        sections = cur_phys_sections;
> +    } else {
> +        map_nodes = next_map_nodes;
> +        sections = next_phys_sections;
> +    }

Just pass the AllocInfo here.

>      for (i = P_L2_LEVELS - 1; i >= 0 && !lp.is_leaf; i--) {
>          if (lp.ptr == PHYS_MAP_NODE_NIL) {
> -            return &phys_sections[phys_section_unassigned];
> +            return &sections[phys_section_unassigned];
>          }
> -        p = phys_map_nodes[lp.ptr];
> +        p = map_nodes[lp.ptr];
>          lp = p[(index >> (i * L2_BITS)) & (L2_SIZE - 1)];
>      }
> -    return &phys_sections[lp.ptr];
> +    return &sections[lp.ptr];
>  }
>  
>  bool memory_region_is_unassigned(MemoryRegion *mr)
> @@ -205,7 +228,7 @@ bool memory_region_is_unassigned(MemoryRegion *mr)
>  static MemoryRegionSection *address_space_lookup_region(AddressSpace *as,
>                                                          hwaddr addr)
>  {
> -    return phys_page_find(as->dispatch, addr >> TARGET_PAGE_BITS);
> +    return phys_page_find(as->dispatch, addr >> TARGET_PAGE_BITS, true);
>  }
>  
>  MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
> @@ -213,7 +236,7 @@ MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
>                                               bool is_write)
>  {
>      IOMMUTLBEntry iotlb;
> -    MemoryRegionSection *section;
> +    MemoryRegionSection *section, *base = cur_phys_sections;
>      hwaddr len = *plen;
>  
>      for (;;) {
> @@ -235,7 +258,7 @@ MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
>                  | (addr & iotlb.addr_mask));
>          len = MIN(len, (addr | iotlb.addr_mask) - addr + 1);
>          if (!(iotlb.perm & (1 << is_write))) {
> -            section = &phys_sections[phys_section_unassigned];
> +            section = &base[phys_section_unassigned];
>              break;
>          }
>  
> @@ -677,7 +700,7 @@ hwaddr memory_region_section_get_iotlb(CPUArchState *env,
>              iotlb |= phys_section_rom;
>          }
>      } else {
> -        iotlb = section - phys_sections;
> +        iotlb = section - cur_phys_sections;
>          iotlb += xlat;
>      }
>  
> @@ -710,67 +733,31 @@ typedef struct subpage_t {
>  static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
>                               uint16_t section);
>  static subpage_t *subpage_init(hwaddr base);
> -static void destroy_page_desc(uint16_t section_index)
> -{
> -    MemoryRegionSection *section = &phys_sections[section_index];
> -    MemoryRegion *mr = section->mr;
> -
> -    if (mr->subpage) {
> -        subpage_t *subpage = container_of(mr, subpage_t, iomem);
> -        memory_region_destroy(&subpage->iomem);
> -        g_free(subpage);
> -    }
> -}
> -
> -static void destroy_l2_mapping(PhysPageEntry *lp, unsigned level)
> -{
> -    unsigned i;
> -    PhysPageEntry *p;
> -
> -    if (lp->ptr == PHYS_MAP_NODE_NIL) {
> -        return;
> -    }
> -
> -    p = phys_map_nodes[lp->ptr];
> -    for (i = 0; i < L2_SIZE; ++i) {
> -        if (!p[i].is_leaf) {
> -            destroy_l2_mapping(&p[i], level - 1);
> -        } else {
> -            destroy_page_desc(p[i].ptr);
> -        }
> -    }
> -    lp->is_leaf = 0;
> -    lp->ptr = PHYS_MAP_NODE_NIL;
> -}
> -
> -static void destroy_all_mappings(AddressSpaceDispatch *d)
> -{
> -    destroy_l2_mapping(&d->phys_map, P_L2_LEVELS - 1);
> -    phys_map_nodes_reset();
> -}
>  
>  static uint16_t phys_section_add(MemoryRegionSection *section)
>  {
> +    AllocInfo *info = next_alloc_info;
>      /* The physical section number is ORed with a page-aligned
>       * pointer to produce the iotlb entries.  Thus it should
>       * never overflow into the page-aligned value.
>       */
> -    assert(phys_sections_nb < TARGET_PAGE_SIZE);
> +    assert(info->phys_sections_nb < TARGET_PAGE_SIZE);
>  
> -    if (phys_sections_nb == phys_sections_nb_alloc) {
> -        phys_sections_nb_alloc = MAX(phys_sections_nb_alloc * 2, 16);
> -        phys_sections = g_renew(MemoryRegionSection, phys_sections,
> -                                phys_sections_nb_alloc);
> +    if (info->phys_sections_nb == info->phys_sections_nb_alloc) {
> +        info->phys_sections_nb_alloc = MAX(info->phys_sections_nb_alloc * 2,
> +                                            16);
> +        next_phys_sections = g_renew(MemoryRegionSection, next_phys_sections,
> +                                info->phys_sections_nb_alloc);
>      }
> -    phys_sections[phys_sections_nb] = *section;
> +    next_phys_sections[info->phys_sections_nb] = *section;
>      memory_region_ref(section->mr);
> -    return phys_sections_nb++;
> +    return info->phys_sections_nb++;
>  }
>  
> -static void phys_sections_clear(void)
> +static void phys_sections_clear(unsigned cnt, MemoryRegionSection *mrs)
>  {
> -    while (phys_sections_nb > 0) {
> -        MemoryRegionSection *section = &phys_sections[--phys_sections_nb];
> +    while (cnt > 0) {
> +        MemoryRegionSection *section = &mrs[--cnt];
>          memory_region_unref(section->mr);
>      }
>  }
> @@ -780,7 +767,8 @@ static void register_subpage(AddressSpaceDispatch *d, MemoryRegionSection *secti
>      subpage_t *subpage;
>      hwaddr base = section->offset_within_address_space
>          & TARGET_PAGE_MASK;
> -    MemoryRegionSection *existing = phys_page_find(d, base >> TARGET_PAGE_BITS);
> +    MemoryRegionSection *existing = phys_page_find(d,
> +                                    base >> TARGET_PAGE_BITS, false);
>      MemoryRegionSection subsection = {
>          .offset_within_address_space = base,
>          .size = TARGET_PAGE_SIZE,
> @@ -1572,13 +1560,13 @@ static uint64_t subpage_read(void *opaque, hwaddr addr,
>      unsigned int idx = SUBPAGE_IDX(addr);
>      uint64_t val;
>  
> -    MemoryRegionSection *section;
> +    MemoryRegionSection *section, *base = cur_phys_sections;
>  #if defined(DEBUG_SUBPAGE)
>      printf("%s: subpage %p len %d addr " TARGET_FMT_plx " idx %d\n", __func__,
>             mmio, len, addr, idx);
>  #endif
>  
> -    section = &phys_sections[mmio->sub_section[idx]];
> +    section = &base[mmio->sub_section[idx]];
>      addr += mmio->base;
>      addr -= section->offset_within_address_space;
>      addr += section->offset_within_region;
> @@ -1591,14 +1579,14 @@ static void subpage_write(void *opaque, hwaddr addr,
>  {
>      subpage_t *mmio = opaque;
>      unsigned int idx = SUBPAGE_IDX(addr);
> -    MemoryRegionSection *section;
> +    MemoryRegionSection *section, *base = cur_phys_sections;
>  #if defined(DEBUG_SUBPAGE)
>      printf("%s: subpage %p len %d addr " TARGET_FMT_plx
>             " idx %d value %"PRIx64"\n",
>             __func__, mmio, len, addr, idx, value);
>  #endif
>  
> -    section = &phys_sections[mmio->sub_section[idx]];
> +    section = &base[mmio->sub_section[idx]];
>      addr += mmio->base;
>      addr -= section->offset_within_address_space;
>      addr += section->offset_within_region;
> @@ -1610,14 +1598,14 @@ static bool subpage_accepts(void *opaque, hwaddr addr,
>  {
>      subpage_t *mmio = opaque;
>      unsigned int idx = SUBPAGE_IDX(addr);
> -    MemoryRegionSection *section;
> +    MemoryRegionSection *section, *base = cur_phys_sections;
>  #if defined(DEBUG_SUBPAGE)
>      printf("%s: subpage %p %c len %d addr " TARGET_FMT_plx
>             " idx %d\n", __func__, mmio,
>             is_write ? 'w' : 'r', len, addr, idx);
>  #endif
>  
> -    section = &phys_sections[mmio->sub_section[idx]];
> +    section = &base[mmio->sub_section[idx]];
>      addr += mmio->base;
>      addr -= section->offset_within_address_space;
>      addr += section->offset_within_region;
> @@ -1676,8 +1664,8 @@ static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
>      printf("%s: %p start %08x end %08x idx %08x eidx %08x mem %ld\n", __func__,
>             mmio, start, end, idx, eidx, memory);
>  #endif
> -    if (memory_region_is_ram(phys_sections[section].mr)) {
> -        MemoryRegionSection new_section = phys_sections[section];
> +    if (memory_region_is_ram(next_phys_sections[section].mr)) {
> +        MemoryRegionSection new_section = next_phys_sections[section];
>          new_section.mr = &io_mem_subpage_ram;
>          section = phys_section_add(&new_section);
>      }
> @@ -1721,7 +1709,7 @@ static uint16_t dummy_section(MemoryRegion *mr)
>  
>  MemoryRegion *iotlb_to_region(hwaddr index)
>  {
> -    return phys_sections[index & ~TARGET_PAGE_MASK].mr;
> +    return cur_phys_sections[index & ~TARGET_PAGE_MASK].mr;
>  }
>  
>  static void io_mem_init(void)
> @@ -1741,7 +1729,6 @@ static void mem_begin(MemoryListener *listener)
>  {
>      AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch, listener);
>  
> -    destroy_all_mappings(d);
>      d->phys_map.ptr = PHYS_MAP_NODE_NIL;
>  }
>  
> @@ -1749,7 +1736,7 @@ static void core_begin(MemoryListener *listener)
>  {
>      uint16_t n;
>  
> -    phys_sections_clear();
> +    next_alloc_info = g_malloc0(sizeof(AllocInfo));
>      n = dummy_section(&io_mem_unassigned);
>      assert(phys_section_unassigned == n);
>      n = dummy_section(&io_mem_notdirty);
> @@ -1760,6 +1747,35 @@ static void core_begin(MemoryListener *listener)
>      assert(phys_section_watch == n);
>  }
>  
> +static void release_dispatch_map(AllocInfo *info)
> +{
> +    phys_sections_clear(info->phys_sections_nb, info->sections);
> +    g_free(info->map);
> +    g_free(info->sections);
> +    g_free(info);
> +}
> +
> +/* This listener's commit runs after the other AddressSpaceDispatch
> + * listeners'.  That means the AddressSpaceDispatch deleters have
> + * finished, so this is the place for call_rcu().
> + */
> +static void core_commit(MemoryListener *listener)
> +{
> +    AllocInfo *info = cur_alloc_info;
> +    info->map = cur_map_nodes;
> +    info->sections = cur_phys_sections;
> +
> +    /* Fix me, in future, will be protected by wr seqlock when in parallel
> +     * with reader
> +     */
> +    cur_map_nodes = next_map_nodes;
> +    cur_phys_sections = next_phys_sections;
> +    cur_alloc_info = next_alloc_info;
> +
> +    /* Fix me, will be changed to call_rcu */
> +    release_dispatch_map(info);
> +}
> +
>  static void tcg_commit(MemoryListener *listener)
>  {
>      CPUArchState *env;
> @@ -1802,6 +1818,7 @@ static void io_region_del(MemoryListener *listener,
>  
>  static MemoryListener core_memory_listener = {
>      .begin = core_begin,
> +    .commit = core_commit,
>      .log_global_start = core_log_global_start,
>      .log_global_stop = core_log_global_stop,
>      .priority = 1,
> @@ -1837,7 +1854,6 @@ void address_space_destroy_dispatch(AddressSpace *as)
>      AddressSpaceDispatch *d = as->dispatch;
>  
>      memory_listener_unregister(&d->listener);
> -    destroy_l2_mapping(&d->phys_map, P_L2_LEVELS - 1);
>      g_free(d);
>      as->dispatch = NULL;
>  }
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v1 5/6] mem: make dispatch path satify rcu style
  2013-05-29  2:11 ` [Qemu-devel] [PATCH v1 5/6] mem: make dispatch path satify rcu style Liu Ping Fan
  2013-05-29  7:06   ` Paolo Bonzini
@ 2013-05-29  7:15   ` Paolo Bonzini
  1 sibling, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2013-05-29  7:15 UTC (permalink / raw)
  To: Liu Ping Fan; +Cc: qemu-devel, Anthony Liguori

On 29/05/2013 04:11, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> Use a seqlock to load the dispatch context atomically. The dispatch
> context consists of: cur_map_nodes, cur_phys_sections, cur_roots.
> 
> Also, during the dispatch, we should get the terminal section and dup
> the MemoryRegionSection. So after rcu unlock, the cur dispatch
> context can be dropped safely.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> --------
> The tcg code related to tlb_set_page() should be fixed too,
> but it is a little complicated, so it is separated out to ease review.
> ---
>  exec.c |  239 ++++++++++++++++++++++++++++++++++++++++-----------------------
>  1 files changed, 152 insertions(+), 87 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 315960d..9a5c67f 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -112,6 +112,8 @@ static Node *next_map_nodes;
>  static PhysPageEntry *next_roots;
>  static AllocInfo *next_alloc_info;
>  
> +QemuSeqLock dispatch_table_lock;

You didn't include the seqlock implementation.
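
For reference, the API this series uses suggests roughly the following
shape (a sketch only, not necessarily what will be merged):

    typedef struct QemuSeqLock {
        QemuMutex *mutex;    /* serializes writers */
        unsigned sequence;   /* odd while an update is in flight */
    } QemuSeqLock;

    static inline void seqlock_init(QemuSeqLock *sl, QemuMutex *mutex)
    {
        sl->mutex = mutex;
        sl->sequence = 0;
    }

    static inline void seqlock_write_lock(QemuSeqLock *sl)
    {
        qemu_mutex_lock(sl->mutex);
        sl->sequence++;
        smp_wmb();           /* bump the sequence before writing the data */
    }

    static inline void seqlock_write_unlock(QemuSeqLock *sl)
    {
        smp_wmb();           /* commit the data before bumping the sequence */
        sl->sequence++;
        qemu_mutex_unlock(sl->mutex);
    }

    static inline unsigned seqlock_read_begin(QemuSeqLock *sl)
    {
        /* Clearing the low bit makes the final check fail (and the
         * reader retry) if a write was in progress.  */
        unsigned ret = sl->sequence & ~1;
        smp_rmb();           /* read the sequence before the data */
        return ret;
    }

    static inline int seqlock_read_check(QemuSeqLock *sl, unsigned start)
    {
        smp_rmb();           /* read the data before re-checking */
        return unlikely(sl->sequence != start);
    }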

> +
>  #define PHYS_MAP_NODE_NIL (((uint16_t)~0) >> 1)
>  
>  static void io_mem_init(void);
> @@ -199,31 +201,24 @@ static void phys_page_set(AddressSpaceDispatch *d,
>                          P_L2_LEVELS - 1);
>  }
>  
> -static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr index, bool cur)
> +static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d,
> +            hwaddr index,
> +            Node *map_base,
> +            MemoryRegionSection *sections_base,
> +            PhysPageEntry *root_base)
>  {
> -    PhysPageEntry lp;
> +    PhysPageEntry lp = root_base[d->idx];
>      PhysPageEntry *p;
>      int i;
> -    Node *map_nodes;
> -    MemoryRegionSection *sections;
>  
> -    if (likely(cur)) {
> -        map_nodes = cur_map_nodes;
> -        sections = cur_phys_sections;
> -        lp = cur_roots[d->idx];
> -    } else {
> -        map_nodes = next_map_nodes;
> -        sections = next_phys_sections;
> -        lp = next_roots[d->idx];
> -    }
>      for (i = P_L2_LEVELS - 1; i >= 0 && !lp.is_leaf; i--) {
>          if (lp.ptr == PHYS_MAP_NODE_NIL) {
> -            return &sections[phys_section_unassigned];
> +            return &sections_base[phys_section_unassigned];
>          }
> -        p = map_nodes[lp.ptr];
> +        p = map_base[lp.ptr];
>          lp = p[(index >> (i * L2_BITS)) & (L2_SIZE - 1)];
>      }
> -    return &sections[lp.ptr];
> +    return &sections_base[lp.ptr];
>  }
>  
>  bool memory_region_is_unassigned(MemoryRegion *mr)
> @@ -233,21 +228,28 @@ bool memory_region_is_unassigned(MemoryRegion *mr)
>  }
>  
>  static MemoryRegionSection *address_space_lookup_region(AddressSpace *as,
> -                                                        hwaddr addr)
> +        hwaddr addr, Node *map_base, MemoryRegionSection *sections_base,
> +        PhysPageEntry *root_base)
> +
>  {
> -    return phys_page_find(as->dispatch, addr >> TARGET_PAGE_BITS, true);
> +    return phys_page_find(as->dispatch, addr >> TARGET_PAGE_BITS,
> +                                    map_base, sections_base, root_base);
>  }
>  
>  MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
>                                               hwaddr *xlat, hwaddr *plen,
> -                                             bool is_write)
> +                                             bool is_write,
> +                                             Node *map_base,
> +                                             MemoryRegionSection *sections_base,
> +                                             PhysPageEntry *root_base)
>  {
>      IOMMUTLBEntry iotlb;
> -    MemoryRegionSection *section, *base = cur_phys_sections;
> +    MemoryRegionSection *section;
>      hwaddr len = *plen;
>  
>      for (;;) {
> -        section = address_space_lookup_region(as, addr);
> +        section = address_space_lookup_region(as, addr, map_base,
> +                        sections_base, root_base);
>  
>          /* Compute offset within MemoryRegionSection */
>          addr -= section->offset_within_address_space;
> @@ -265,7 +267,7 @@ MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
>                  | (addr & iotlb.addr_mask));
>          len = MIN(len, (addr | iotlb.addr_mask) - addr + 1);
>          if (!(iotlb.perm & (1 << is_write))) {
> -            section = &base[phys_section_unassigned];
> +            section = &sections_base[phys_section_unassigned];
>              break;
>          }
>  
> @@ -276,6 +278,42 @@ MemoryRegionSection *address_space_translate(AddressSpace *as, hwaddr addr,
>      *xlat = addr;
>      return section;
>  }
> +
> +/* caller should hold rcu read lock */
> +MemoryRegionSection address_space_get_terminal(AddressSpace *as, hwaddr addr,
> +                                             hwaddr *xlat, hwaddr *plen,
> +                                             bool is_write)
> +{
> +    MemoryRegionSection *mrs, *ret, *sections_base;
> +    Node *map_base;
> +    subpage_t *mmio;
> +    unsigned int idx, start;
> +    PhysPageEntry *root_base;
> +
> +    /* Walk through all of the AddressSpaces in a consistent view */
> +    do {
> +        start = seqlock_read_begin(&dispatch_table_lock);
> +        map_base = cur_map_nodes;
> +        sections_base = cur_phys_sections;
> +        /* get all AddressSpaceDispatch root */
> +        root_base = cur_roots;
> +    } while (seqlock_read_check(&dispatch_table_lock, start));
> +
> +    mrs = address_space_translate(as, addr, xlat, plen, is_write,
> +                    map_base, sections_base, root_base);
> +    if (!mrs->mr->terminates) {

Do you want to test mrs->mr->subpage instead?  Very similar code is
already in address_space_lookup_region, and should be there.

address_space_lookup_region should ref the region, and the callers
should unref it.  This way, only address_space_lookup_region needs to
host the RCU critical section.

Adding the ref/unref should be a separate patch.  Even before, we should
change address_space_translate and address_space_translate_for_iotlb to
return a MemoryRegion instead of a MemoryRegionSection.
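
Roughly, the calling convention would look like this (a sketch only;
example_mmio_write and the changed address_space_translate signature
are hypothetical):

static bool example_mmio_write(AddressSpace *as, hwaddr addr, uint32_t val)
{
    hwaddr xlat, len = 4;
    MemoryRegion *mr;
    bool err;

    /* translate hosts the RCU critical section internally and
     * returns a referenced MemoryRegion */
    mr = address_space_translate(as, addr, &xlat, &len, true);
    err = io_mem_write(mr, xlat, val, 4);  /* safe: mr is referenced */
    memory_region_unref(mr);               /* drop the translate ref */
    return err;
}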

> +        mmio = container_of(mrs->mr, subpage_t, mmio);
> +        idx = SUBPAGE_IDX(addr);
> +        ret = &sections_base[mmio->sub_section[idx]];
> +        *xlat += mmio->base;
> +        *xlat -= mrs->offset_within_address_space;
> +        *xlat += mrs->offset_within_region;
> +    } else {
> +        ret = mrs;
> +    }
> +
> +    return *ret;
> +}
>  #endif
>  
>  void cpu_exec_init_all(void)
> @@ -1777,13 +1815,12 @@ static void core_commit(MemoryListener *listener)
>      info->sections = cur_phys_sections;
>      info->roots = cur_roots;
>  
> -    /* Fix me, in future, will be protected by wr seqlock when in parellel
> -     * with reader
> -     */
> +    seqlock_write_lock(&dispatch_table_lock);
>      cur_map_nodes = next_map_nodes;
>      cur_phys_sections = next_phys_sections;
>      cur_alloc_info = next_alloc_info;
>      cur_roots = next_roots;
> +    seqlock_write_unlock(&dispatch_table_lock);
>  
>      /* since each CPU stores ram addresses in its TLB cache, we must
>         reset the modified entries */
> @@ -1890,8 +1927,13 @@ void address_space_destroy_dispatch(AddressSpace *as)
>  
>  static void memory_map_init(void)
>  {
> +    QemuMutex *mutex;
> +
>      qemu_mutex_init(&id_lock);
>      address_space_id_map = g_malloc0(MAX_IDX/BITS_PER_BYTE);
> +    mutex = g_malloc0(sizeof(QemuMutex));
> +    qemu_mutex_init(mutex);

No need for a separate mutex, we can use the BQL.
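
i.e. just (a sketch, assuming seqlock_init() accepts NULL when the
writer side is already serialized by another lock, here the BQL):

    seqlock_init(&dispatch_table_lock, NULL);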

> +    seqlock_init(&dispatch_table_lock, mutex);
>      system_memory = g_malloc(sizeof(*system_memory));
>      memory_region_init(system_memory, "system", INT64_MAX);
>      address_space_init(&address_space_memory, system_memory, "memory");
> @@ -2001,59 +2043,68 @@ bool address_space_rw(AddressSpace *as, hwaddr addr, uint8_t *buf,
>      uint8_t *ptr;
>      uint64_t val;
>      hwaddr addr1;
> -    MemoryRegionSection *section;
> +    MemoryRegionSection section;
>      bool error = false;
>  
>      while (len > 0) {
>          l = len;
> -        section = address_space_translate(as, addr, &addr1, &l, is_write);
> +        rcu_read_lock();
> +        section = address_space_get_terminal(as, addr, &addr1, &l, is_write);
>  
>          if (is_write) {
> -            if (!memory_access_is_direct(section->mr, is_write)) {
> +            if (!memory_access_is_direct(section.mr, is_write)) {
>                  l = memory_access_size(l, addr1);
>                  /* XXX: could force cpu_single_env to NULL to avoid
>                     potential bugs */
> +                memory_region_ref(section.mr);
> +                rcu_read_unlock();
>                  if (l == 4) {
>                      /* 32 bit write access */
>                      val = ldl_p(buf);
> -                    error |= io_mem_write(section->mr, addr1, val, 4);
> +                    error |= io_mem_write(section.mr, addr1, val, 4);
>                  } else if (l == 2) {
>                      /* 16 bit write access */
>                      val = lduw_p(buf);
> -                    error |= io_mem_write(section->mr, addr1, val, 2);
> +                    error |= io_mem_write(section.mr, addr1, val, 2);
>                  } else {
>                      /* 8 bit write access */
>                      val = ldub_p(buf);
> -                    error |= io_mem_write(section->mr, addr1, val, 1);
> +                    error |= io_mem_write(section.mr, addr1, val, 1);
>                  }
> +                memory_region_unref(section.mr);
>              } else {
> -                addr1 += memory_region_get_ram_addr(section->mr);
> +                addr1 += memory_region_get_ram_addr(section.mr);
>                  /* RAM case */
>                  ptr = qemu_get_ram_ptr(addr1);
>                  memcpy(ptr, buf, l);
>                  invalidate_and_set_dirty(addr1, l);
> +                rcu_read_unlock();
>              }
>          } else {
> -            if (!memory_access_is_direct(section->mr, is_write)) {
> +            if (!memory_access_is_direct(section.mr, is_write)) {
>                  /* I/O case */
> +                memory_region_ref(section.mr);
> +                rcu_read_unlock();
>                  l = memory_access_size(l, addr1);
>                  if (l == 4) {
>                      /* 32 bit read access */
> -                    error |= io_mem_read(section->mr, addr1, &val, 4);
> +                    error |= io_mem_read(section.mr, addr1, &val, 4);
>                      stl_p(buf, val);
>                  } else if (l == 2) {
>                      /* 16 bit read access */
> -                    error |= io_mem_read(section->mr, addr1, &val, 2);
> +                    error |= io_mem_read(section.mr, addr1, &val, 2);
>                      stw_p(buf, val);
>                  } else {
>                      /* 8 bit read access */
> -                    error |= io_mem_read(section->mr, addr1, &val, 1);
> +                    error |= io_mem_read(section.mr, addr1, &val, 1);
>                      stb_p(buf, val);
>                  }
> +                memory_region_unref(section.mr);
>              } else {
>                  /* RAM case */
> -                ptr = qemu_get_ram_ptr(section->mr->ram_addr + addr1);
> +                ptr = qemu_get_ram_ptr(section.mr->ram_addr + addr1);
>                  memcpy(buf, ptr, l);
> +                rcu_read_unlock();
>              }
>          }
>          len -= l;
> @@ -2089,18 +2140,19 @@ void cpu_physical_memory_write_rom(hwaddr addr,
>      hwaddr l;
>      uint8_t *ptr;
>      hwaddr addr1;
> -    MemoryRegionSection *section;
> +    MemoryRegionSection section;
>  
> +    rcu_read_lock();
>      while (len > 0) {
>          l = len;
> -        section = address_space_translate(&address_space_memory,
> +        section = address_space_get_terminal(&address_space_memory,
>                                            addr, &addr1, &l, true);
>  
> -        if (!(memory_region_is_ram(section->mr) ||
> -              memory_region_is_romd(section->mr))) {
> +        if (!(memory_region_is_ram(section.mr) ||
> +              memory_region_is_romd(section.mr))) {
>              /* do nothing */
>          } else {
> -            addr1 += memory_region_get_ram_addr(section->mr);
> +            addr1 += memory_region_get_ram_addr(section.mr);
>              /* ROM/RAM case */
>              ptr = qemu_get_ram_ptr(addr1);
>              memcpy(ptr, buf, l);
> @@ -2110,6 +2162,7 @@ void cpu_physical_memory_write_rom(hwaddr addr,
>          buf += l;
>          addr += l;
>      }
> +    rcu_read_unlock();
>  }
>  
>  typedef struct {
> @@ -2161,15 +2214,15 @@ static void cpu_notify_map_clients(void)
>  
>  bool address_space_access_valid(AddressSpace *as, hwaddr addr, int len, bool is_write)
>  {
> -    MemoryRegionSection *section;
> +    MemoryRegionSection section;
>      hwaddr l, xlat;
>  
>      while (len > 0) {
>          l = len;
> -        section = address_space_translate(as, addr, &xlat, &l, is_write);
> -        if (!memory_access_is_direct(section->mr, is_write)) {
> +        section = address_space_get_terminal(as, addr, &xlat, &l, is_write);
> +        if (!memory_access_is_direct(section.mr, is_write)) {
>              l = memory_access_size(l, addr);
> -            if (!memory_region_access_valid(section->mr, xlat, l, is_write)) {
> +            if (!memory_region_access_valid(section.mr, xlat, l, is_write)) {
>                  return false;
>              }
>          }
> @@ -2195,14 +2248,14 @@ void *address_space_map(AddressSpace *as,
>      hwaddr len = *plen;
>      hwaddr todo = 0;
>      hwaddr l, xlat;
> -    MemoryRegionSection *section;
> +    MemoryRegionSection section;
>      ram_addr_t raddr = RAM_ADDR_MAX;
>      ram_addr_t rlen;
>      void *ret;
>  
>      while (len > 0) {
>          l = len;
> -        section = address_space_translate(as, addr, &xlat, &l, is_write);
> +        section = address_space_get_terminal(as, addr, &xlat, &l, is_write);
>  
> -        if (!memory_access_is_direct(section->mr, is_write)) {
> +        if (!memory_access_is_direct(section.mr, is_write)) {
>              if (todo || bounce.buffer) {
> @@ -2211,20 +2264,20 @@ void *address_space_map(AddressSpace *as,
>              bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, TARGET_PAGE_SIZE);
>              bounce.addr = addr;
>              bounce.len = l;
> -            bounce.mr = section->mr;
> +            bounce.mr = section.mr;
>              if (!is_write) {
>                  address_space_read(as, addr, bounce.buffer, l);
>              }
>  
>              *plen = l;
> -            memory_region_ref(section->mr);
> +            memory_region_ref(section.mr);
>              return bounce.buffer;
>          }
>          if (!todo) {
> -            raddr = memory_region_get_ram_addr(section->mr) + xlat;
> -            memory_region_ref(section->mr);
> +            raddr = memory_region_get_ram_addr(section.mr) + xlat;
> +            memory_region_ref(section.mr);
>          } else {
> -            if (memory_region_get_ram_addr(section->mr) + xlat != raddr + todo) {
> +            if (memory_region_get_ram_addr(section.mr) + xlat != raddr + todo) {
>                  break;
>              }
>          }
> @@ -2297,15 +2350,17 @@ static inline uint32_t ldl_phys_internal(hwaddr addr,
>  {
>      uint8_t *ptr;
>      uint64_t val;
> -    MemoryRegionSection *section;
> +    MemoryRegionSection section;
>      hwaddr l = 4;
>      hwaddr addr1;
>  
> -    section = address_space_translate(&address_space_memory, addr, &addr1, &l,
> -                                      false);
> +    rcu_read_lock();
> +    section = address_space_get_terminal(&address_space_memory, addr, &addr1,
> +                                    &l, false);
> +    rcu_read_unlock();
> -    if (l < 4 || !memory_access_is_direct(section->mr, false)) {
> +    if (l < 4 || !memory_access_is_direct(section.mr, false)) {
>          /* I/O case */
> -        io_mem_read(section->mr, addr1, &val, 4);
> +        io_mem_read(section.mr, addr1, &val, 4);
>  #if defined(TARGET_WORDS_BIGENDIAN)
>          if (endian == DEVICE_LITTLE_ENDIAN) {
>              val = bswap32(val);
> @@ -2360,8 +2415,10 @@ static inline uint64_t ldq_phys_internal(hwaddr addr,
>      hwaddr l = 8;
>      hwaddr addr1;
>  
> -    section = address_space_translate(&address_space_memory, addr, &addr1, &l,
> -                                      false);
> +    rcu_read_lock();
> +    section = address_space_get_terminal(&address_space_memory, addr, &addr1,
> +                                    &l, false);
> +    rcu_read_unlock();
> -    if (l < 8 || !memory_access_is_direct(section->mr, false)) {
> +    if (l < 8 || !memory_access_is_direct(section.mr, false)) {
>          /* I/O case */
> -        io_mem_read(section->mr, addr1, &val, 8);
> +        io_mem_read(section.mr, addr1, &val, 8);
> @@ -2423,15 +2480,17 @@ static inline uint32_t lduw_phys_internal(hwaddr addr,
>  {
>      uint8_t *ptr;
>      uint64_t val;
> -    MemoryRegionSection *section;
> +    MemoryRegionSection section;
>      hwaddr l = 2;
>      hwaddr addr1;
>  
> -    section = address_space_translate(&address_space_memory, addr, &addr1, &l,
> -                                      false);
> -    if (l < 2 || !memory_access_is_direct(section->mr, false)) {
> +    rcu_read_lock();
> +    section = address_space_get_terminal(&address_space_memory, addr, &addr1,
> +                                    &l, false);
> +    rcu_read_unlock();
> +    if (l < 2 || !memory_access_is_direct(section.mr, false)) {
>          /* I/O case */
> -        io_mem_read(section->mr, addr1, &val, 2);
> +        io_mem_read(section.mr, addr1, &val, 2);
>  #if defined(TARGET_WORDS_BIGENDIAN)
>          if (endian == DEVICE_LITTLE_ENDIAN) {
>              val = bswap16(val);
> @@ -2443,7 +2502,7 @@ static inline uint32_t lduw_phys_internal(hwaddr addr,
>  #endif
>      } else {
>          /* RAM case */
> -        ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(section->mr)
> +        ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(section.mr)
>                                  & TARGET_PAGE_MASK)
>                                 + addr1);
>          switch (endian) {
> @@ -2482,16 +2541,18 @@ uint32_t lduw_be_phys(hwaddr addr)
>  void stl_phys_notdirty(hwaddr addr, uint32_t val)
>  {
>      uint8_t *ptr;
> -    MemoryRegionSection *section;
> +    MemoryRegionSection section;
>      hwaddr l = 4;
>      hwaddr addr1;
>  
> -    section = address_space_translate(&address_space_memory, addr, &addr1, &l,
> -                                      true);
> -    if (l < 4 || !memory_access_is_direct(section->mr, true)) {
> -        io_mem_write(section->mr, addr1, val, 4);
> +    rcu_read_lock();
> +    section = address_space_get_terminal(&address_space_memory, addr, &addr1,
> +                                    &l, true);
> +    rcu_read_unlock();
> +    if (l < 4 || !memory_access_is_direct(section.mr, true)) {
> +        io_mem_write(section.mr, addr1, val, 4);
>      } else {
> -        addr1 += memory_region_get_ram_addr(section->mr) & TARGET_PAGE_MASK;
> +        addr1 += memory_region_get_ram_addr(section.mr) & TARGET_PAGE_MASK;
>          ptr = qemu_get_ram_ptr(addr1);
>          stl_p(ptr, val);
>  
> @@ -2512,13 +2573,15 @@ static inline void stl_phys_internal(hwaddr addr, uint32_t val,
>                                       enum device_endian endian)
>  {
>      uint8_t *ptr;
> -    MemoryRegionSection *section;
> +    MemoryRegionSection section;
>      hwaddr l = 4;
>      hwaddr addr1;
>  
> -    section = address_space_translate(&address_space_memory, addr, &addr1, &l,
> -                                      true);
> -    if (l < 4 || !memory_access_is_direct(section->mr, true)) {
> +    rcu_read_lock();
> +    section = address_space_get_terminal(&address_space_memory, addr, &addr1,
> +                                    &l, true);
> +    rcu_read_unlock();
> +    if (l < 4 || !memory_access_is_direct(section.mr, true)) {
>  #if defined(TARGET_WORDS_BIGENDIAN)
>          if (endian == DEVICE_LITTLE_ENDIAN) {
>              val = bswap32(val);
> @@ -2528,10 +2591,10 @@ static inline void stl_phys_internal(hwaddr addr, uint32_t val,
>              val = bswap32(val);
>          }
>  #endif
> -        io_mem_write(section->mr, addr1, val, 4);
> +        io_mem_write(section.mr, addr1, val, 4);
>      } else {
>          /* RAM case */
> -        addr1 += memory_region_get_ram_addr(section->mr) & TARGET_PAGE_MASK;
> +        addr1 += memory_region_get_ram_addr(section.mr) & TARGET_PAGE_MASK;
>          ptr = qemu_get_ram_ptr(addr1);
>          switch (endian) {
>          case DEVICE_LITTLE_ENDIAN:
> @@ -2575,13 +2638,13 @@ static inline void stw_phys_internal(hwaddr addr, uint32_t val,
>                                       enum device_endian endian)
>  {
>      uint8_t *ptr;
> -    MemoryRegionSection *section;
> +    MemoryRegionSection section;
>      hwaddr l = 2;
>      hwaddr addr1;
>  
> -    section = address_space_translate(&address_space_memory, addr, &addr1, &l,
> -                                      true);
> -    if (l < 2 || !memory_access_is_direct(section->mr, true)) {
> +    section = address_space_get_terminal(&address_space_memory, addr, &addr1,
> +                                    &l, true);
> +    if (l < 2 || !memory_access_is_direct(section.mr, true)) {
>  #if defined(TARGET_WORDS_BIGENDIAN)
>          if (endian == DEVICE_LITTLE_ENDIAN) {
>              val = bswap16(val);
> @@ -2591,10 +2654,10 @@ static inline void stw_phys_internal(hwaddr addr, uint32_t val,
>              val = bswap16(val);
>          }
>  #endif
> -        io_mem_write(section->mr, addr1, val, 2);
> +        io_mem_write(section.mr, addr1, val, 2);
>      } else {
>          /* RAM case */
> -        addr1 += memory_region_get_ram_addr(section->mr) & TARGET_PAGE_MASK;
> +        addr1 += memory_region_get_ram_addr(section.mr) & TARGET_PAGE_MASK;
>          ptr = qemu_get_ram_ptr(addr1);
>          switch (endian) {
>          case DEVICE_LITTLE_ENDIAN:
> @@ -2696,13 +2759,15 @@ bool virtio_is_big_endian(void)
>  #ifndef CONFIG_USER_ONLY
>  bool cpu_physical_memory_is_io(hwaddr phys_addr)
>  {
> -    MemoryRegionSection *section;
> +    MemoryRegionSection section;
>      hwaddr l = 1;
>  
> -    section = address_space_translate(&address_space_memory,
> +    rcu_read_lock();
> +    section = address_space_get_terminal(&address_space_memory,
>                                        phys_addr, &phys_addr, &l, false);
> +    rcu_read_unlock();
>  
> -    return !(memory_region_is_ram(section->mr) ||
> -             memory_region_is_romd(section->mr));
> +    return !(memory_region_is_ram(section.mr) ||
> +             memory_region_is_romd(section.mr));
>  }
>  #endif
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v1 6/6] mem: change tcg code to rcu style
  2013-05-29  2:11 ` [Qemu-devel] [PATCH v1 6/6] mem: change tcg code to " Liu Ping Fan
@ 2013-05-29  7:22   ` Paolo Bonzini
  2013-05-29  9:00     ` liu ping fan
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2013-05-29  7:22 UTC (permalink / raw)
  To: Liu Ping Fan; +Cc: qemu-devel, Anthony Liguori

On 29/05/2013 04:11, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> When adopting rcu style, the tcg code needs two kinds of path fixed:
>  -tlb_set_page(), which caches translation info
>  -the instruction emulation path
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ----
> Not sure about tcg code, so I took helper_st as the example.
> ---
>  cputlb.c                        |   25 +++++++++++++++++-
>  exec.c                          |    2 +-
>  include/exec/softmmu_template.h |   53 +++++++++++++++++++++++++++++++-------
>  3 files changed, 68 insertions(+), 12 deletions(-)
> 
> diff --git a/cputlb.c b/cputlb.c
> index 86666c8..6b55c70 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -249,6 +249,10 @@ void tlb_set_page(CPUArchState *env, target_ulong vaddr,
>      uintptr_t addend;
>      CPUTLBEntry *te;
>      hwaddr iotlb, xlat, sz;
> +    unsigned long start;
> +    Node *map_base;
> +    MemoryRegionSection *sections_base;
> +    PhysPageEntry *roots_base;
>  
>      assert(size >= TARGET_PAGE_SIZE);
>      if (size != TARGET_PAGE_SIZE) {
> @@ -256,8 +260,18 @@ void tlb_set_page(CPUArchState *env, target_ulong vaddr,
>      }
>  
>      sz = size;
> +
> +reload:
> +    rcu_read_lock();

It's really the whole of cpu_exec that has to be protected by a RCU
critical section.

> +    start = seqlock_read_begin(&dispatch_table_lock);
> +    map_base = cur_map_nodes;
> +    sections_base = cur_phys_sections;
> +    roots_base = cur_roots;

Why the seqlock_read_check at the end and not here?

Remember that this code is running under the BQL, so there is no need to
protect the TLB flushes otherwise.  There is also no need to do anything
as long as you ref the regions that are entered in the map, and unref
them when you destroy the map.  Those refs will protect TCG's TLB too.

In the end, all of TCG's execution consists of a large RCU critical section.
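
Conceptually, something like this (a sketch with a hypothetical
wrapper name; the real patch would also have to drop the lock before
longjmp-ing out of cpu_exec on exceptions):

static int cpu_exec_with_rcu(CPUArchState *env)
{
    int ret;

    rcu_read_lock();     /* the whole translate+execute loop reads
                          * the dispatch tables under RCU */
    ret = cpu_exec(env);
    rcu_read_unlock();
    return ret;
}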

>      section = address_space_translate(&address_space_memory, paddr, &xlat, &sz,
> -                                      false);
> +                                      false,
> +                                      map_base,
> +                                      sections_base,
> +                                      roots_base);
>      assert(sz >= TARGET_PAGE_SIZE);
>  
>  #if defined(DEBUG_TLB)
> @@ -282,6 +296,9 @@ void tlb_set_page(CPUArchState *env, target_ulong vaddr,
>  
>      index = (vaddr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>      env->iotlb[mmu_idx][index] = iotlb - vaddr;
> +    /* Serialized with the reader, so we only need to worry about a
> +     * tlb_flush in parallel.  If there is a conflict, just reload the
> +     * tlb entry until it is right.
> +     */
>      te = &env->tlb_table[mmu_idx][index];
>      te->addend = addend - vaddr;
>      if (prot & PAGE_READ) {
> @@ -309,6 +326,12 @@ void tlb_set_page(CPUArchState *env, target_ulong vaddr,
>      } else {
>          te->addr_write = -1;
>      }
> +
> +    if (unlikely(seqlock_read_check(&dispatch_table_lock, start))) {
> +        rcu_read_unlock();
> +        goto reload;
> +    }
> +    rcu_read_unlock();
>  }
>  
>  /* NOTE: this function can trigger an exception */
> diff --git a/exec.c b/exec.c
> index 9a5c67f..d4ca101 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1820,7 +1820,6 @@ static void core_commit(MemoryListener *listener)
>      cur_phys_sections = next_phys_sections;
>      cur_alloc_info = next_alloc_info;
>      cur_roots = next_roots;
> -    seqlock_write_unlock(&dispatch_table_lock);
>  
>      /* since each CPU stores ram addresses in its TLB cache, we must
>         reset the modified entries */
> @@ -1828,6 +1827,7 @@ static void core_commit(MemoryListener *listener)
>      for (env = first_cpu; env != NULL; env = env->next_cpu) {
>          tlb_flush(env, 1);
>      }
> +    seqlock_write_unlock(&dispatch_table_lock);
>  
>      /* Fix me, will changed to call_rcu */
>      release_dispatch_map(info);
> diff --git a/include/exec/softmmu_template.h b/include/exec/softmmu_template.h
> index 8584902..7fa631e 100644
> --- a/include/exec/softmmu_template.h
> +++ b/include/exec/softmmu_template.h

There shouldn't be any need to change this.

> @@ -196,13 +196,16 @@ static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
>                                                     int mmu_idx,
>                                                     uintptr_t retaddr);
>  
> +/* Caller hold rcu read lock, this func will release lock */
>  static inline void glue(io_write, SUFFIX)(CPUArchState *env,
>                                            hwaddr physaddr,
>                                            DATA_TYPE val,
>                                            target_ulong addr,
> -                                          uintptr_t retaddr)
> +                                          uintptr_t retaddr,
> +                                          MemoryRegionSection *mrs_base)
>  {
> -    MemoryRegion *mr = iotlb_to_region(physaddr);
> +    subpage_t *subpg;
> +    unsigned int idx;
> +    MemoryRegion *mr = mrs_base[physaddr & ~TARGET_PAGE_MASK].mr;
>  
>      physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
>      if (mr != &io_mem_rom && mr != &io_mem_notdirty && !can_do_io(env)) {
> @@ -211,6 +214,13 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
>  
>      env->mem_io_vaddr = addr;
>      env->mem_io_pc = retaddr;
> +    if (!mr->terminates) {
> +        subpg = container_of(mr, subpage_t, mmio);
> +        idx = SUBPAGE_IDX(addr);
> +        mr = mrs_base[subpg->sub_section[idx]].mr;
> +    }
> +    memory_region_ref(mr);
> +    rcu_read_unlock();
>      io_mem_write(mr, physaddr, val, 1 << SHIFT);
>  }
>  
> @@ -222,17 +232,28 @@ void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
>      target_ulong tlb_addr;
>      uintptr_t retaddr;
>      int index;
> +    unsigned int start;
> +    MemoryRegionSection *mrs_base;
> +    uintptr_t addend;
>  
>      index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>   redo:
> -    tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
> +    rcu_read_lock();
> +    do {
> +        start = seqlock_read_begin(&dispatch_table_lock);
> +        tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
> +        ioaddr = env->iotlb[mmu_idx][index];
> +        addend = env->tlb_table[mmu_idx][index].addend;
> +        mrs_base = cur_phys_sections;
> +    } while (seqlock_read_check(&dispatch_table_lock, start));
> +
>      if ((addr & TARGET_PAGE_MASK) == (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
>          if (tlb_addr & ~TARGET_PAGE_MASK) {
>              /* IO access */
>              if ((addr & (DATA_SIZE - 1)) != 0)
>                  goto do_unaligned_access;
>              retaddr = GETPC_EXT();
> -            ioaddr = env->iotlb[mmu_idx][index];
> +            /* will rcu_read_unlock() inside */
> -            glue(io_write, SUFFIX)(env, ioaddr, val, addr, retaddr);
> +            glue(io_write, SUFFIX)(env, ioaddr, val, addr, retaddr, mrs_base);
>          } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
>          do_unaligned_access:
> @@ -240,22 +261,23 @@ void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
>  #ifdef ALIGNED_ONLY
>              do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
>  #endif
> +            rcu_read_unlock();
>              glue(glue(slow_st, SUFFIX), MMUSUFFIX)(env, addr, val,
>                                                     mmu_idx, retaddr);
>          } else {
>              /* aligned/unaligned access in the same page */
> -            uintptr_t addend;
>  #ifdef ALIGNED_ONLY
>              if ((addr & (DATA_SIZE - 1)) != 0) {
>                  retaddr = GETPC_EXT();
>                  do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
>              }
>  #endif
> -            addend = env->tlb_table[mmu_idx][index].addend;
>              glue(glue(st, SUFFIX), _raw)((uint8_t *)(intptr_t)
>                                           (addr + addend), val);
> +            rcu_read_unlock();
>          }
>      } else {
> +        rcu_read_unlock();
>          /* the page is not in the TLB : fill it */
>          retaddr = GETPC_EXT();
>  #ifdef ALIGNED_ONLY
> @@ -277,17 +299,25 @@ static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
>      hwaddr ioaddr;
>      target_ulong tlb_addr;
>      int index, i;
> +    uintptr_t addend;
> +    unsigned int start;
> +    MemoryRegionSection *mrs_base;
>  
>      index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>   redo:
> -    tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
> +    rcu_read_lock();
> +    do {
> +        start = seqlock_read_begin(&dispatch_table_lock);
> +        tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
> +        ioaddr = env->iotlb[mmu_idx][index];
> +        addend = env->tlb_table[mmu_idx][index].addend;
> +        mrs_base = cur_phys_sections;
> +    } while (seqlock_read_check(&dispatch_table_lock, start));
> +
>      if ((addr & TARGET_PAGE_MASK) == (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
>          if (tlb_addr & ~TARGET_PAGE_MASK) {
>              /* IO access */
>              if ((addr & (DATA_SIZE - 1)) != 0)
>                  goto do_unaligned_access;
> -            ioaddr = env->iotlb[mmu_idx][index];
> -            glue(io_write, SUFFIX)(env, ioaddr, val, addr, retaddr);
> +            glue(io_write, SUFFIX)(env, ioaddr, val, addr, retaddr, mrs_base);
>          } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
>          do_unaligned_access:
>              /* XXX: not efficient, but simple */
> @@ -295,10 +325,12 @@ static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
>               * previous page from the TLB cache.  */
>              for(i = DATA_SIZE - 1; i >= 0; i--) {
>  #ifdef TARGET_WORDS_BIGENDIAN
> +                rcu_read_unlock();
>                  glue(slow_stb, MMUSUFFIX)(env, addr + i,
>                                            val >> (((DATA_SIZE - 1) * 8) - (i * 8)),
>                                            mmu_idx, retaddr);
>  #else
> +                rcu_read_unlock();
>                  glue(slow_stb, MMUSUFFIX)(env, addr + i,
>                                            val >> (i * 8),
>                                            mmu_idx, retaddr);
> @@ -306,11 +338,12 @@ static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
>              }
>          } else {
>              /* aligned/unaligned access in the same page */
> -            uintptr_t addend = env->tlb_table[mmu_idx][index].addend;
>              glue(glue(st, SUFFIX), _raw)((uint8_t *)(intptr_t)
>                                           (addr + addend), val);
> +            rcu_read_unlock();
>          }
>      } else {
> +        rcu_read_unlock();
>          /* the page is not in the TLB : fill it */
>          tlb_fill(env, addr, 1, mmu_idx, retaddr);
>          goto redo;
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v1 4/6] mem: concenter the root of each AddressSpaceDispatch
  2013-05-29  7:03   ` Paolo Bonzini
@ 2013-05-29  7:48     ` liu ping fan
  2013-05-29  8:31       ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: liu ping fan @ 2013-05-29  7:48 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Anthony Liguori

On Wed, May 29, 2013 at 3:03 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 29/05/2013 04:11, Liu Ping Fan wrote:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> All of the AddressSpaceDispatch roots are part of the dispatch context,
>> along with cur_map_nodes and cur_phys_sections, and we should walk
>> through the AddressSpaceDispatches in the same dispatch context, i.e.
>> the same memory topology.  Concenter the roots, so we can switch
>> to the next one more easily.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  exec.c                         |   48 ++++++++++++++++++++++++++++++++++++---
>>  include/exec/memory-internal.h |    2 +-
>>  2 files changed, 45 insertions(+), 5 deletions(-)
>>
>> diff --git a/exec.c b/exec.c
>> index eb69a98..315960d 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -49,6 +49,7 @@
>>  #include "translate-all.h"
>>
>>  #include "exec/memory-internal.h"
>> +#include "qemu/bitops.h"
>>
>>  //#define DEBUG_SUBPAGE
>>
>> @@ -95,6 +96,7 @@ typedef struct AllocInfo {
>>      /* Only used for release purpose */
>>      Node *map;
>>      MemoryRegionSection *sections;
>> +    PhysPageEntry *roots;
>>  } AllocInfo;
>
> I wouldn't put it here.  I would put it in AddressSpaceDispatch (as
> next_phys_map/next_sections/next_nodes: initialize
> next_sections/next_nodes in the "begin" hook, switch under seqlock in
> the "commit" hook).
>
An implementation based on separate AddressSpaceDispatch is broken. We
should walk through the ASD chain under the same mem topology.  Take
the following scenario (ASD-x is followed by ASD-y):
walk through ASD-x under topology-A
          ---- before walking ASD-y, the mem topology changes and
switches from A to B
continue to walk ASD-y under B (but it should be A, not B)

To eliminate such issues, I concenter the
roots/phys_sections/phys_map_nodes, so we can switch from topology-A
to B atomically.

Regards,
Pingfan
> This requires refcounting AllocInfo, but it removes the need for the
> ->idx indirection and the id management.
>
> Paolo
>
>>  /* For performance, fetch page map related pointers directly, other than
>> @@ -102,10 +104,12 @@ typedef struct AllocInfo {
>>   */
>>  static MemoryRegionSection *cur_phys_sections;
>>  static Node *cur_map_nodes;
>> +static PhysPageEntry *cur_roots;
>>  static AllocInfo *cur_alloc_info;
>>
>>  static MemoryRegionSection *next_phys_sections;
>>  static Node *next_map_nodes;
>> +static PhysPageEntry *next_roots;
>>  static AllocInfo *next_alloc_info;
>>
>>  #define PHYS_MAP_NODE_NIL (((uint16_t)~0) >> 1)
>> @@ -191,12 +195,13 @@ static void phys_page_set(AddressSpaceDispatch *d,
>>      /* Wildly overreserve - it doesn't matter much. */
>>      phys_map_node_reserve(3 * P_L2_LEVELS);
>>
>> -    phys_page_set_level(&d->phys_map, &index, &nb, leaf, P_L2_LEVELS - 1);
>> +    phys_page_set_level(&next_roots[d->idx], &index, &nb, leaf,
>> +                        P_L2_LEVELS - 1);
>>  }
>>
>>  static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr index, bool cur)
>>  {
>> -    PhysPageEntry lp = d->phys_map;
>> +    PhysPageEntry lp;
>>      PhysPageEntry *p;
>>      int i;
>>      Node *map_nodes;
>> @@ -205,9 +210,11 @@ static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr index
>>      if (likely(cur)) {
>>          map_nodes = cur_map_nodes;
>>          sections = cur_phys_sections;
>> +        lp = cur_roots[d->idx];
>>      } else {
>>          map_nodes = next_map_nodes;
>>          sections = next_phys_sections;
>> +        lp = next_roots[d->idx];
>>      }
>>      for (i = P_L2_LEVELS - 1; i >= 0 && !lp.is_leaf; i--) {
>>          if (lp.ptr == PHYS_MAP_NODE_NIL) {
>> @@ -1729,7 +1736,8 @@ static void mem_begin(MemoryListener *listener)
>>  {
>>      AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch, listener);
>>
>> -    d->phys_map.ptr = PHYS_MAP_NODE_NIL;
>> +    next_roots[d->idx] = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL,
>> +                                    .is_leaf = 0 };
>>  }
>>
>>  static void core_begin(MemoryListener *listener)
>> @@ -1737,6 +1745,7 @@ static void core_begin(MemoryListener *listener)
>>      uint16_t n;
>>
>>      next_alloc_info = g_malloc0(sizeof(AllocInfo));
>> +    next_roots = g_malloc0(2048*sizeof(PhysPageEntry));
>>      n = dummy_section(&io_mem_unassigned);
>>      assert(phys_section_unassigned == n);
>>      n = dummy_section(&io_mem_notdirty);
>> @@ -1752,6 +1761,7 @@ static void release_dispatch_map(AllocInfo *info)
>>      phys_sections_clear(info->phys_sections_nb, info->sections);
>>      g_free(info->map);
>>      g_free(info->sections);
>> +    g_free(info->roots);
>>      g_free(info);
>>  }
>>
>> @@ -1765,6 +1775,7 @@ static void core_commit(MemoryListener *listener)
>>      AllocInfo *info = cur_alloc_info;
>>      info->map = cur_map_nodes;
>>      info->sections = cur_phys_sections;
>> +    info->roots = cur_roots;
>>
>>      /* Fix me, in future, will be protected by wr seqlock when in parellel
>>       * with reader
>> @@ -1772,6 +1783,7 @@ static void core_commit(MemoryListener *listener)
>>      cur_map_nodes = next_map_nodes;
>>      cur_phys_sections = next_phys_sections;
>>      cur_alloc_info = next_alloc_info;
>> +    cur_roots = next_roots;
>>
>>      /* since each CPU stores ram addresses in its TLB cache, we must
>>         reset the modified entries */
>> @@ -1826,11 +1838,36 @@ static MemoryListener io_memory_listener = {
>>      .priority = 0,
>>  };
>>
>> +static MemoryListener tcg_memory_listener = {
>> +    .commit = tcg_commit,
>> +};
>
> Rebase artifact?
>
>> +#define MAX_IDX (1<<15)
>> +static unsigned long *address_space_id_map;
>> +static QemuMutex id_lock;
>> +
>> +static void address_space_release_id(int16_t idx)
>> +{
>> +    qemu_mutex_lock(&id_lock);
>> +    clear_bit(idx, address_space_id_map);
>> +    qemu_mutex_unlock(&id_lock);
>> +}
>> +
>> +static int16_t address_space_alloc_id()
>> +{
>> +    unsigned long idx;
>> +
>> +    qemu_mutex_lock(&id_lock);
>> +    idx = find_first_zero_bit(address_space_id_map, MAX_IDX);
>> +    set_bit(idx, address_space_id_map);
>> +    qemu_mutex_unlock(&id_lock);
>> +    return idx;
>> +}
>> +
>>  void address_space_init_dispatch(AddressSpace *as)
>>  {
>>      AddressSpaceDispatch *d = g_new(AddressSpaceDispatch, 1);
>>
>> -    d->phys_map  = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
>> +    d->idx = address_space_alloc_id();
>>      d->listener = (MemoryListener) {
>>          .begin = mem_begin,
>>          .region_add = mem_add,
>> @@ -1845,6 +1882,7 @@ void address_space_destroy_dispatch(AddressSpace *as)
>>  {
>>      AddressSpaceDispatch *d = as->dispatch;
>>
>> +    address_space_release_id(d->idx);
>>      memory_listener_unregister(&d->listener);
>>      g_free(d);
>>      as->dispatch = NULL;
>> @@ -1852,6 +1890,8 @@ void address_space_destroy_dispatch(AddressSpace *as)
>>
>>  static void memory_map_init(void)
>>  {
>> +    qemu_mutex_init(&id_lock);
>> +    address_space_id_map = g_malloc0(MAX_IDX/BITS_PER_BYTE);
>>      system_memory = g_malloc(sizeof(*system_memory));
>>      memory_region_init(system_memory, "system", INT64_MAX);
>>      address_space_init(&address_space_memory, system_memory, "memory");
>> diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
>> index 799c02a..54a3635 100644
>> --- a/include/exec/memory-internal.h
>> +++ b/include/exec/memory-internal.h
>> @@ -36,7 +36,7 @@ struct AddressSpaceDispatch {
>>      /* This is a multi-level map on the physical address space.
>>       * The bottom level has pointers to MemoryRegionSections.
>>       */
>> -    PhysPageEntry phys_map;
>> +    int16_t idx;
>>      MemoryListener listener;
>>  };
>>
>>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v1 4/6] mem: concenter the root of each AddressSpaceDispatch
  2013-05-29  7:48     ` liu ping fan
@ 2013-05-29  8:31       ` Paolo Bonzini
  2013-05-29  9:24         ` liu ping fan
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2013-05-29  8:31 UTC (permalink / raw)
  To: liu ping fan; +Cc: qemu-devel, Anthony Liguori

On 29/05/2013 09:48, liu ping fan wrote:
> On Wed, May 29, 2013 at 3:03 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> On 29/05/2013 04:11, Liu Ping Fan wrote:
>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>
>>> All of the AddressSpaceDispatch roots are part of the dispatch context,
>>> along with cur_map_nodes and cur_phys_sections, and we should walk
>>> through the AddressSpaceDispatches in the same dispatch context, i.e.
>>> the same memory topology.  Concenter the roots, so we can switch
>>> to the next one more easily.
>>>
>>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>> ---
>>>  exec.c                         |   48 ++++++++++++++++++++++++++++++++++++---
>>>  include/exec/memory-internal.h |    2 +-
>>>  2 files changed, 45 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/exec.c b/exec.c
>>> index eb69a98..315960d 100644
>>> --- a/exec.c
>>> +++ b/exec.c
>>> @@ -49,6 +49,7 @@
>>>  #include "translate-all.h"
>>>
>>>  #include "exec/memory-internal.h"
>>> +#include "qemu/bitops.h"
>>>
>>>  //#define DEBUG_SUBPAGE
>>>
>>> @@ -95,6 +96,7 @@ typedef struct AllocInfo {
>>>      /* Only used for release purpose */
>>>      Node *map;
>>>      MemoryRegionSection *sections;
>>> +    PhysPageEntry *roots;
>>>  } AllocInfo;
>>
>> I wouldn't put it here.  I would put it in AddressSpaceDispatch (as
>> next_phys_map/next_sections/next_nodes: initialize
>> next_sections/next_nodes in the "begin" hook, switch under seqlock in
>> the "commit" hook).
>>
> An implementation based on separate AddressSpaceDispatch is broken. We
> should walk through the ASD chain under the same mem topology.  Take
> the following scenario (ASD-x is followed by ASD-y):
> walk through ASD-x under topology-A
>           ---- before walking ASD-y, the mem topology changes and
> switches from A to B
> continue to walk ASD-y under B (but it should be A, not B)

It is perfectly fine.  If you do a topology change during an access, and
the topology change includes a change for the region that is being
accessed, it is undefined whether you get the "before" or the "after".

In particular it is acceptable that, for example, the CPU accesses the
memory "after" the change and a device concurrently accesses the memory
"before" the change.

This is exactly why we can use RCU, in fact!
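
That is the standard RCU pattern, sketched here with hypothetical
names (DispatchTable, build_next_table, cur_table, table_lookup); the
point is that a reader sees either the old table or the new one as a
whole, never a mix within a single lookup:

    /* writer (BQL held): build a complete new table, then publish
     * it with a single atomic pointer store */
    DispatchTable *new = build_next_table();
    DispatchTable *old = cur_table;
    atomic_rcu_set(&cur_table, new);
    call_rcu1(&old->rcu, free_table);   /* reclaim after readers drain */

    /* reader: everything between lock/unlock uses one view */
    rcu_read_lock();
    DispatchTable *t = atomic_rcu_read(&cur_table);
    section = table_lookup(t, addr);
    rcu_read_unlock();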

Paolo

> To eliminate such issues, I concenter the
> roots/phys_sections/phys_map_nodes, so we can switch from topology-A
> to B atomically.
> 
> Regards,
> Pingfan
>> This requires refcounting AllocInfo, but it removes the need for the
>> ->idx indirection and the id management.
>>
>> Paolo
>>
>>>  /* For performance, fetch page map related pointers directly, other than
>>> @@ -102,10 +104,12 @@ typedef struct AllocInfo {
>>>   */
>>>  static MemoryRegionSection *cur_phys_sections;
>>>  static Node *cur_map_nodes;
>>> +static PhysPageEntry *cur_roots;
>>>  static AllocInfo *cur_alloc_info;
>>>
>>>  static MemoryRegionSection *next_phys_sections;
>>>  static Node *next_map_nodes;
>>> +static PhysPageEntry *next_roots;
>>>  static AllocInfo *next_alloc_info;
>>>
>>>  #define PHYS_MAP_NODE_NIL (((uint16_t)~0) >> 1)
>>> @@ -191,12 +195,13 @@ static void phys_page_set(AddressSpaceDispatch *d,
>>>      /* Wildly overreserve - it doesn't matter much. */
>>>      phys_map_node_reserve(3 * P_L2_LEVELS);
>>>
>>> -    phys_page_set_level(&d->phys_map, &index, &nb, leaf, P_L2_LEVELS - 1);
>>> +    phys_page_set_level(&next_roots[d->idx], &index, &nb, leaf,
>>> +                        P_L2_LEVELS - 1);
>>>  }
>>>
>>>  static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr index, bool cur)
>>>  {
>>> -    PhysPageEntry lp = d->phys_map;
>>> +    PhysPageEntry lp;
>>>      PhysPageEntry *p;
>>>      int i;
>>>      Node *map_nodes;
>>> @@ -205,9 +210,11 @@ static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr index
>>>      if (likely(cur)) {
>>>          map_nodes = cur_map_nodes;
>>>          sections = cur_phys_sections;
>>> +        lp = cur_roots[d->idx];
>>>      } else {
>>>          map_nodes = next_map_nodes;
>>>          sections = next_phys_sections;
>>> +        lp = next_roots[d->idx];
>>>      }
>>>      for (i = P_L2_LEVELS - 1; i >= 0 && !lp.is_leaf; i--) {
>>>          if (lp.ptr == PHYS_MAP_NODE_NIL) {
>>> @@ -1729,7 +1736,8 @@ static void mem_begin(MemoryListener *listener)
>>>  {
>>>      AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch, listener);
>>>
>>> -    d->phys_map.ptr = PHYS_MAP_NODE_NIL;
>>> +    next_roots[d->idx] = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL,
>>> +                                    .is_leaf = 0 };
>>>  }
>>>
>>>  static void core_begin(MemoryListener *listener)
>>> @@ -1737,6 +1745,7 @@ static void core_begin(MemoryListener *listener)
>>>      uint16_t n;
>>>
>>>      next_alloc_info = g_malloc0(sizeof(AllocInfo));
>>> +    next_roots = g_malloc0(2048*sizeof(PhysPageEntry));
>>>      n = dummy_section(&io_mem_unassigned);
>>>      assert(phys_section_unassigned == n);
>>>      n = dummy_section(&io_mem_notdirty);
>>> @@ -1752,6 +1761,7 @@ static void release_dispatch_map(AllocInfo *info)
>>>      phys_sections_clear(info->phys_sections_nb, info->sections);
>>>      g_free(info->map);
>>>      g_free(info->sections);
>>> +    g_free(info->roots);
>>>      g_free(info);
>>>  }
>>>
>>> @@ -1765,6 +1775,7 @@ static void core_commit(MemoryListener *listener)
>>>      AllocInfo *info = cur_alloc_info;
>>>      info->map = cur_map_nodes;
>>>      info->sections = cur_phys_sections;
>>> +    info->roots = cur_roots;
>>>
>>>      /* Fix me, in future, will be protected by wr seqlock when in parellel
>>>       * with reader
>>> @@ -1772,6 +1783,7 @@ static void core_commit(MemoryListener *listener)
>>>      cur_map_nodes = next_map_nodes;
>>>      cur_phys_sections = next_phys_sections;
>>>      cur_alloc_info = next_alloc_info;
>>> +    cur_roots = next_roots;
>>>
>>>      /* since each CPU stores ram addresses in its TLB cache, we must
>>>         reset the modified entries */
>>> @@ -1826,11 +1838,36 @@ static MemoryListener io_memory_listener = {
>>>      .priority = 0,
>>>  };
>>>
>>> +static MemoryListener tcg_memory_listener = {
>>> +    .commit = tcg_commit,
>>> +};
>>
>> Rebase artifact?
>>
>>> +#define MAX_IDX (1<<15)
>>> +static unsigned long *address_space_id_map;
>>> +static QemuMutex id_lock;
>>> +
>>> +static void address_space_release_id(int16_t idx)
>>> +{
>>> +    qemu_mutex_lock(&id_lock);
>>> +    clear_bit(idx, address_space_id_map);
>>> +    qemu_mutex_unlock(&id_lock);
>>> +}
>>> +
>>> +static int16_t address_space_alloc_id()
>>> +{
>>> +    unsigned long idx;
>>> +
>>> +    qemu_mutex_lock(&id_lock);
>>> +    idx = find_first_zero_bit(address_space_id_map, MAX_IDX);
>>> +    set_bit(idx, address_space_id_map);
>>> +    qemu_mutex_unlock(&id_lock);
>>> +    return idx;
>>> +}
>>> +
>>>  void address_space_init_dispatch(AddressSpace *as)
>>>  {
>>>      AddressSpaceDispatch *d = g_new(AddressSpaceDispatch, 1);
>>>
>>> -    d->phys_map  = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
>>> +    d->idx = address_space_alloc_id();
>>>      d->listener = (MemoryListener) {
>>>          .begin = mem_begin,
>>>          .region_add = mem_add,
>>> @@ -1845,6 +1882,7 @@ void address_space_destroy_dispatch(AddressSpace *as)
>>>  {
>>>      AddressSpaceDispatch *d = as->dispatch;
>>>
>>> +    address_space_release_id(d->idx);
>>>      memory_listener_unregister(&d->listener);
>>>      g_free(d);
>>>      as->dispatch = NULL;
>>> @@ -1852,6 +1890,8 @@ void address_space_destroy_dispatch(AddressSpace *as)
>>>
>>>  static void memory_map_init(void)
>>>  {
>>> +    qemu_mutex_init(&id_lock);
>>> +    address_space_id_map = g_malloc0(MAX_IDX/BITS_PER_BYTE);
>>>      system_memory = g_malloc(sizeof(*system_memory));
>>>      memory_region_init(system_memory, "system", INT64_MAX);
>>>      address_space_init(&address_space_memory, system_memory, "memory");
>>> diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
>>> index 799c02a..54a3635 100644
>>> --- a/include/exec/memory-internal.h
>>> +++ b/include/exec/memory-internal.h
>>> @@ -36,7 +36,7 @@ struct AddressSpaceDispatch {
>>>      /* This is a multi-level map on the physical address space.
>>>       * The bottom level has pointers to MemoryRegionSections.
>>>       */
>>> -    PhysPageEntry phys_map;
>>> +    int16_t idx;
>>>      MemoryListener listener;
>>>  };
>>>
>>>
>>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v1 6/6] mem: change tcg code to rcu style
  2013-05-29  7:22   ` Paolo Bonzini
@ 2013-05-29  9:00     ` liu ping fan
  2013-05-29  9:03       ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: liu ping fan @ 2013-05-29  9:00 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Anthony Liguori

On Wed, May 29, 2013 at 3:22 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 29/05/2013 04:11, Liu Ping Fan wrote:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> When adopting rcu style, the tcg code needs two kinds of path fixed:
>>  -tlb_set_page(), which caches translation info
>>  -the instruction emulation path
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ----
>> Not sure about tcg code, so I took helper_st as the example.
>> ---
>>  cputlb.c                        |   25 +++++++++++++++++-
>>  exec.c                          |    2 +-
>>  include/exec/softmmu_template.h |   53 +++++++++++++++++++++++++++++++-------
>>  3 files changed, 68 insertions(+), 12 deletions(-)
>>
>> diff --git a/cputlb.c b/cputlb.c
>> index 86666c8..6b55c70 100644
>> --- a/cputlb.c
>> +++ b/cputlb.c
>> @@ -249,6 +249,10 @@ void tlb_set_page(CPUArchState *env, target_ulong vaddr,
>>      uintptr_t addend;
>>      CPUTLBEntry *te;
>>      hwaddr iotlb, xlat, sz;
>> +    unsigned long start;
>> +    Node *map_base;
>> +    MemoryRegionSection *sections_base;
>> +    PhysPageEntry *roots_base;
>>
>>      assert(size >= TARGET_PAGE_SIZE);
>>      if (size != TARGET_PAGE_SIZE) {
>> @@ -256,8 +260,18 @@ void tlb_set_page(CPUArchState *env, target_ulong vaddr,
>>      }
>>
>>      sz = size;
>> +
>> +reload:
>> +    rcu_read_lock();
>
> It's really the whole of cpu_exec that has to be protected by a RCU
> critical section.
>
Ok.
>> +    start = seqlock_read_begin(&dispatch_table_lock);
>> +    map_base = cur_map_nodes;
>> +    sections_base = cur_phys_sections;
>> +    roots_base = cur_roots;
>
> Why the seqlock_read_check at the end and not here?
>
In case tlb_flush() empties the tlb entry in parallel and we record an
incorrect index.

> Remember that this code is running under the BQL, so there is no need to
> protect the TLB flushes otherwise.  There is also no need to do anything

Do you mean that cpu_exec() cannot run outside the BQL like the kvm
path does?  If that is true, then no changes are needed for the tcg
code, right?

> as long as you ref the regions that are entered in the map, and unref
> them when you destroy the map.  Those refs will protect TCG's TLB too.
>
I worry about whether the TLB holds the right index into the right
phys_sections if the mem topology changes between tlb_set_page() and
the emulated instruction.

> In the end, all of TCG's execution consists of a large RCU critical section.
>
Even stronger than that, if it runs with BQL protection?

Regards,
Pingfan

[...]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v1 6/6] mem: change tcg code to rcu style
  2013-05-29  9:00     ` liu ping fan
@ 2013-05-29  9:03       ` Paolo Bonzini
  0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2013-05-29  9:03 UTC (permalink / raw)
  To: liu ping fan; +Cc: qemu-devel, Anthony Liguori

On 29/05/2013 11:00, liu ping fan wrote:
>> Remember that this code is running under the BQL, so there is no need to
>> protect the TLB flushes otherwise.  There is also no need to do anything
> 
> Do you mean that cpu_exec() can not be out of BQL like kvm path?  If
> it is true, then there is no changes needed for tcg code, right?

Indeed.  When/if a fine-grained lock is introduced, it will protect TLB
accesses vs. TLB flushes.  For now, the BQL will do.

Making TCG multi-threaded is a wholly different problem, and a superset
of what's needed for KVM.

(Xen instead is almost the same, only the mapcache has to be protected
by its own mutex).

>> as long as you ref the regions that are entered in the map, and unref
>> them when you destroy the map.  Those refs will protect TCG's TLB too.
>>
> Worry about whether the TLB is the right index in the right
> phys_sections, if the mem topology changes between tlb_set_page() and
> instruction emulated.

That's not possible because of BQL protection.

>> In the end, all of TCG's execution consists of a large RCU critical section.
>
> Even stronger? If it runs with BQL protection.

Yes.

Paolo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v1 1/6] mem: change variable to macro
  2013-05-29  2:11 ` [Qemu-devel] [PATCH v1 1/6] mem: change variable to macro Liu Ping Fan
@ 2013-05-29  9:06   ` Paolo Bonzini
  0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2013-05-29  9:06 UTC (permalink / raw)
  To: Liu Ping Fan; +Cc: qemu-devel, Anthony Liguori

On 29/05/2013 04:11, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> The secions like phys_section_unassigned always has fixed address
> in phys_sections, make them declared as macro, so we can use them
> when having more than one phys_sections, ie, more than one dispatch
> context.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  exec.c |   22 ++++++++++++++--------
>  1 files changed, 14 insertions(+), 8 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 281c007..6934c2d 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -82,10 +82,10 @@ int use_icount;
>  
>  static MemoryRegionSection *phys_sections;
>  static unsigned phys_sections_nb, phys_sections_nb_alloc;
> -static uint16_t phys_section_unassigned;
> -static uint16_t phys_section_notdirty;
> -static uint16_t phys_section_rom;
> -static uint16_t phys_section_watch;
> +#define phys_section_unassigned 0
> +#define phys_section_notdirty 1
> +#define phys_section_rom 2
> +#define phys_section_watch 3

Please make these uppercase.  Otherwise,

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

>  /* Simple allocator for PhysPageEntry nodes */
>  static PhysPageEntry (*phys_map_nodes)[L2_SIZE];
> @@ -1747,11 +1747,17 @@ static void mem_begin(MemoryListener *listener)
>  
>  static void core_begin(MemoryListener *listener)
>  {
> +    uint16_t n;
> +
>      phys_sections_clear();
> -    phys_section_unassigned = dummy_section(&io_mem_unassigned);
> -    phys_section_notdirty = dummy_section(&io_mem_notdirty);
> -    phys_section_rom = dummy_section(&io_mem_rom);
> -    phys_section_watch = dummy_section(&io_mem_watch);
> +    n = dummy_section(&io_mem_unassigned);
> +    assert(phys_section_unassigned == n);
> +    n = dummy_section(&io_mem_notdirty);
> +    assert(phys_section_notdirty == n);
> +    n = dummy_section(&io_mem_rom);
> +    assert(phys_section_rom == n);
> +    n = dummy_section(&io_mem_watch);
> +    assert(phys_section_watch == n);
>  }
>  
>  static void tcg_commit(MemoryListener *listener)
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v1 4/6] mem: concenter the root of each AddressSpaceDispatch
  2013-05-29  8:31       ` Paolo Bonzini
@ 2013-05-29  9:24         ` liu ping fan
  2013-05-29 11:30           ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: liu ping fan @ 2013-05-29  9:24 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Anthony Liguori

On Wed, May 29, 2013 at 4:31 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 29/05/2013 09:48, liu ping fan wrote:
>> On Wed, May 29, 2013 at 3:03 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> On 29/05/2013 04:11, Liu Ping Fan wrote:
>>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>>
>>>> All of the AddressSpaceDispatch roots are part of the dispatch context,
>>>> along with cur_map_nodes and cur_phys_sections, and we should walk
>>>> through the AddressSpaceDispatches in the same dispatch context, i.e.
>>>> the same memory topology.  Concenter the roots, so we can switch
>>>> to the next one more easily.
>>>>
>>>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>> ---
>>>>  exec.c                         |   48 ++++++++++++++++++++++++++++++++++++---
>>>>  include/exec/memory-internal.h |    2 +-
>>>>  2 files changed, 45 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/exec.c b/exec.c
>>>> index eb69a98..315960d 100644
>>>> --- a/exec.c
>>>> +++ b/exec.c
>>>> @@ -49,6 +49,7 @@
>>>>  #include "translate-all.h"
>>>>
>>>>  #include "exec/memory-internal.h"
>>>> +#include "qemu/bitops.h"
>>>>
>>>>  //#define DEBUG_SUBPAGE
>>>>
>>>> @@ -95,6 +96,7 @@ typedef struct AllocInfo {
>>>>      /* Only used for release purpose */
>>>>      Node *map;
>>>>      MemoryRegionSection *sections;
>>>> +    PhysPageEntry *roots;
>>>>  } AllocInfo;
>>>
>>> I wouldn't put it here.  I would put it in AddressSpaceDispatch (as
>>> next_phys_map/next_sections/next_nodes: initialize
>>> next_sections/next_nodes in the "begin" hook, switch under seqlock in
>>> the "commit" hook).
>>>
>> An implementation based on separate AddressSpaceDispatch is broken. We
>> should walk through the ASD chain under the same mem topology.  Take
>> the following scenario (ASD-x is followed by ASD-y):
>> walk through ASD-x under topology-A
>>           ---- before walking ASD-y, the mem topology changes and
>> switches from A to B
>> continue to walk ASD-y under B (but it should be A, not B)
>
> It is perfectly fine.  If you do a topology change during an access, and
> the topology change includes a change for the region that is being
> accessed, it is undefined whether you get the "before" or the "after".
>
> In particular it is acceptable that, for example, the CPU accesses the
> memory "after" the change and a device concurrently accesses the memory
> "before" the change.
>
The RCU reader is guaranteed to see either obj-prev or obj-next
atomically.  As you said, "it is undefined whether you get the
'before' or the 'after'".  But in both cases the obj should be
intact.  Here the "obj" is the memory topology, _not_ the ASD (the
ASD merely expresses it as a radix tree).  Combining ASDs from
different memory topologies gives us a broken obj, i.e. the switch
is not atomic.
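
In code terms, this is why core_commit must publish the whole
topology in a single step (a sketch of my patch's commit path; the
write-side seqlock is still to come, as the "Fix me" comment says):

    /* Publish topology B in one step; to be wrapped in a write
     * seqlock so a reader cannot mix pointers from two different
     * topologies. */
    cur_map_nodes     = next_map_nodes;
    cur_phys_sections = next_phys_sections;
    cur_roots         = next_roots;
    cur_alloc_info    = next_alloc_info;
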
What is your opinion?

Thanks and regards,
Pingfan
> This is exactly why we can use RCU, in fact!
>
> Paolo
>
>> To eliminate this issue, I concentrate the
>> roots/phys_sections/phys_map_nodes, so we can switch from topology-A
>> to B atomically.
>>
>> Regards,
>> Pingfan
>>> This requires refcounting AllocInfo, but it removes the need for the
>>> ->idx indirection and the id management.
>>>
>>> Paolo
>>>
>>>>  /* For performance, fetch page map related pointers directly, other than
>>>> @@ -102,10 +104,12 @@ typedef struct AllocInfo {
>>>>   */
>>>>  static MemoryRegionSection *cur_phys_sections;
>>>>  static Node *cur_map_nodes;
>>>> +static PhysPageEntry *cur_roots;
>>>>  static AllocInfo *cur_alloc_info;
>>>>
>>>>  static MemoryRegionSection *next_phys_sections;
>>>>  static Node *next_map_nodes;
>>>> +static PhysPageEntry *next_roots;
>>>>  static AllocInfo *next_alloc_info;
>>>>
>>>>  #define PHYS_MAP_NODE_NIL (((uint16_t)~0) >> 1)
>>>> @@ -191,12 +195,13 @@ static void phys_page_set(AddressSpaceDispatch *d,
>>>>      /* Wildly overreserve - it doesn't matter much. */
>>>>      phys_map_node_reserve(3 * P_L2_LEVELS);
>>>>
>>>> -    phys_page_set_level(&d->phys_map, &index, &nb, leaf, P_L2_LEVELS - 1);
>>>> +    phys_page_set_level(&next_roots[d->idx], &index, &nb, leaf,
>>>> +                        P_L2_LEVELS - 1);
>>>>  }
>>>>
>>>>  static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr index, bool cur)
>>>>  {
>>>> -    PhysPageEntry lp = d->phys_map;
>>>> +    PhysPageEntry lp;
>>>>      PhysPageEntry *p;
>>>>      int i;
>>>>      Node *map_nodes;
>>>> @@ -205,9 +210,11 @@ static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr index
>>>>      if (likely(cur)) {
>>>>          map_nodes = cur_map_nodes;
>>>>          sections = cur_phys_sections;
>>>> +        lp = cur_roots[d->idx];
>>>>      } else {
>>>>          map_nodes = next_map_nodes;
>>>>          sections = next_phys_sections;
>>>> +        lp = next_roots[d->idx];
>>>>      }
>>>>      for (i = P_L2_LEVELS - 1; i >= 0 && !lp.is_leaf; i--) {
>>>>          if (lp.ptr == PHYS_MAP_NODE_NIL) {
>>>> @@ -1729,7 +1736,8 @@ static void mem_begin(MemoryListener *listener)
>>>>  {
>>>>      AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch, listener);
>>>>
>>>> -    d->phys_map.ptr = PHYS_MAP_NODE_NIL;
>>>> +    next_roots[d->idx] = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL,
>>>> +                                    .is_leaf = 0 };
>>>>  }
>>>>
>>>>  static void core_begin(MemoryListener *listener)
>>>> @@ -1737,6 +1745,7 @@ static void core_begin(MemoryListener *listener)
>>>>      uint16_t n;
>>>>
>>>>      next_alloc_info = g_malloc0(sizeof(AllocInfo));
>>>> +    next_roots = g_malloc0(2048*sizeof(PhysPageEntry));
>>>>      n = dummy_section(&io_mem_unassigned);
>>>>      assert(phys_section_unassigned == n);
>>>>      n = dummy_section(&io_mem_notdirty);
>>>> @@ -1752,6 +1761,7 @@ static void release_dispatch_map(AllocInfo *info)
>>>>      phys_sections_clear(info->phys_sections_nb, info->sections);
>>>>      g_free(info->map);
>>>>      g_free(info->sections);
>>>> +    g_free(info->roots);
>>>>      g_free(info);
>>>>  }
>>>>
>>>> @@ -1765,6 +1775,7 @@ static void core_commit(MemoryListener *listener)
>>>>      AllocInfo *info = cur_alloc_info;
>>>>      info->map = cur_map_nodes;
>>>>      info->sections = cur_phys_sections;
>>>> +    info->roots = cur_roots;
>>>>
>>>>      /* Fix me, in future, will be protected by wr seqlock when in parallel
>>>>       * with reader
>>>> @@ -1772,6 +1783,7 @@ static void core_commit(MemoryListener *listener)
>>>>      cur_map_nodes = next_map_nodes;
>>>>      cur_phys_sections = next_phys_sections;
>>>>      cur_alloc_info = next_alloc_info;
>>>> +    cur_roots = next_roots;
>>>>
>>>>      /* since each CPU stores ram addresses in its TLB cache, we must
>>>>         reset the modified entries */
>>>> @@ -1826,11 +1838,36 @@ static MemoryListener io_memory_listener = {
>>>>      .priority = 0,
>>>>  };
>>>>
>>>> +static MemoryListener tcg_memory_listener = {
>>>> +    .commit = tcg_commit,
>>>> +};
>>>
>>> Rebase artifact?
>>>
>>>> +#define MAX_IDX (1<<15)
>>>> +static unsigned long *address_space_id_map;
>>>> +static QemuMutex id_lock;
>>>> +
>>>> +static void address_space_release_id(int16_t idx)
>>>> +{
>>>> +    qemu_mutex_lock(&id_lock);
>>>> +    clear_bit(idx, address_space_id_map);
>>>> +    qemu_mutex_unlock(&id_lock);
>>>> +}
>>>> +
>>>> +static int16_t address_space_alloc_id()
>>>> +{
>>>> +    unsigned long idx;
>>>> +
>>>> +    qemu_mutex_lock(&id_lock);
>>>> +    idx = find_first_zero_bit(address_space_id_map, MAX_IDX);
>>>> +    set_bit(idx, address_space_id_map);
>>>> +    qemu_mutex_unlock(&id_lock);
>>>> +}
>>>> +
>>>>  void address_space_init_dispatch(AddressSpace *as)
>>>>  {
>>>>      AddressSpaceDispatch *d = g_new(AddressSpaceDispatch, 1);
>>>>
>>>> -    d->phys_map  = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
>>>> +    d->idx = address_space_alloc_id();
>>>>      d->listener = (MemoryListener) {
>>>>          .begin = mem_begin,
>>>>          .region_add = mem_add,
>>>> @@ -1845,6 +1882,7 @@ void address_space_destroy_dispatch(AddressSpace *as)
>>>>  {
>>>>      AddressSpaceDispatch *d = as->dispatch;
>>>>
>>>> +    address_space_release_id(d->idx);
>>>>      memory_listener_unregister(&d->listener);
>>>>      g_free(d);
>>>>      as->dispatch = NULL;
>>>> @@ -1852,6 +1890,8 @@ void address_space_destroy_dispatch(AddressSpace *as)
>>>>
>>>>  static void memory_map_init(void)
>>>>  {
>>>> +    qemu_mutex_init(&id_lock);
>>>> +    address_space_id_map = g_malloc0(MAX_IDX/BITS_PER_BYTE);
>>>>      system_memory = g_malloc(sizeof(*system_memory));
>>>>      memory_region_init(system_memory, "system", INT64_MAX);
>>>>      address_space_init(&address_space_memory, system_memory, "memory");
>>>> diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
>>>> index 799c02a..54a3635 100644
>>>> --- a/include/exec/memory-internal.h
>>>> +++ b/include/exec/memory-internal.h
>>>> @@ -36,7 +36,7 @@ struct AddressSpaceDispatch {
>>>>      /* This is a multi-level map on the physical address space.
>>>>       * The bottom level has pointers to MemoryRegionSections.
>>>>       */
>>>> -    PhysPageEntry phys_map;
>>>> +    int16_t idx;
>>>>      MemoryListener listener;
>>>>  };
>>>>
>>>>
>>>
>


* Re: [Qemu-devel] [PATCH v1 4/6] mem: concenter the root of each AddressSpaceDispatch
  2013-05-29  9:24         ` liu ping fan
@ 2013-05-29 11:30           ` Paolo Bonzini
  0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2013-05-29 11:30 UTC (permalink / raw)
  To: liu ping fan; +Cc: qemu-devel, Anthony Liguori

On 29/05/2013 11:24, liu ping fan wrote:
>>> An implementation based on separate AddressSpaceDispatch state is
>>> broken. We should walk through the ASD chain under the same memory
>>> topology.  Take the following scenario:
>>> (ASD-x is followed by ASD-y)
>>> walk through ASD-x under topology-A
>>>           ---- before walking ASD-y, the memory topology changes
>>> and switches from A to B
>>> continue to walk ASD-y under B (but it should be A, not B)
>>
>> It is perfectly fine.  If you do a topology change during an access, and
>> the topology change includes a change for the region that is being
>> accessed, it is undefined whether you get the "before" or the "after".
>>
>> In particular it is acceptable that, for example, the CPU accesses the
>> memory "after" the change and a device concurrently accesses the memory
>> "before" the change.
>>
> The RCU reader is guaranteed to see either obj-prev or obj-next
> atomically.  As you said, "it is undefined whether you get the
> 'before' or the 'after'".  But in both cases the obj should be
> intact.  Here the "obj" is the memory topology, _not_ the ASD (the
> ASD merely expresses it as a radix tree).  Combining ASDs from
> different memory topologies gives us a broken obj, i.e. the switch
> is not atomic.

Each access is independent of the others.  Of course accesses to the
same address space must be consistent ("cannot go back in time"), but
there is no such constraint for accesses to different address spaces.

Let's assume we have a device that can access two address spaces.  x
points into the first address space, y points into the second address
space.  Both x and y end up accessing the memory region z.  The CPU
concurrently changes the mapping of z.

         Device                  VCPU
   ---------------------------------------------------
                                 change mapping of z
                                     | x's address space is updated
         a = x (new topology)        |
         b = y (old topology)        |
                                     | y's address space is updated
                                 done

I think this is legal, because the outcome of the device's access is
undefined.  The VCPU should not have changed the topology under the
device's feet.

The same is true if you have two VCPUs.
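
For instance, each access would take its own read-side critical
section, along these lines (hypothetical sketch, assuming the RCU
read-side primitives that will be posted separately):

    /* Each access independently snapshots whatever topology is
     * current at the instant it runs. */
    rcu_read_lock();
    section = phys_page_find(as->dispatch, addr >> TARGET_PAGE_BITS,
                             true);
    /* ... perform the access against that snapshot ... */
    rcu_read_unlock();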

Paolo

> What is your opinion?
> 
> Thanks and regards,
> Pingfan
>> This is exactly why we can use RCU, in fact!
>>
>> Paolo
>>
>>> To eliminate this issue, I concentrate the
>>> roots/phys_sections/phys_map_nodes, so we can switch from topology-A
>>> to B atomically.
>>>
>>> Regards,
>>> Pingfan
>>>> This requires refcounting AllocInfo, but it removes the need for the
>>>> ->idx indirection and the id management.
>>>>
>>>> Paolo
>>>>
>>>>>  /* For performance, fetch page map related pointers directly, other than
>>>>> @@ -102,10 +104,12 @@ typedef struct AllocInfo {
>>>>>   */
>>>>>  static MemoryRegionSection *cur_phys_sections;
>>>>>  static Node *cur_map_nodes;
>>>>> +static PhysPageEntry *cur_roots;
>>>>>  static AllocInfo *cur_alloc_info;
>>>>>
>>>>>  static MemoryRegionSection *next_phys_sections;
>>>>>  static Node *next_map_nodes;
>>>>> +static PhysPageEntry *next_roots;
>>>>>  static AllocInfo *next_alloc_info;
>>>>>
>>>>>  #define PHYS_MAP_NODE_NIL (((uint16_t)~0) >> 1)
>>>>> @@ -191,12 +195,13 @@ static void phys_page_set(AddressSpaceDispatch *d,
>>>>>      /* Wildly overreserve - it doesn't matter much. */
>>>>>      phys_map_node_reserve(3 * P_L2_LEVELS);
>>>>>
>>>>> -    phys_page_set_level(&d->phys_map, &index, &nb, leaf, P_L2_LEVELS - 1);
>>>>> +    phys_page_set_level(&next_roots[d->idx], &index, &nb, leaf,
>>>>> +                        P_L2_LEVELS - 1);
>>>>>  }
>>>>>
>>>>>  static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr index, bool cur)
>>>>>  {
>>>>> -    PhysPageEntry lp = d->phys_map;
>>>>> +    PhysPageEntry lp;
>>>>>      PhysPageEntry *p;
>>>>>      int i;
>>>>>      Node *map_nodes;
>>>>> @@ -205,9 +210,11 @@ static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr index
>>>>>      if (likely(cur)) {
>>>>>          map_nodes = cur_map_nodes;
>>>>>          sections = cur_phys_sections;
>>>>> +        lp = cur_roots[d->idx];
>>>>>      } else {
>>>>>          map_nodes = next_map_nodes;
>>>>>          sections = next_phys_sections;
>>>>> +        lp = next_roots[d->idx];
>>>>>      }
>>>>>      for (i = P_L2_LEVELS - 1; i >= 0 && !lp.is_leaf; i--) {
>>>>>          if (lp.ptr == PHYS_MAP_NODE_NIL) {
>>>>> @@ -1729,7 +1736,8 @@ static void mem_begin(MemoryListener *listener)
>>>>>  {
>>>>>      AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch, listener);
>>>>>
>>>>> -    d->phys_map.ptr = PHYS_MAP_NODE_NIL;
>>>>> +    next_roots[d->idx] = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL,
>>>>> +                                    .is_leaf = 0 };
>>>>>  }
>>>>>
>>>>>  static void core_begin(MemoryListener *listener)
>>>>> @@ -1737,6 +1745,7 @@ static void core_begin(MemoryListener *listener)
>>>>>      uint16_t n;
>>>>>
>>>>>      next_alloc_info = g_malloc0(sizeof(AllocInfo));
>>>>> +    next_roots = g_malloc0(2048*sizeof(PhysPageEntry));
>>>>>      n = dummy_section(&io_mem_unassigned);
>>>>>      assert(phys_section_unassigned == n);
>>>>>      n = dummy_section(&io_mem_notdirty);
>>>>> @@ -1752,6 +1761,7 @@ static void release_dispatch_map(AllocInfo *info)
>>>>>      phys_sections_clear(info->phys_sections_nb, info->sections);
>>>>>      g_free(info->map);
>>>>>      g_free(info->sections);
>>>>> +    g_free(info->roots);
>>>>>      g_free(info);
>>>>>  }
>>>>>
>>>>> @@ -1765,6 +1775,7 @@ static void core_commit(MemoryListener *listener)
>>>>>      AllocInfo *info = cur_alloc_info;
>>>>>      info->map = cur_map_nodes;
>>>>>      info->sections = cur_phys_sections;
>>>>> +    info->roots = cur_roots;
>>>>>
>>>>>      /* Fix me, in future, will be protected by wr seqlock when in parallel
>>>>>       * with reader
>>>>> @@ -1772,6 +1783,7 @@ static void core_commit(MemoryListener *listener)
>>>>>      cur_map_nodes = next_map_nodes;
>>>>>      cur_phys_sections = next_phys_sections;
>>>>>      cur_alloc_info = next_alloc_info;
>>>>> +    cur_roots = next_roots;
>>>>>
>>>>>      /* since each CPU stores ram addresses in its TLB cache, we must
>>>>>         reset the modified entries */
>>>>> @@ -1826,11 +1838,36 @@ static MemoryListener io_memory_listener = {
>>>>>      .priority = 0,
>>>>>  };
>>>>>
>>>>> +static MemoryListener tcg_memory_listener = {
>>>>> +    .commit = tcg_commit,
>>>>> +};
>>>>
>>>> Rebase artifact?
>>>>
>>>>> +#define MAX_IDX (1<<15)
>>>>> +static unsigned long *address_space_id_map;
>>>>> +static QemuMutex id_lock;
>>>>> +
>>>>> +static void address_space_release_id(int16_t idx)
>>>>> +{
>>>>> +    qemu_mutex_lock(&id_lock);
>>>>> +    clear_bit(idx, address_space_id_map);
>>>>> +    qemu_mutex_unlock(&id_lock);
>>>>> +}
>>>>> +
>>>>> +static int16_t address_space_alloc_id()
>>>>> +{
>>>>> +    unsigned long idx;
>>>>> +
>>>>> +    qemu_mutex_lock(&id_lock);
>>>>> +    idx = find_first_zero_bit(address_space_id_map, MAX_IDX);
>>>>> +    set_bit(idx, address_space_id_map);
>>>>> +    qemu_mutex_unlock(&id_lock);
>>>>> +}
>>>>> +
>>>>>  void address_space_init_dispatch(AddressSpace *as)
>>>>>  {
>>>>>      AddressSpaceDispatch *d = g_new(AddressSpaceDispatch, 1);
>>>>>
>>>>> -    d->phys_map  = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
>>>>> +    d->idx = address_space_alloc_id();
>>>>>      d->listener = (MemoryListener) {
>>>>>          .begin = mem_begin,
>>>>>          .region_add = mem_add,
>>>>> @@ -1845,6 +1882,7 @@ void address_space_destroy_dispatch(AddressSpace *as)
>>>>>  {
>>>>>      AddressSpaceDispatch *d = as->dispatch;
>>>>>
>>>>> +    address_space_release_id(d->idx);
>>>>>      memory_listener_unregister(&d->listener);
>>>>>      g_free(d);
>>>>>      as->dispatch = NULL;
>>>>> @@ -1852,6 +1890,8 @@ void address_space_destroy_dispatch(AddressSpace *as)
>>>>>
>>>>>  static void memory_map_init(void)
>>>>>  {
>>>>> +    qemu_mutex_init(&id_lock);
>>>>> +    address_space_id_map = g_malloc0(MAX_IDX/BITS_PER_BYTE);
>>>>>      system_memory = g_malloc(sizeof(*system_memory));
>>>>>      memory_region_init(system_memory, "system", INT64_MAX);
>>>>>      address_space_init(&address_space_memory, system_memory, "memory");
>>>>> diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
>>>>> index 799c02a..54a3635 100644
>>>>> --- a/include/exec/memory-internal.h
>>>>> +++ b/include/exec/memory-internal.h
>>>>> @@ -36,7 +36,7 @@ struct AddressSpaceDispatch {
>>>>>      /* This is a multi-level map on the physical address space.
>>>>>       * The bottom level has pointers to MemoryRegionSections.
>>>>>       */
>>>>> -    PhysPageEntry phys_map;
>>>>> +    int16_t idx;
>>>>>      MemoryListener listener;
>>>>>  };
>>>>>
>>>>>
>>>>
>>


end of thread

Thread overview: 19+ messages
2013-05-29  2:11 [Qemu-devel] [PATCH v1 0/6] make memory listener prepared for rcu style Liu Ping Fan
2013-05-29  2:11 ` [Qemu-devel] [PATCH v1 1/6] mem: change variable to macro Liu Ping Fan
2013-05-29  9:06   ` Paolo Bonzini
2013-05-29  2:11 ` [Qemu-devel] [PATCH v1 2/6] mem: make global dispatch table ready for rcu Liu Ping Fan
2013-05-29  7:07   ` Paolo Bonzini
2013-05-29  2:11 ` [Qemu-devel] [PATCH v1 3/6] mem: fold tcg listener's logic into core memory listener Liu Ping Fan
2013-05-29  2:11 ` [Qemu-devel] [PATCH v1 4/6] mem: concenter the root of each AddressSpaceDispatch Liu Ping Fan
2013-05-29  7:03   ` Paolo Bonzini
2013-05-29  7:48     ` liu ping fan
2013-05-29  8:31       ` Paolo Bonzini
2013-05-29  9:24         ` liu ping fan
2013-05-29 11:30           ` Paolo Bonzini
2013-05-29  2:11 ` [Qemu-devel] [PATCH v1 5/6] mem: make dispatch path satify rcu style Liu Ping Fan
2013-05-29  7:06   ` Paolo Bonzini
2013-05-29  7:15   ` Paolo Bonzini
2013-05-29  2:11 ` [Qemu-devel] [PATCH v1 6/6] mem: change tcg code to " Liu Ping Fan
2013-05-29  7:22   ` Paolo Bonzini
2013-05-29  9:00     ` liu ping fan
2013-05-29  9:03       ` Paolo Bonzini
