* [PATCH 0/15 v2] prepare unplug out of protection of global lock
@ 2012-08-08  6:25 ` Liu Ping Fan
From: Liu Ping Fan @ 2012-08-08  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Stefan Hajnoczi, Marcelo Tosatti, qemulist, Blue Swirl,
	Avi Kivity, Anthony Liguori, Jan Kiszka, Paolo Bonzini,
	Andreas Färber

Background:
refer to the original plan posted by Marcelo Tosatti,
http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg04315.html

Previous version:
https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg03312.html


Changes v1 -> v2:

--introduce atomic ops
--introduce a reference count for MemoryRegion
--move memory's flat view and radix tree toward RCU style


* [PATCH 01/15] atomic: introduce atomic operations
@ 2012-08-08  6:25 ` Liu Ping Fan
From: Liu Ping Fan @ 2012-08-08  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Stefan Hajnoczi, Marcelo Tosatti, qemulist, Blue Swirl,
	Avi Kivity, Anthony Liguori, Jan Kiszka, Paolo Bonzini,
	Andreas Färber

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Once we step out of the global lock, low-level code is exposed to SMP
concurrency, so we need atomic operations.

This file is largely copied from the kernel. Currently only the x86
atomic ops are included; other architectures will be added in the future.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 include/qemu/atomic.h |  161 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 161 insertions(+), 0 deletions(-)
 create mode 100644 include/qemu/atomic.h

diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
new file mode 100644
index 0000000..8e1fc3e
--- /dev/null
+++ b/include/qemu/atomic.h
@@ -0,0 +1,161 @@
+/*
+ * Simple interface for atomic operations.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef __QEMU_ATOMIC_H
+#define __QEMU_ATOMIC_H 1
+
+typedef struct Atomic {
+    int counter;
+} Atomic;
+
+
+#if defined(__i386__) || defined(__x86_64__)
+
+/**
+ * atomic_read - read atomic variable
+ * @v: pointer of type Atomic
+ *
+ * Atomically reads the value of @v.
+ */
+static inline int atomic_read(const Atomic *v)
+{
+    return (*(volatile int *)&(v)->counter);
+}
+
+/**
+ * atomic_set - set atomic variable
+ * @v: pointer of type Atomic
+ * @i: required value
+ *
+ * Atomically sets the value of @v to @i.
+ */
+static inline void atomic_set(Atomic *v, int i)
+{
+    v->counter = i;
+}
+
+/**
+ * atomic_add - add integer to atomic variable
+ * @i: integer value to add
+ * @v: pointer of type Atomic
+ *
+ * Atomically adds @i to @v.
+ */
+static inline void atomic_add(int i, Atomic *v)
+{
+    asm volatile("lock; addl %1,%0"
+             : "+m" (v->counter)
+             : "ir" (i));
+}
+
+/**
+ * atomic_sub - subtract integer from atomic variable
+ * @i: integer value to subtract
+ * @v: pointer of type Atomic
+ *
+ * Atomically subtracts @i from @v.
+ */
+static inline void atomic_sub(int i, Atomic *v)
+{
+    asm volatile("lock; subl %1,%0"
+             : "+m" (v->counter)
+             : "ir" (i));
+}
+
+/**
+ * atomic_sub_and_test - subtract value from variable and test result
+ * @i: integer value to subtract
+ * @v: pointer of type Atomic
+ *
+ * Atomically subtracts @i from @v and returns
+ * true if the result is zero, or false for all
+ * other cases.
+ */
+static inline int atomic_sub_and_test(int i, Atomic *v)
+{
+    unsigned char c;
+
+    asm volatile("lock; subl %2,%0; sete %1"
+             : "+m" (v->counter), "=qm" (c)
+             : "ir" (i) : "memory");
+    return c;
+}
+
+/**
+ * atomic_inc - increment atomic variable
+ * @v: pointer of type Atomic
+ *
+ * Atomically increments @v by 1.
+ */
+static inline void atomic_inc(Atomic *v)
+{
+    asm volatile("lock; incl %0"
+             : "+m" (v->counter));
+}
+
+/**
+ * atomic_dec - decrement atomic variable
+ * @v: pointer of type Atomic
+ *
+ * Atomically decrements @v by 1.
+ */
+static inline void atomic_dec(Atomic *v)
+{
+    asm volatile("lock; decl %0"
+             : "+m" (v->counter));
+}
+
+/**
+ * atomic_dec_and_test - decrement and test
+ * @v: pointer of type Atomic
+ *
+ * Atomically decrements @v by 1 and
+ * returns true if the result is 0, or false for all other
+ * cases.
+ */
+static inline int atomic_dec_and_test(Atomic *v)
+{
+    unsigned char c;
+
+    asm volatile("lock; decl %0; sete %1"
+             : "+m" (v->counter), "=qm" (c)
+             : : "memory");
+    return c != 0;
+}
+
+/**
+ * atomic_inc_and_test - increment and test
+ * @v: pointer of type Atomic
+ *
+ * Atomically increments @v by 1
+ * and returns true if the result is zero, or false for all
+ * other cases.
+ */
+static inline int atomic_inc_and_test(Atomic *v)
+{
+    unsigned char c;
+
+    asm volatile("lock; incl %0; sete %1"
+             : "+m" (v->counter), "=qm" (c)
+             : : "memory");
+    return c != 0;
+}
+
+static inline int atomic_add_and_return(int i, Atomic *v)
+{
+    int ret = i;
+
+    asm volatile ("lock; xaddl %0, %1"
+            : "+r" (ret), "+m" (v->counter)
+            : : "memory", "cc");
+
+    return ret + i;
+}
+#endif
+
+#endif
-- 
1.7.4.4
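
As a usage sketch (not part of the patch; the names are illustrative),
the API above covers the classic shared-counter pattern, where the
decrement and the zero check must happen as one indivisible step:

    #include "qemu/atomic.h"

    static Atomic users;

    static void example_init(void)
    {
        atomic_set(&users, 1);      /* plain store; must not race with users */
    }

    static void example_hold(void)
    {
        atomic_inc(&users);         /* locked increment, safe from any thread */
    }

    static void example_release(void)
    {
        /* exactly one thread observes the 1 -> 0 transition */
        if (atomic_dec_and_test(&users)) {
            /* last user gone: tear down the shared state here */
        }
    }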

* [PATCH 02/15] qom: using atomic ops to re-implement object_ref
@ 2012-08-08  6:25 ` Liu Ping Fan
From: Liu Ping Fan @ 2012-08-08  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Anthony Liguori, Avi Kivity, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber,
	qemulist

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 include/qemu/object.h |    3 ++-
 qom/object.c          |   13 +++++--------
 2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/include/qemu/object.h b/include/qemu/object.h
index 8b17776..58db9d0 100644
--- a/include/qemu/object.h
+++ b/include/qemu/object.h
@@ -18,6 +18,7 @@
 #include <stdint.h>
 #include <stdbool.h>
 #include "qemu-queue.h"
+#include "qemu/atomic.h"
 
 struct Visitor;
 struct Error;
@@ -262,7 +263,7 @@ struct Object
     ObjectClass *class;
     GSList *interfaces;
     QTAILQ_HEAD(, ObjectProperty) properties;
-    uint32_t ref;
+    Atomic ref;
     Object *parent;
 };
 
diff --git a/qom/object.c b/qom/object.c
index 00bb3b0..822bdb7 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -378,7 +378,7 @@ void object_finalize(void *data)
     object_deinit(obj, ti);
     object_property_del_all(obj);
 
-    g_assert(obj->ref == 0);
+    g_assert(atomic_read(&obj->ref) == 0);
 }
 
 Object *object_new_with_type(Type type)
@@ -405,7 +405,7 @@ Object *object_new(const char *typename)
 void object_delete(Object *obj)
 {
     object_unref(obj);
-    g_assert(obj->ref == 0);
+    g_assert(atomic_read(&obj->ref) == 0);
     g_free(obj);
 }
 
@@ -639,16 +639,13 @@ GSList *object_class_get_list(const char *implements_type,
 
 void object_ref(Object *obj)
 {
-    obj->ref++;
+    atomic_inc(&obj->ref);
 }
 
 void object_unref(Object *obj)
 {
-    g_assert(obj->ref > 0);
-    obj->ref--;
-
-    /* parent always holds a reference to its children */
-    if (obj->ref == 0) {
+    g_assert(atomic_read(&obj->ref) > 0);
+    if (atomic_dec_and_test(&obj->ref)) {
         object_finalize(obj);
     }
 }
-- 
1.7.4.4
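
To see why the atomic rewrite matters: with the old plain counter, two
threads dropping the last two references could interleave their
read-modify-write, both store the same value, and the object would leak
(or, in the opposite interleaving, be finalized twice). A sketch of the
now-safe cross-thread pattern (the worker function is hypothetical):

    static void worker(Object *dev)
    {
        object_ref(dev);        /* pin the object while we use it */
        /* ... access dev from outside the iothread ... */
        object_unref(dev);      /* atomic_dec_and_test(): exactly one
                                   caller runs object_finalize() */
    }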

* [PATCH 03/15] qom: introduce reclaimer to release obj
@ 2012-08-08  6:25 ` Liu Ping Fan
From: Liu Ping Fan @ 2012-08-08  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Anthony Liguori, Avi Kivity, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber,
	qemulist

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Collect unused objects and release them on the caller's demand.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 include/qemu/reclaimer.h |   28 ++++++++++++++++++++++
 main-loop.c              |    5 ++++
 qemu-tool.c              |    5 ++++
 qom/Makefile.objs        |    2 +-
 qom/reclaimer.c          |   58 ++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 97 insertions(+), 1 deletions(-)
 create mode 100644 include/qemu/reclaimer.h
 create mode 100644 qom/reclaimer.c

diff --git a/include/qemu/reclaimer.h b/include/qemu/reclaimer.h
new file mode 100644
index 0000000..9307e93
--- /dev/null
+++ b/include/qemu/reclaimer.h
@@ -0,0 +1,28 @@
+/*
+ * QEMU reclaimer
+ *
+ * Copyright IBM, Corp. 2012
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_RECLAIMER
+#define QEMU_RECLAIMER
+#include "qemu-queue.h"
+typedef void ReleaseHandler(void *opaque);
+typedef struct Chunk {
+    QLIST_ENTRY(Chunk) list;
+    void *opaque;
+    ReleaseHandler *release;
+} Chunk;
+
+typedef struct ChunkHead {
+        struct Chunk *lh_first;
+} ChunkHead;
+
+void reclaimer_enqueue(ChunkHead *head, void *opaque, ReleaseHandler *release);
+void reclaimer_worker(ChunkHead *head);
+void qemu_reclaimer_enqueue(void *opaque, ReleaseHandler *release);
+void qemu_reclaimer(void);
+#endif
diff --git a/main-loop.c b/main-loop.c
index eb3b6e6..be9d095 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -26,6 +26,7 @@
 #include "qemu-timer.h"
 #include "slirp/slirp.h"
 #include "main-loop.h"
+#include "qemu/reclaimer.h"
 
 #ifndef _WIN32
 
@@ -505,5 +506,9 @@ int main_loop_wait(int nonblocking)
        them.  */
     qemu_bh_poll();
 
+    /* ref to device from iohandler/bh/timer do not obey the rules, so delay
+     * reclaiming until now.
+     */
+    qemu_reclaimer();
     return ret;
 }
diff --git a/qemu-tool.c b/qemu-tool.c
index 318c5fc..f5fe319 100644
--- a/qemu-tool.c
+++ b/qemu-tool.c
@@ -21,6 +21,7 @@
 #include "main-loop.h"
 #include "qemu_socket.h"
 #include "slirp/libslirp.h"
+#include "qemu/reclaimer.h"
 
 #include <sys/time.h>
 
@@ -75,6 +76,10 @@ void qemu_mutex_unlock_iothread(void)
 {
 }
 
+void qemu_reclaimer(void)
+{
+}
+
 int use_icount;
 
 void qemu_clock_warp(QEMUClock *clock)
diff --git a/qom/Makefile.objs b/qom/Makefile.objs
index 5ef060a..a579261 100644
--- a/qom/Makefile.objs
+++ b/qom/Makefile.objs
@@ -1,4 +1,4 @@
-qom-obj-y = object.o container.o qom-qobject.o
+qom-obj-y = object.o container.o qom-qobject.o reclaimer.o
 qom-obj-twice-y = cpu.o
 common-obj-y = $(qom-obj-twice-y)
 user-obj-y = $(qom-obj-twice-y)
diff --git a/qom/reclaimer.c b/qom/reclaimer.c
new file mode 100644
index 0000000..6cb53e3
--- /dev/null
+++ b/qom/reclaimer.c
@@ -0,0 +1,58 @@
+/*
+ * QEMU reclaimer
+ *
+ * Copyright IBM, Corp. 2012
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu-common.h"
+#include "qemu-thread.h"
+#include "main-loop.h"
+#include "qemu-queue.h"
+#include "qemu/reclaimer.h"
+
+static struct QemuMutex reclaimer_lock;
+static QLIST_HEAD(rcl, Chunk) reclaimer_list;
+
+void reclaimer_enqueue(ChunkHead *head, void *opaque, ReleaseHandler *release)
+{
+    Chunk *r = g_malloc0(sizeof(Chunk));
+    r->opaque = opaque;
+    r->release = release;
+    QLIST_INSERT_HEAD_RCU(head, r, list);
+}
+
+void reclaimer_worker(ChunkHead *head)
+{
+    Chunk *cur, *next;
+
+    QLIST_FOREACH_SAFE(cur, head, list, next) {
+        QLIST_REMOVE(cur, list);
+        cur->release(cur->opaque);
+        g_free(cur);
+    }
+}
+
+void qemu_reclaimer_enqueue(void *opaque, ReleaseHandler *release)
+{
+    Chunk *r = g_malloc0(sizeof(Chunk));
+    r->opaque = opaque;
+    r->release = release;
+    qemu_mutex_lock(&reclaimer_lock);
+    QLIST_INSERT_HEAD_RCU(&reclaimer_list, r, list);
+    qemu_mutex_unlock(&reclaimer_lock);
+}
+
+
+void qemu_reclaimer(void)
+{
+    Chunk *cur, *next;
+
+    QLIST_FOREACH_SAFE(cur, &reclaimer_list, list, next) {
+        QLIST_REMOVE(cur, list);
+        cur->release(cur->opaque);
+        g_free(cur);
+    }
+}
-- 
1.7.4.4
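
A sketch of the intended use (the device type and callback are
hypothetical; it also assumes reclaimer_lock has been initialized with
qemu_mutex_init() during startup, which the patch itself does not yet
do):

    /* Defer freeing until main_loop_wait() runs qemu_reclaimer(), when
     * no iohandler/bh/timer can still hold a stale pointer. */
    typedef struct MyDeviceState MyDeviceState;    /* hypothetical */

    static void my_device_release(void *opaque)
    {
        g_free(opaque);
    }

    static void my_device_unplug(MyDeviceState *s)
    {
        qemu_reclaimer_enqueue(s, my_device_release);
    }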

* [PATCH 04/15] memory: MemoryRegion topology must be stable when updating
@ 2012-08-08  6:25 ` Liu Ping Fan
From: Liu Ping Fan @ 2012-08-08  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Anthony Liguori, Avi Kivity, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber,
	qemulist

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Use mem_map_lock to serialize updaters, so that readers can get a
consistent snapshot of the memory topology -- FlatView & radix tree.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 exec.c   |    3 +++
 memory.c |   22 ++++++++++++++++++++++
 memory.h |    2 ++
 3 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/exec.c b/exec.c
index 8244d54..0e29ef9 100644
--- a/exec.c
+++ b/exec.c
@@ -210,6 +210,8 @@ static unsigned phys_map_nodes_nb, phys_map_nodes_nb_alloc;
    The bottom level has pointers to MemoryRegionSections.  */
 static PhysPageEntry phys_map = { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
 
+QemuMutex mem_map_lock;
+
 static void io_mem_init(void);
 static void memory_map_init(void);
 
@@ -637,6 +639,7 @@ void cpu_exec_init_all(void)
 #if !defined(CONFIG_USER_ONLY)
     memory_map_init();
     io_mem_init();
+    qemu_mutex_init(&mem_map_lock);
 #endif
 }
 
diff --git a/memory.c b/memory.c
index aab4a31..5986532 100644
--- a/memory.c
+++ b/memory.c
@@ -761,7 +761,9 @@ void memory_region_transaction_commit(void)
     assert(memory_region_transaction_depth);
     --memory_region_transaction_depth;
     if (!memory_region_transaction_depth && memory_region_update_pending) {
+        qemu_mutex_lock(&mem_map_lock);
         memory_region_update_topology(NULL);
+        qemu_mutex_unlock(&mem_map_lock);
     }
 }
 
@@ -1069,8 +1071,10 @@ void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
 {
     uint8_t mask = 1 << client;
 
+    qemu_mutex_lock(&mem_map_lock);
     mr->dirty_log_mask = (mr->dirty_log_mask & ~mask) | (log * mask);
     memory_region_update_topology(mr);
+    qemu_mutex_unlock(&mem_map_lock);
 }
 
 bool memory_region_get_dirty(MemoryRegion *mr, target_phys_addr_t addr,
@@ -1103,8 +1107,10 @@ void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
 void memory_region_set_readonly(MemoryRegion *mr, bool readonly)
 {
     if (mr->readonly != readonly) {
+        qemu_mutex_lock(&mem_map_lock);
         mr->readonly = readonly;
         memory_region_update_topology(mr);
+        qemu_mutex_unlock(&mem_map_lock);
     }
 }
 
@@ -1112,7 +1118,9 @@ void memory_region_rom_device_set_readable(MemoryRegion *mr, bool readable)
 {
     if (mr->readable != readable) {
         mr->readable = readable;
+        qemu_mutex_lock(&mem_map_lock);
         memory_region_update_topology(mr);
+        qemu_mutex_unlock(&mem_map_lock);
     }
 }
 
@@ -1206,6 +1214,7 @@ void memory_region_add_eventfd(MemoryRegion *mr,
     };
     unsigned i;
 
+    qemu_mutex_lock(&mem_map_lock);
     for (i = 0; i < mr->ioeventfd_nb; ++i) {
         if (memory_region_ioeventfd_before(mrfd, mr->ioeventfds[i])) {
             break;
@@ -1218,6 +1227,7 @@ void memory_region_add_eventfd(MemoryRegion *mr,
             sizeof(*mr->ioeventfds) * (mr->ioeventfd_nb-1 - i));
     mr->ioeventfds[i] = mrfd;
     memory_region_update_topology(mr);
+    qemu_mutex_unlock(&mem_map_lock);
 }
 
 void memory_region_del_eventfd(MemoryRegion *mr,
@@ -1236,6 +1246,7 @@ void memory_region_del_eventfd(MemoryRegion *mr,
     };
     unsigned i;
 
+    qemu_mutex_lock(&mem_map_lock);
     for (i = 0; i < mr->ioeventfd_nb; ++i) {
         if (memory_region_ioeventfd_equal(mrfd, mr->ioeventfds[i])) {
             break;
@@ -1248,6 +1259,7 @@ void memory_region_del_eventfd(MemoryRegion *mr,
     mr->ioeventfds = g_realloc(mr->ioeventfds,
                                   sizeof(*mr->ioeventfds)*mr->ioeventfd_nb + 1);
     memory_region_update_topology(mr);
+    qemu_mutex_unlock(&mem_map_lock);
 }
 
 static void memory_region_add_subregion_common(MemoryRegion *mr,
@@ -1259,6 +1271,8 @@ static void memory_region_add_subregion_common(MemoryRegion *mr,
     assert(!subregion->parent);
     subregion->parent = mr;
     subregion->addr = offset;
+
+    qemu_mutex_lock(&mem_map_lock);
     QTAILQ_FOREACH(other, &mr->subregions, subregions_link) {
         if (subregion->may_overlap || other->may_overlap) {
             continue;
@@ -1289,6 +1303,7 @@ static void memory_region_add_subregion_common(MemoryRegion *mr,
     QTAILQ_INSERT_TAIL(&mr->subregions, subregion, subregions_link);
 done:
     memory_region_update_topology(mr);
+    qemu_mutex_unlock(&mem_map_lock);
 }
 
 
@@ -1316,8 +1331,11 @@ void memory_region_del_subregion(MemoryRegion *mr,
 {
     assert(subregion->parent == mr);
     subregion->parent = NULL;
+
+    qemu_mutex_lock(&mem_map_lock);
     QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link);
     memory_region_update_topology(mr);
+    qemu_mutex_unlock(&mem_map_lock);
 }
 
 void memory_region_set_enabled(MemoryRegion *mr, bool enabled)
@@ -1325,8 +1343,10 @@ void memory_region_set_enabled(MemoryRegion *mr, bool enabled)
     if (enabled == mr->enabled) {
         return;
     }
+    qemu_mutex_lock(&mem_map_lock);
     mr->enabled = enabled;
     memory_region_update_topology(NULL);
+    qemu_mutex_unlock(&mem_map_lock);
 }
 
 void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t addr)
@@ -1361,7 +1381,9 @@ void memory_region_set_alias_offset(MemoryRegion *mr, target_phys_addr_t offset)
         return;
     }
 
+    qemu_mutex_lock(&mem_map_lock);
     memory_region_update_topology(mr);
+    qemu_mutex_unlock(&mem_map_lock);
 }
 
 ram_addr_t memory_region_get_ram_addr(MemoryRegion *mr)
diff --git a/memory.h b/memory.h
index 740c48e..fe6aefa 100644
--- a/memory.h
+++ b/memory.h
@@ -25,6 +25,7 @@
 #include "iorange.h"
 #include "ioport.h"
 #include "int128.h"
+#include "qemu-thread.h"
 
 typedef struct MemoryRegionOps MemoryRegionOps;
 typedef struct MemoryRegion MemoryRegion;
@@ -207,6 +208,7 @@ struct MemoryListener {
     QTAILQ_ENTRY(MemoryListener) link;
 };
 
+extern QemuMutex mem_map_lock;
 /**
  * memory_region_init: Initialize a memory region
  *
-- 
1.7.4.4
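
Every updater now follows the same shape; a minimal sketch of the
pattern (the attribute and function name are made up, the lock and the
update call are the patch's):

    static void memory_region_set_some_attr(MemoryRegion *mr, bool val)
    {
        qemu_mutex_lock(&mem_map_lock);
        mr->some_attr = val;                 /* hypothetical field */
        memory_region_update_topology(mr);   /* rebuild FlatView/radix tree */
        qemu_mutex_unlock(&mem_map_lock);
    }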

* [PATCH 05/15] memory: introduce life_ops to MemoryRegion
@ 2012-08-08  6:25 ` Liu Ping Fan
From: Liu Ping Fan @ 2012-08-08  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Anthony Liguori, Avi Kivity, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber,
	qemulist

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

The type of object referred to by a MemoryRegion varies: it can be
another MemoryRegion, a DeviceState, or some other struct defined by a
driver, so the ref/unref behavior may differ from driver to driver.

With these ops, we can manage the backing object.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 hw/ide/piix.c |    6 ++--
 hw/pckbd.c    |    6 +++-
 hw/serial.c   |    2 +-
 ioport.c      |    3 +-
 memory.c      |   69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 memory.h      |   16 +++++++++++++
 6 files changed, 94 insertions(+), 8 deletions(-)

diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index f5a74c2..bdd70b1 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -93,11 +93,11 @@ static void bmdma_setup_bar(PCIIDEState *d)
     for(i = 0;i < 2; i++) {
         BMDMAState *bm = &d->bmdma[i];
 
-        memory_region_init_io(&bm->extra_io, &piix_bmdma_ops, bm,
+        memory_region_init_io_ext(&bm->extra_io, &piix_bmdma_ops, NULL, bm,
                               "piix-bmdma", 4);
         memory_region_add_subregion(&d->bmdma_bar, i * 8, &bm->extra_io);
-        memory_region_init_io(&bm->addr_ioport, &bmdma_addr_ioport_ops, bm,
-                              "bmdma", 4);
+        memory_region_init_io_ext(&bm->addr_ioport, &bmdma_addr_ioport_ops,
+                              NULL, bm, "bmdma", 4);
         memory_region_add_subregion(&d->bmdma_bar, i * 8 + 4, &bm->addr_ioport);
     }
 }
diff --git a/hw/pckbd.c b/hw/pckbd.c
index 69857ba..de3c46d 100644
--- a/hw/pckbd.c
+++ b/hw/pckbd.c
@@ -485,10 +485,12 @@ static int i8042_initfn(ISADevice *dev)
     isa_init_irq(dev, &s->irq_kbd, 1);
     isa_init_irq(dev, &s->irq_mouse, 12);
 
-    memory_region_init_io(isa_s->io + 0, &i8042_data_ops, s, "i8042-data", 1);
+    memory_region_init_io_ext(isa_s->io + 0, &i8042_data_ops, NULL, s,
+                                "i8042-data", 1);
     isa_register_ioport(dev, isa_s->io + 0, 0x60);
 
-    memory_region_init_io(isa_s->io + 1, &i8042_cmd_ops, s, "i8042-cmd", 1);
+    memory_region_init_io_ext(isa_s->io + 1, &i8042_cmd_ops, NULL, s,
+                                "i8042-cmd", 1);
     isa_register_ioport(dev, isa_s->io + 1, 0x64);
 
     s->kbd = ps2_kbd_init(kbd_update_kbd_irq, s);
diff --git a/hw/serial.c b/hw/serial.c
index a421d1e..e992c6a 100644
--- a/hw/serial.c
+++ b/hw/serial.c
@@ -794,7 +794,7 @@ static int serial_isa_initfn(ISADevice *dev)
     serial_init_core(s);
     qdev_set_legacy_instance_id(&dev->qdev, isa->iobase, 3);
 
-    memory_region_init_io(&s->io, &serial_io_ops, s, "serial", 8);
+    memory_region_init_io_ext(&s->io, &serial_io_ops, NULL, s, "serial", 8);
     isa_register_ioport(dev, &s->io, isa->iobase);
     return 0;
 }
diff --git a/ioport.c b/ioport.c
index 6e4ca0d..768e271 100644
--- a/ioport.c
+++ b/ioport.c
@@ -384,7 +384,8 @@ static void portio_list_add_1(PortioList *piolist,
      * Use an alias so that the callback is called with an absolute address,
      * rather than an offset relative to to start + off_low.
      */
-    memory_region_init_io(region, ops, piolist->opaque, piolist->name,
+    memory_region_init_io_ext(region, ops, NULL, piolist->opaque,
+                          piolist->name,
                           INT64_MAX);
     memory_region_init_alias(alias, piolist->name,
                              region, start + off_low, off_high - off_low);
diff --git a/memory.c b/memory.c
index 5986532..80c7529 100644
--- a/memory.c
+++ b/memory.c
@@ -19,6 +19,7 @@
 #include "bitops.h"
 #include "kvm.h"
 #include <assert.h>
+#include "hw/qdev.h"
 
 #define WANT_EXEC_OBSOLETE
 #include "exec-obsolete.h"
@@ -799,6 +800,7 @@ static bool memory_region_wrong_endianness(MemoryRegion *mr)
 #endif
 }
 
+static MemoryRegionLifeOps nops;
 void memory_region_init(MemoryRegion *mr,
                         const char *name,
                         uint64_t size)
@@ -809,6 +811,7 @@ void memory_region_init(MemoryRegion *mr,
     if (size == UINT64_MAX) {
         mr->size = int128_2_64();
     }
+    mr->life_ops = &nops;
     mr->addr = 0;
     mr->subpage = false;
     mr->enabled = true;
@@ -931,6 +934,66 @@ static void memory_region_dispatch_write(MemoryRegion *mr,
                               memory_region_write_accessor, mr);
 }
 
+static void mr_object_get(MemoryRegion *mr)
+{
+    object_dynamic_cast_assert(OBJECT(mr->opaque), TYPE_DEVICE);
+    object_ref(OBJECT(mr->opaque));
+}
+
+static void mr_object_put(MemoryRegion *mr)
+{
+    object_unref(OBJECT(mr->opaque));
+}
+
+static MemoryRegionLifeOps obj_ops = {
+    .get = mr_object_get,
+    .put = mr_object_put,
+};
+
+static void mr_alias_get(MemoryRegion *mr)
+{
+}
+
+static void mr_alias_put(MemoryRegion *mr)
+{
+}
+
+static MemoryRegionLifeOps alias_ops = {
+    .get = mr_alias_get,
+    .put = mr_alias_put,
+};
+
+static void mr_nop_get(MemoryRegion *mr)
+{
+}
+
+static void mr_nop_put(MemoryRegion *mr)
+{
+}
+
+static MemoryRegionLifeOps nops = {
+    .get = mr_nop_get,
+    .put = mr_nop_put,
+};
+
+void memory_region_init_io_ext(MemoryRegion *mr,
+                           const MemoryRegionOps *ops,
+                           MemoryRegionLifeOps *life_ops,
+                           void *opaque,
+                           const char *name,
+                           uint64_t size)
+{
+    memory_region_init(mr, name, size);
+    mr->ops = ops;
+    if (life_ops != NULL) {
+        mr->life_ops = life_ops;
+    }
+    mr->opaque = opaque;
+    mr->terminates = true;
+    mr->destructor = memory_region_destructor_iomem;
+    mr->ram_addr = ~(ram_addr_t)0;
+}
+
 void memory_region_init_io(MemoryRegion *mr,
                            const MemoryRegionOps *ops,
                            void *opaque,
@@ -939,6 +1002,9 @@ void memory_region_init_io(MemoryRegion *mr,
 {
     memory_region_init(mr, name, size);
     mr->ops = ops;
+    if (opaque != NULL) {
+        mr->life_ops = &obj_ops;
+    }
     mr->opaque = opaque;
     mr->terminates = true;
     mr->destructor = memory_region_destructor_iomem;
@@ -975,6 +1041,7 @@ void memory_region_init_alias(MemoryRegion *mr,
                               uint64_t size)
 {
     memory_region_init(mr, name, size);
+    mr->life_ops = &alias_ops;
     mr->alias = orig;
     mr->alias_offset = offset;
 }
@@ -1027,7 +1094,7 @@ void memory_region_init_reservation(MemoryRegion *mr,
                                     const char *name,
                                     uint64_t size)
 {
-    memory_region_init_io(mr, &reservation_ops, mr, name, size);
+    memory_region_init_io_ext(mr, &reservation_ops, &nops, mr, name, size);
 }
 
 void memory_region_destroy(MemoryRegion *mr)
diff --git a/memory.h b/memory.h
index fe6aefa..8fb543b 100644
--- a/memory.h
+++ b/memory.h
@@ -28,6 +28,7 @@
 #include "qemu-thread.h"
 
 typedef struct MemoryRegionOps MemoryRegionOps;
+typedef struct MemoryRegionLifeOps MemoryRegionLifeOps;
 typedef struct MemoryRegion MemoryRegion;
 typedef struct MemoryRegionPortio MemoryRegionPortio;
 typedef struct MemoryRegionMmio MemoryRegionMmio;
@@ -52,6 +53,11 @@ struct MemoryRegionIORange {
     target_phys_addr_t offset;
 };
 
+struct MemoryRegionLifeOps {
+    void (*get)(MemoryRegion *mr);
+    void (*put)(MemoryRegion *mr);
+};
+
 /*
  * Memory region callbacks
  */
@@ -120,6 +126,7 @@ typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
 struct MemoryRegion {
     /* All fields are private - violators will be prosecuted */
     const MemoryRegionOps *ops;
+    MemoryRegionLifeOps *life_ops;
     void *opaque;
     MemoryRegion *parent;
     Int128 size;
@@ -209,6 +216,7 @@ struct MemoryListener {
 };
 
 extern QemuMutex mem_map_lock;
+
 /**
  * memory_region_init: Initialize a memory region
  *
@@ -222,6 +230,14 @@ extern QemuMutex mem_map_lock;
 void memory_region_init(MemoryRegion *mr,
                         const char *name,
                         uint64_t size);
+
+void memory_region_init_io_ext(MemoryRegion *mr,
+                           const MemoryRegionOps *ops,
+                           MemoryRegionLifeOps *life_ops,
+                           void *opaque,
+                           const char *name,
+                           uint64_t size);
+
 /**
  * memory_region_init_io: Initialize an I/O memory region.
  *
-- 
1.7.4.4
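
A driver whose opaque is not a DeviceState could plug in its own life
cycle ops along these lines (a sketch; MyState and its ref helpers are
hypothetical):

    static void my_state_get(MemoryRegion *mr)
    {
        my_state_ref((MyState *)mr->opaque);     /* hypothetical helper */
    }

    static void my_state_put(MemoryRegion *mr)
    {
        my_state_unref((MyState *)mr->opaque);   /* hypothetical helper */
    }

    static MemoryRegionLifeOps my_state_life_ops = {
        .get = my_state_get,
        .put = my_state_put,
    };

    /* then, at init time: */
    memory_region_init_io_ext(&s->io, &my_io_ops, &my_state_life_ops, s,
                              "my-device", 4);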

* [PATCH 06/15] memory: use refcnt to manage MemoryRegion
@ 2012-08-08  6:25 ` Liu Ping Fan
From: Liu Ping Fan @ 2012-08-08  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Anthony Liguori, Avi Kivity, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber,
	qemulist

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Use a reference count for the MemoryRegion, so that its life cycle
management is separated from the referred object:
  When mr->ref goes 0->1, take a reference on the referred object.
  When mr->ref goes 1->0, drop the reference on the referred object.

The referred object can be a DeviceState, another MemoryRegion, or some
other opaque structure.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 memory.c |   18 ++++++++++++++++++
 memory.h |    5 +++++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/memory.c b/memory.c
index 80c7529..5dc8b59 100644
--- a/memory.c
+++ b/memory.c
@@ -811,6 +811,7 @@ void memory_region_init(MemoryRegion *mr,
     if (size == UINT64_MAX) {
         mr->size = int128_2_64();
     }
+    atomic_set(&mr->ref, 0);
     mr->life_ops = &nops;
     mr->addr = 0;
     mr->subpage = false;
@@ -1090,6 +1091,23 @@ static const MemoryRegionOps reservation_ops = {
     .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
+void memory_region_get(MemoryRegion *mr)
+{
+    if (atomic_add_and_return(1, &mr->ref) == 1) {
+        mr->life_ops->get(mr);
+    }
+}
+
+void memory_region_put(MemoryRegion *mr)
+{
+    assert(atomic_read(&mr->ref) > 0);
+
+    if (atomic_dec_and_test(&mr->ref)) {
+        /* to fix, using call_rcu( ,release) */
+        mr->life_ops->put(mr);
+    }
+}
+
 void memory_region_init_reservation(MemoryRegion *mr,
                                     const char *name,
                                     uint64_t size)
diff --git a/memory.h b/memory.h
index 8fb543b..740f018 100644
--- a/memory.h
+++ b/memory.h
@@ -18,6 +18,7 @@
 
 #include <stdint.h>
 #include <stdbool.h>
+#include "qemu/atomic.h"
 #include "qemu-common.h"
 #include "cpu-common.h"
 #include "targphys.h"
@@ -26,6 +27,7 @@
 #include "ioport.h"
 #include "int128.h"
 #include "qemu-thread.h"
+#include "qemu/reclaimer.h"
 
 typedef struct MemoryRegionOps MemoryRegionOps;
 typedef struct MemoryRegionLifeOps MemoryRegionLifeOps;
@@ -126,6 +128,7 @@ typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
 struct MemoryRegion {
     /* All fields are private - violators will be prosecuted */
     const MemoryRegionOps *ops;
+    Atomic ref;
     MemoryRegionLifeOps *life_ops;
     void *opaque;
     MemoryRegion *parent;
@@ -766,6 +769,8 @@ void memory_global_dirty_log_stop(void);
 
 void mtree_info(fprintf_function mon_printf, void *f);
 
+void memory_region_get(MemoryRegion *mr);
+void memory_region_put(MemoryRegion *mr);
 #endif
 
 #endif
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 07/15] memory: inc/dec mr's ref when adding/removing from mem view
  2012-08-08  6:25 ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  6:25   ` Liu Ping Fan
  -1 siblings, 0 replies; 154+ messages in thread
From: Liu Ping Fan @ 2012-08-08  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Anthony Liguori, Avi Kivity, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber,
	qemulist

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

memory_region_{add,del}_subregion now inc/dec the subregion's refcnt.
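
A minimal sketch of the pairing this creates on a hypothetical hot-plug
path; dev, base and the mmio field are illustrative only:

    /* plug: mapping the region takes a reference on it */
    memory_region_add_subregion(get_system_memory(), base, &dev->mmio);

    /* unplug: the ref is dropped under mem_map_lock, but the region is
     * only released once no reader of the radix tree or flatview can
     * still see the old mapping */
    memory_region_del_subregion(get_system_memory(), &dev->mmio);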

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 memory.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/memory.c b/memory.c
index 5dc8b59..2eaa2fc 100644
--- a/memory.c
+++ b/memory.c
@@ -1356,7 +1356,7 @@ static void memory_region_add_subregion_common(MemoryRegion *mr,
     assert(!subregion->parent);
     subregion->parent = mr;
     subregion->addr = offset;
-
+    memory_region_get(subregion);
     qemu_mutex_lock(&mem_map_lock);
     QTAILQ_FOREACH(other, &mr->subregions, subregions_link) {
         if (subregion->may_overlap || other->may_overlap) {
@@ -1420,6 +1420,8 @@ void memory_region_del_subregion(MemoryRegion *mr,
     qemu_mutex_lock(&mem_map_lock);
     QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link);
     memory_region_update_topology(mr);
+    /* mr may still be in use by readers of the radix tree; delay release */
+    memory_region_put(subregion);
     qemu_mutex_unlock(&mem_map_lock);
 }
 
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 08/15] memory: introduce PhysMap to present a snapshot of topology
  2012-08-08  6:25 ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  6:25   ` Liu Ping Fan
  -1 siblings, 0 replies; 154+ messages in thread
From: Liu Ping Fan @ 2012-08-08  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Anthony Liguori, Avi Kivity, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber,
	qemulist

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

PhysMap contains both the flatview and the radix-tree view; they are
snapshots of the system topology and must stay consistent with each
other. With PhysMap, an update can swap a single pointer, which makes
the switch-over atomic.
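
A minimal sketch of the intended writer-side sequence, using the
cur_map pointer, lock and helpers a later patch in this series adds:

    PhysMap *next, *prev;

    next = alloc_next_map();        /* build the new snapshot */
    /* ... fill next->root and next->views from the mr hierarchy ... */

    qemu_mutex_lock(&cur_map_lock);
    prev = cur_map;
    cur_map = next;                 /* readers now see the new map */
    qemu_mutex_unlock(&cur_map_lock);

    physmap_put(prev);              /* freed once the last reader drops it */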

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 exec.c   |    8 --------
 memory.c |   33 ---------------------------------
 memory.h |   62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 60 insertions(+), 43 deletions(-)

diff --git a/exec.c b/exec.c
index 0e29ef9..01b91b0 100644
--- a/exec.c
+++ b/exec.c
@@ -156,8 +156,6 @@ typedef struct PageDesc {
 #endif
 
 /* Size of the L2 (and L3, etc) page tables.  */
-#define L2_BITS 10
-#define L2_SIZE (1 << L2_BITS)
 
 #define P_L2_LEVELS \
     (((TARGET_PHYS_ADDR_SPACE_BITS - TARGET_PAGE_BITS - 1) / L2_BITS) + 1)
@@ -185,7 +183,6 @@ uintptr_t qemu_host_page_mask;
 static void *l1_map[V_L1_SIZE];
 
 #if !defined(CONFIG_USER_ONLY)
-typedef struct PhysPageEntry PhysPageEntry;
 
 static MemoryRegionSection *phys_sections;
 static unsigned phys_sections_nb, phys_sections_nb_alloc;
@@ -194,11 +191,6 @@ static uint16_t phys_section_notdirty;
 static uint16_t phys_section_rom;
 static uint16_t phys_section_watch;
 
-struct PhysPageEntry {
-    uint16_t is_leaf : 1;
-     /* index into phys_sections (is_leaf) or phys_map_nodes (!is_leaf) */
-    uint16_t ptr : 15;
-};
 
 /* Simple allocator for PhysPageEntry nodes */
 static PhysPageEntry (*phys_map_nodes)[L2_SIZE];
diff --git a/memory.c b/memory.c
index 2eaa2fc..c7f2cfd 100644
--- a/memory.c
+++ b/memory.c
@@ -31,17 +31,6 @@ static bool global_dirty_log = false;
 static QTAILQ_HEAD(memory_listeners, MemoryListener) memory_listeners
     = QTAILQ_HEAD_INITIALIZER(memory_listeners);
 
-typedef struct AddrRange AddrRange;
-
-/*
- * Note using signed integers limits us to physical addresses at most
- * 63 bits wide.  They are needed for negative offsetting in aliases
- * (large MemoryRegion::alias_offset).
- */
-struct AddrRange {
-    Int128 start;
-    Int128 size;
-};
 
 static AddrRange addrrange_make(Int128 start, Int128 size)
 {
@@ -197,28 +186,6 @@ static bool memory_region_ioeventfd_equal(MemoryRegionIoeventfd a,
         && !memory_region_ioeventfd_before(b, a);
 }
 
-typedef struct FlatRange FlatRange;
-typedef struct FlatView FlatView;
-
-/* Range of memory in the global map.  Addresses are absolute. */
-struct FlatRange {
-    MemoryRegion *mr;
-    target_phys_addr_t offset_in_region;
-    AddrRange addr;
-    uint8_t dirty_log_mask;
-    bool readable;
-    bool readonly;
-};
-
-/* Flattened global view of current active memory hierarchy.  Kept in sorted
- * order.
- */
-struct FlatView {
-    FlatRange *ranges;
-    unsigned nr;
-    unsigned nr_allocated;
-};
-
 typedef struct AddressSpace AddressSpace;
 typedef struct AddressSpaceOps AddressSpaceOps;
 
diff --git a/memory.h b/memory.h
index 740f018..357edd8 100644
--- a/memory.h
+++ b/memory.h
@@ -29,12 +29,72 @@
 #include "qemu-thread.h"
 #include "qemu/reclaimer.h"
 
+typedef struct AddrRange AddrRange;
+typedef struct FlatRange FlatRange;
+typedef struct FlatView FlatView;
+typedef struct PhysPageEntry PhysPageEntry;
+typedef struct PhysMap PhysMap;
+typedef struct MemoryRegionSection MemoryRegionSection;
 typedef struct MemoryRegionOps MemoryRegionOps;
 typedef struct MemoryRegionLifeOps MemoryRegionLifeOps;
 typedef struct MemoryRegion MemoryRegion;
 typedef struct MemoryRegionPortio MemoryRegionPortio;
 typedef struct MemoryRegionMmio MemoryRegionMmio;
 
+/*
+ * Note using signed integers limits us to physical addresses at most
+ * 63 bits wide.  They are needed for negative offsetting in aliases
+ * (large MemoryRegion::alias_offset).
+ */
+struct AddrRange {
+    Int128 start;
+    Int128 size;
+};
+
+/* Range of memory in the global map.  Addresses are absolute. */
+struct FlatRange {
+    MemoryRegion *mr;
+    target_phys_addr_t offset_in_region;
+    AddrRange addr;
+    uint8_t dirty_log_mask;
+    bool readable;
+    bool readonly;
+};
+
+/* Flattened global view of current active memory hierarchy.  Kept in sorted
+ * order.
+ */
+struct FlatView {
+    FlatRange *ranges;
+    unsigned nr;
+    unsigned nr_allocated;
+};
+
+struct PhysPageEntry {
+    uint16_t is_leaf:1;
+     /* index into phys_sections (is_leaf) or phys_map_nodes (!is_leaf) */
+    uint16_t ptr:15;
+};
+
+#define L2_BITS 10
+#define L2_SIZE (1 << L2_BITS)
+/* This is a multi-level map on the physical address space.
+   The bottom level has pointers to MemoryRegionSections.  */
+struct PhysMap {
+    Atomic ref;
+    PhysPageEntry root;
+    PhysPageEntry (*phys_map_nodes)[L2_SIZE];
+    unsigned phys_map_nodes_nb;
+    unsigned phys_map_nodes_nb_alloc;
+
+    MemoryRegionSection *phys_sections;
+    unsigned phys_sections_nb;
+    unsigned phys_sections_nb_alloc;
+
+    /* FlatView */
+    FlatView views[2];
+};
+
 /* Must match *_DIRTY_FLAGS in cpu-all.h.  To be replaced with dynamic
  * registration.
  */
@@ -167,8 +227,6 @@ struct MemoryRegionPortio {
 
 #define PORTIO_END_OF_LIST() { }
 
-typedef struct MemoryRegionSection MemoryRegionSection;
-
 /**
  * MemoryRegionSection: describes a fragment of a #MemoryRegion
  *
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 09/15] memory: prepare flatview and radix-tree for rcu style access
  2012-08-08  6:25 ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  6:25   ` Liu Ping Fan
  -1 siblings, 0 replies; 154+ messages in thread
From: Liu Ping Fan @ 2012-08-08  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Anthony Liguori, Avi Kivity, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber,
	qemulist

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

The flatview and the radix-tree view are both reached through a single
pointer, which makes switching them to an updated pair appear atomic.

An mr referenced from a radix-tree leaf or from the flatview is
reclaimed only after the previous PhysMap is no longer in use.
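
A minimal reader-side sketch, mirroring the cpu_physical_memory_rw()
change below; addr stands for some guest physical address:

    PhysMap *cur;
    MemoryRegionSection *section;

    cur = cur_map_get();            /* pin the current snapshot */
    section = phys_page_find(addr >> TARGET_PAGE_BITS);
    /* ... use section->mr; neither the snapshot nor the mrs it
     * references can be reclaimed until the matching put ... */
    physmap_put(cur);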

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 exec.c      |  303 +++++++++++++++++++++++++++++++++++++++-------------------
 hw/vhost.c  |    2 +-
 hw/xen_pt.c |    2 +-
 kvm-all.c   |    2 +-
 memory.c    |   92 ++++++++++++++-----
 memory.h    |    9 ++-
 vl.c        |    1 +
 xen-all.c   |    2 +-
 8 files changed, 286 insertions(+), 127 deletions(-)

diff --git a/exec.c b/exec.c
index 01b91b0..97addb9 100644
--- a/exec.c
+++ b/exec.c
@@ -24,6 +24,7 @@
 #include <sys/mman.h>
 #endif
 
+#include "qemu/atomic.h"
 #include "qemu-common.h"
 #include "cpu.h"
 #include "tcg.h"
@@ -35,6 +36,8 @@
 #include "qemu-timer.h"
 #include "memory.h"
 #include "exec-memory.h"
+#include "qemu-thread.h"
+#include "qemu/reclaimer.h"
 #if defined(CONFIG_USER_ONLY)
 #include <qemu.h>
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
@@ -184,25 +187,17 @@ static void *l1_map[V_L1_SIZE];
 
 #if !defined(CONFIG_USER_ONLY)
 
-static MemoryRegionSection *phys_sections;
-static unsigned phys_sections_nb, phys_sections_nb_alloc;
 static uint16_t phys_section_unassigned;
 static uint16_t phys_section_notdirty;
 static uint16_t phys_section_rom;
 static uint16_t phys_section_watch;
 
-
-/* Simple allocator for PhysPageEntry nodes */
-static PhysPageEntry (*phys_map_nodes)[L2_SIZE];
-static unsigned phys_map_nodes_nb, phys_map_nodes_nb_alloc;
-
 #define PHYS_MAP_NODE_NIL (((uint16_t)~0) >> 1)
 
-/* This is a multi-level map on the physical address space.
-   The bottom level has pointers to MemoryRegionSections.  */
-static PhysPageEntry phys_map = { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
-
+static QemuMutex cur_map_lock;
+static PhysMap *cur_map;
 QemuMutex mem_map_lock;
+static PhysMap *next_map;
 
 static void io_mem_init(void);
 static void memory_map_init(void);
@@ -383,41 +378,38 @@ static inline PageDesc *page_find(tb_page_addr_t index)
 
 #if !defined(CONFIG_USER_ONLY)
 
-static void phys_map_node_reserve(unsigned nodes)
+static void phys_map_node_reserve(PhysMap *map, unsigned nodes)
 {
-    if (phys_map_nodes_nb + nodes > phys_map_nodes_nb_alloc) {
+    if (map->phys_map_nodes_nb + nodes > map->phys_map_nodes_nb_alloc) {
         typedef PhysPageEntry Node[L2_SIZE];
-        phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc * 2, 16);
-        phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc,
-                                      phys_map_nodes_nb + nodes);
-        phys_map_nodes = g_renew(Node, phys_map_nodes,
-                                 phys_map_nodes_nb_alloc);
+        map->phys_map_nodes_nb_alloc = MAX(map->phys_map_nodes_nb_alloc * 2,
+                                           16);
+        map->phys_map_nodes_nb_alloc = MAX(map->phys_map_nodes_nb_alloc,
+                                      map->phys_map_nodes_nb + nodes);
+        map->phys_map_nodes = g_renew(Node, map->phys_map_nodes,
+                                 map->phys_map_nodes_nb_alloc);
     }
 }
 
-static uint16_t phys_map_node_alloc(void)
+static uint16_t phys_map_node_alloc(PhysMap *map)
 {
     unsigned i;
     uint16_t ret;
 
-    ret = phys_map_nodes_nb++;
+    ret = map->phys_map_nodes_nb++;
     assert(ret != PHYS_MAP_NODE_NIL);
-    assert(ret != phys_map_nodes_nb_alloc);
+    assert(ret != map->phys_map_nodes_nb_alloc);
     for (i = 0; i < L2_SIZE; ++i) {
-        phys_map_nodes[ret][i].is_leaf = 0;
-        phys_map_nodes[ret][i].ptr = PHYS_MAP_NODE_NIL;
+        map->phys_map_nodes[ret][i].is_leaf = 0;
+        map->phys_map_nodes[ret][i].ptr = PHYS_MAP_NODE_NIL;
     }
     return ret;
 }
 
-static void phys_map_nodes_reset(void)
-{
-    phys_map_nodes_nb = 0;
-}
-
-
-static void phys_page_set_level(PhysPageEntry *lp, target_phys_addr_t *index,
-                                target_phys_addr_t *nb, uint16_t leaf,
+static void phys_page_set_level(PhysMap *map, PhysPageEntry *lp,
+                                target_phys_addr_t *index,
+                                target_phys_addr_t *nb,
+                                uint16_t leaf,
                                 int level)
 {
     PhysPageEntry *p;
@@ -425,8 +417,8 @@ static void phys_page_set_level(PhysPageEntry *lp, target_phys_addr_t *index,
     target_phys_addr_t step = (target_phys_addr_t)1 << (level * L2_BITS);
 
     if (!lp->is_leaf && lp->ptr == PHYS_MAP_NODE_NIL) {
-        lp->ptr = phys_map_node_alloc();
-        p = phys_map_nodes[lp->ptr];
+        lp->ptr = phys_map_node_alloc(map);
+        p = map->phys_map_nodes[lp->ptr];
         if (level == 0) {
             for (i = 0; i < L2_SIZE; i++) {
                 p[i].is_leaf = 1;
@@ -434,7 +426,7 @@ static void phys_page_set_level(PhysPageEntry *lp, target_phys_addr_t *index,
             }
         }
     } else {
-        p = phys_map_nodes[lp->ptr];
+        p = map->phys_map_nodes[lp->ptr];
     }
     lp = &p[(*index >> (level * L2_BITS)) & (L2_SIZE - 1)];
 
@@ -445,24 +437,27 @@ static void phys_page_set_level(PhysPageEntry *lp, target_phys_addr_t *index,
             *index += step;
             *nb -= step;
         } else {
-            phys_page_set_level(lp, index, nb, leaf, level - 1);
+            phys_page_set_level(map, lp, index, nb, leaf, level - 1);
         }
         ++lp;
     }
 }
 
-static void phys_page_set(target_phys_addr_t index, target_phys_addr_t nb,
-                          uint16_t leaf)
+static void phys_page_set(PhysMap *map, target_phys_addr_t index,
+                            target_phys_addr_t nb,
+                            uint16_t leaf)
 {
     /* Wildly overreserve - it doesn't matter much. */
-    phys_map_node_reserve(3 * P_L2_LEVELS);
+    phys_map_node_reserve(map, 3 * P_L2_LEVELS);
 
-    phys_page_set_level(&phys_map, &index, &nb, leaf, P_L2_LEVELS - 1);
+    /* update in the new tree */
+    phys_page_set_level(map, &map->root, &index, &nb, leaf, P_L2_LEVELS - 1);
 }
 
-MemoryRegionSection *phys_page_find(target_phys_addr_t index)
+static MemoryRegionSection *phys_page_find_internal(PhysMap *map,
+                           target_phys_addr_t index)
 {
-    PhysPageEntry lp = phys_map;
+    PhysPageEntry lp = map->root;
     PhysPageEntry *p;
     int i;
     uint16_t s_index = phys_section_unassigned;
@@ -471,13 +466,79 @@ MemoryRegionSection *phys_page_find(target_phys_addr_t index)
         if (lp.ptr == PHYS_MAP_NODE_NIL) {
             goto not_found;
         }
-        p = phys_map_nodes[lp.ptr];
+        p = map->phys_map_nodes[lp.ptr];
         lp = p[(index >> (i * L2_BITS)) & (L2_SIZE - 1)];
     }
 
     s_index = lp.ptr;
 not_found:
-    return &phys_sections[s_index];
+    return &map->phys_sections[s_index];
+}
+
+MemoryRegionSection *phys_page_find(target_phys_addr_t index)
+{
+    return phys_page_find_internal(cur_map, index);
+}
+
+void physmap_get(PhysMap *map)
+{
+    atomic_inc(&map->ref);
+}
+
+/* Until the rcu read side has finished, defer reclaim via this list */
+static ChunkHead physmap_reclaimer_list = { .lh_first = NULL };
+void physmap_reclaimer_enqueue(void *opaque, ReleaseHandler *release)
+{
+    reclaimer_enqueue(&physmap_reclaimer_list, opaque, release);
+}
+
+static void destroy_all_mappings(PhysMap *map);
+static void phys_map_release(PhysMap *map)
+{
+    /* emulate an rcu reclaimer for mr */
+    reclaimer_worker(&physmap_reclaimer_list);
+
+    destroy_all_mappings(map);
+    g_free(map->phys_map_nodes);
+    g_free(map->phys_sections);
+    g_free(map->views[0].ranges);
+    g_free(map->views[1].ranges);
+    g_free(map);
+}
+
+void physmap_put(PhysMap *map)
+{
+    if (atomic_dec_and_test(&map->ref)) {
+        phys_map_release(map);
+    }
+}
+
+void cur_map_update(PhysMap *next)
+{
+    qemu_mutex_lock(&cur_map_lock);
+    physmap_put(cur_map);
+    cur_map = next;
+    smp_mb();
+    qemu_mutex_unlock(&cur_map_lock);
+}
+
+PhysMap *cur_map_get(void)
+{
+    PhysMap *ret;
+
+    qemu_mutex_lock(&cur_map_lock);
+    ret = cur_map;
+    physmap_get(ret);
+    smp_mb();
+    qemu_mutex_unlock(&cur_map_lock);
+    return ret;
+}
+
+PhysMap *alloc_next_map(void)
+{
+    PhysMap *next = g_malloc0(sizeof(PhysMap));
+    atomic_set(&next->ref, 1);
+    return next;
 }
 
 bool memory_region_is_unassigned(MemoryRegion *mr)
@@ -632,6 +693,7 @@ void cpu_exec_init_all(void)
     memory_map_init();
     io_mem_init();
     qemu_mutex_init(&mem_map_lock);
+    qemu_mutex_init(&cur_map_lock);
 #endif
 }
 
@@ -2161,17 +2223,18 @@ int page_unprotect(target_ulong address, uintptr_t pc, void *puc)
 
 #define SUBPAGE_IDX(addr) ((addr) & ~TARGET_PAGE_MASK)
 typedef struct subpage_t {
+    PhysMap *map;
     MemoryRegion iomem;
     target_phys_addr_t base;
     uint16_t sub_section[TARGET_PAGE_SIZE];
 } subpage_t;
 
-static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
-                             uint16_t section);
-static subpage_t *subpage_init(target_phys_addr_t base);
-static void destroy_page_desc(uint16_t section_index)
+static int subpage_register(PhysMap *map, subpage_t *mmio, uint32_t start,
+                            uint32_t end, uint16_t section);
+static subpage_t *subpage_init(PhysMap *map, target_phys_addr_t base);
+static void destroy_page_desc(PhysMap *map, uint16_t section_index)
 {
-    MemoryRegionSection *section = &phys_sections[section_index];
+    MemoryRegionSection *section = &map->phys_sections[section_index];
     MemoryRegion *mr = section->mr;
 
     if (mr->subpage) {
@@ -2181,7 +2244,7 @@ static void destroy_page_desc(uint16_t section_index)
     }
 }
 
-static void destroy_l2_mapping(PhysPageEntry *lp, unsigned level)
+static void destroy_l2_mapping(PhysMap *map, PhysPageEntry *lp, unsigned level)
 {
     unsigned i;
     PhysPageEntry *p;
@@ -2190,38 +2253,34 @@ static void destroy_l2_mapping(PhysPageEntry *lp, unsigned level)
         return;
     }
 
-    p = phys_map_nodes[lp->ptr];
+    p = map->phys_map_nodes[lp->ptr];
     for (i = 0; i < L2_SIZE; ++i) {
         if (!p[i].is_leaf) {
-            destroy_l2_mapping(&p[i], level - 1);
+            destroy_l2_mapping(map, &p[i], level - 1);
         } else {
-            destroy_page_desc(p[i].ptr);
+            destroy_page_desc(map, p[i].ptr);
         }
     }
     lp->is_leaf = 0;
     lp->ptr = PHYS_MAP_NODE_NIL;
 }
 
-static void destroy_all_mappings(void)
+static void destroy_all_mappings(PhysMap *map)
 {
-    destroy_l2_mapping(&phys_map, P_L2_LEVELS - 1);
-    phys_map_nodes_reset();
-}
+    PhysPageEntry *root = &map->root;
 
-static uint16_t phys_section_add(MemoryRegionSection *section)
-{
-    if (phys_sections_nb == phys_sections_nb_alloc) {
-        phys_sections_nb_alloc = MAX(phys_sections_nb_alloc * 2, 16);
-        phys_sections = g_renew(MemoryRegionSection, phys_sections,
-                                phys_sections_nb_alloc);
-    }
-    phys_sections[phys_sections_nb] = *section;
-    return phys_sections_nb++;
+    destroy_l2_mapping(map, root, P_L2_LEVELS - 1);
 }
 
-static void phys_sections_clear(void)
+static uint16_t phys_section_add(PhysMap *map, MemoryRegionSection *section)
 {
-    phys_sections_nb = 0;
+    if (map->phys_sections_nb == map->phys_sections_nb_alloc) {
+        map->phys_sections_nb_alloc = MAX(map->phys_sections_nb_alloc * 2, 16);
+        map->phys_sections = g_renew(MemoryRegionSection, map->phys_sections,
+                                map->phys_sections_nb_alloc);
+    }
+    map->phys_sections[map->phys_sections_nb] = *section;
+    return map->phys_sections_nb++;
 }
 
 /* register physical memory.
@@ -2232,12 +2291,13 @@ static void phys_sections_clear(void)
    start_addr and region_offset are rounded down to a page boundary
    before calculating this offset.  This should not be a problem unless
    the low bits of start_addr and region_offset differ.  */
-static void register_subpage(MemoryRegionSection *section)
+static void register_subpage(PhysMap *map, MemoryRegionSection *section)
 {
     subpage_t *subpage;
     target_phys_addr_t base = section->offset_within_address_space
         & TARGET_PAGE_MASK;
-    MemoryRegionSection *existing = phys_page_find(base >> TARGET_PAGE_BITS);
+    MemoryRegionSection *existing = phys_page_find_internal(map,
+                                            base >> TARGET_PAGE_BITS);
     MemoryRegionSection subsection = {
         .offset_within_address_space = base,
         .size = TARGET_PAGE_SIZE,
@@ -2247,30 +2307,30 @@ static void register_subpage(MemoryRegionSection *section)
     assert(existing->mr->subpage || existing->mr == &io_mem_unassigned);
 
     if (!(existing->mr->subpage)) {
-        subpage = subpage_init(base);
+        subpage = subpage_init(map, base);
         subsection.mr = &subpage->iomem;
-        phys_page_set(base >> TARGET_PAGE_BITS, 1,
-                      phys_section_add(&subsection));
+        phys_page_set(map, base >> TARGET_PAGE_BITS, 1,
+                      phys_section_add(map, &subsection));
     } else {
         subpage = container_of(existing->mr, subpage_t, iomem);
     }
     start = section->offset_within_address_space & ~TARGET_PAGE_MASK;
     end = start + section->size;
-    subpage_register(subpage, start, end, phys_section_add(section));
+    subpage_register(map, subpage, start, end, phys_section_add(map, section));
 }
 
 
-static void register_multipage(MemoryRegionSection *section)
+static void register_multipage(PhysMap *map, MemoryRegionSection *section)
 {
     target_phys_addr_t start_addr = section->offset_within_address_space;
     ram_addr_t size = section->size;
     target_phys_addr_t addr;
-    uint16_t section_index = phys_section_add(section);
+    uint16_t section_index = phys_section_add(map, section);
 
     assert(size);
 
     addr = start_addr;
-    phys_page_set(addr >> TARGET_PAGE_BITS, size >> TARGET_PAGE_BITS,
+    phys_page_set(map, addr >> TARGET_PAGE_BITS, size >> TARGET_PAGE_BITS,
                   section_index);
 }
 
@@ -2278,13 +2338,14 @@ void cpu_register_physical_memory_log(MemoryRegionSection *section,
                                       bool readonly)
 {
     MemoryRegionSection now = *section, remain = *section;
+    PhysMap *map = next_map;
 
     if ((now.offset_within_address_space & ~TARGET_PAGE_MASK)
         || (now.size < TARGET_PAGE_SIZE)) {
         now.size = MIN(TARGET_PAGE_ALIGN(now.offset_within_address_space)
                        - now.offset_within_address_space,
                        now.size);
-        register_subpage(&now);
+        register_subpage(map, &now);
         remain.size -= now.size;
         remain.offset_within_address_space += now.size;
         remain.offset_within_region += now.size;
@@ -2292,14 +2353,14 @@ void cpu_register_physical_memory_log(MemoryRegionSection *section,
     now = remain;
     now.size &= TARGET_PAGE_MASK;
     if (now.size) {
-        register_multipage(&now);
+        register_multipage(map, &now);
         remain.size -= now.size;
         remain.offset_within_address_space += now.size;
         remain.offset_within_region += now.size;
     }
     now = remain;
     if (now.size) {
-        register_subpage(&now);
+        register_subpage(map, &now);
     }
 }
 
@@ -3001,7 +3062,7 @@ static uint64_t subpage_read(void *opaque, target_phys_addr_t addr,
            mmio, len, addr, idx);
 #endif
 
-    section = &phys_sections[mmio->sub_section[idx]];
+    section = &mmio->map->phys_sections[mmio->sub_section[idx]];
     addr += mmio->base;
     addr -= section->offset_within_address_space;
     addr += section->offset_within_region;
@@ -3020,7 +3081,7 @@ static void subpage_write(void *opaque, target_phys_addr_t addr,
            __func__, mmio, len, addr, idx, value);
 #endif
 
-    section = &phys_sections[mmio->sub_section[idx]];
+    section = &mmio->map->phys_sections[mmio->sub_section[idx]];
     addr += mmio->base;
     addr -= section->offset_within_address_space;
     addr += section->offset_within_region;
@@ -3065,8 +3126,8 @@ static const MemoryRegionOps subpage_ram_ops = {
     .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
-static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
-                             uint16_t section)
+static int subpage_register(PhysMap *map, subpage_t *mmio, uint32_t start,
+                              uint32_t end, uint16_t section)
 {
     int idx, eidx;
 
@@ -3078,10 +3139,10 @@ static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
     printf("%s: %p start %08x end %08x idx %08x eidx %08x mem %ld\n", __func__,
            mmio, start, end, idx, eidx, memory);
 #endif
-    if (memory_region_is_ram(phys_sections[section].mr)) {
-        MemoryRegionSection new_section = phys_sections[section];
+    if (memory_region_is_ram(map->phys_sections[section].mr)) {
+        MemoryRegionSection new_section = map->phys_sections[section];
         new_section.mr = &io_mem_subpage_ram;
-        section = phys_section_add(&new_section);
+        section = phys_section_add(map, &new_section);
     }
     for (; idx <= eidx; idx++) {
         mmio->sub_section[idx] = section;
@@ -3090,12 +3151,13 @@ static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
     return 0;
 }
 
-static subpage_t *subpage_init(target_phys_addr_t base)
+static subpage_t *subpage_init(PhysMap *map, target_phys_addr_t base)
 {
     subpage_t *mmio;
 
     mmio = g_malloc0(sizeof(subpage_t));
 
+    mmio->map = map;
     mmio->base = base;
     memory_region_init_io(&mmio->iomem, &subpage_ops, mmio,
                           "subpage", TARGET_PAGE_SIZE);
@@ -3104,12 +3166,12 @@ static subpage_t *subpage_init(target_phys_addr_t base)
     printf("%s: %p base " TARGET_FMT_plx " len %08x %d\n", __func__,
            mmio, base, TARGET_PAGE_SIZE, subpage_memory);
 #endif
-    subpage_register(mmio, 0, TARGET_PAGE_SIZE-1, phys_section_unassigned);
+    subpage_register(map, mmio, 0, TARGET_PAGE_SIZE-1, phys_section_unassigned);
 
     return mmio;
 }
 
-static uint16_t dummy_section(MemoryRegion *mr)
+static uint16_t dummy_section(PhysMap *map, MemoryRegion *mr)
 {
     MemoryRegionSection section = {
         .mr = mr,
@@ -3118,7 +3180,7 @@ static uint16_t dummy_section(MemoryRegion *mr)
         .size = UINT64_MAX,
     };
 
-    return phys_section_add(&section);
+    return phys_section_add(map, &section);
 }
 
 MemoryRegion *iotlb_to_region(target_phys_addr_t index)
@@ -3140,15 +3202,32 @@ static void io_mem_init(void)
                           "watch", UINT64_MAX);
 }
 
-static void core_begin(MemoryListener *listener)
+#if 0
+static void physmap_init(void)
+{
+    FlatView v = { .ranges = NULL,
+                             .nr = 0,
+                             .nr_allocated = 0,
+    };
+
+    init_map.views[0] = v;
+    init_map.views[1] = v;
+    cur_map =  &init_map;
+}
+#endif
+
+static void core_begin(MemoryListener *listener, PhysMap *new_map)
 {
-    destroy_all_mappings();
-    phys_sections_clear();
-    phys_map.ptr = PHYS_MAP_NODE_NIL;
-    phys_section_unassigned = dummy_section(&io_mem_unassigned);
-    phys_section_notdirty = dummy_section(&io_mem_notdirty);
-    phys_section_rom = dummy_section(&io_mem_rom);
-    phys_section_watch = dummy_section(&io_mem_watch);
+
+    new_map->root.ptr = PHYS_MAP_NODE_NIL;
+    new_map->root.is_leaf = 0;
+
+    /* These sections have the same index in every map */
+    phys_section_unassigned = dummy_section(new_map, &io_mem_unassigned);
+    phys_section_notdirty = dummy_section(new_map, &io_mem_notdirty);
+    phys_section_rom = dummy_section(new_map, &io_mem_rom);
+    phys_section_watch = dummy_section(new_map, &io_mem_watch);
+    next_map = new_map;
 }
 
 static void core_commit(MemoryListener *listener)
@@ -3161,6 +3240,16 @@ static void core_commit(MemoryListener *listener)
     for(env = first_cpu; env != NULL; env = env->next_cpu) {
         tlb_flush(env, 1);
     }
+
+/* move into high layer
+    qemu_mutex_lock(&cur_map_lock);
+    if (cur_map != NULL) {
+        physmap_put(cur_map);
+    }
+    cur_map = next_map;
+    smp_mb();
+    qemu_mutex_unlock(&cur_map_lock);
+*/
 }
 
 static void core_region_add(MemoryListener *listener,
@@ -3217,7 +3306,7 @@ static void core_eventfd_del(MemoryListener *listener,
 {
 }
 
-static void io_begin(MemoryListener *listener)
+static void io_begin(MemoryListener *listener, PhysMap *next)
 {
 }
 
@@ -3329,6 +3418,20 @@ static void memory_map_init(void)
     memory_listener_register(&io_memory_listener, system_io);
 }
 
+void physmap_init(void)
+{
+    FlatView v = { .ranges = NULL, .nr = 0,
+                   .nr_allocated = 0 };
+    PhysMap *init_map = g_malloc0(sizeof(PhysMap));
+
+    atomic_set(&init_map->ref, 1);
+    init_map->root.ptr = PHYS_MAP_NODE_NIL;
+    init_map->root.is_leaf = 0;
+    init_map->views[0] = v;
+    init_map->views[1] = v;
+    cur_map = init_map;
+}
+
 MemoryRegion *get_system_memory(void)
 {
     return system_memory;
@@ -3391,6 +3494,7 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
     uint32_t val;
     target_phys_addr_t page;
     MemoryRegionSection *section;
+    PhysMap *cur = cur_map_get();
 
     while (len > 0) {
         page = addr & TARGET_PAGE_MASK;
@@ -3472,6 +3576,7 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
         buf += l;
         addr += l;
     }
+    physmap_put(cur);
 }
 
 /* used for ROM loading : can write in RAM and ROM */
diff --git a/hw/vhost.c b/hw/vhost.c
index 43664e7..df58345 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -438,7 +438,7 @@ static bool vhost_section(MemoryRegionSection *section)
         && memory_region_is_ram(section->mr);
 }
 
-static void vhost_begin(MemoryListener *listener)
+static void vhost_begin(MemoryListener *listener, PhysMap *next)
 {
 }
 
diff --git a/hw/xen_pt.c b/hw/xen_pt.c
index 3b6d186..fba8586 100644
--- a/hw/xen_pt.c
+++ b/hw/xen_pt.c
@@ -597,7 +597,7 @@ static void xen_pt_region_update(XenPCIPassthroughState *s,
     }
 }
 
-static void xen_pt_begin(MemoryListener *l)
+static void xen_pt_begin(MemoryListener *l, PhysMap *next)
 {
 }
 
diff --git a/kvm-all.c b/kvm-all.c
index f8e4328..bc42cab 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -693,7 +693,7 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, bool add)
     }
 }
 
-static void kvm_begin(MemoryListener *listener)
+static void kvm_begin(MemoryListener *listener, PhysMap *next)
 {
 }
 
diff --git a/memory.c b/memory.c
index c7f2cfd..54cdc7f 100644
--- a/memory.c
+++ b/memory.c
@@ -20,6 +20,7 @@
 #include "kvm.h"
 #include <assert.h>
 #include "hw/qdev.h"
+#include "qemu-thread.h"
 
 #define WANT_EXEC_OBSOLETE
 #include "exec-obsolete.h"
@@ -192,7 +193,7 @@ typedef struct AddressSpaceOps AddressSpaceOps;
 /* A system address space - I/O, memory, etc. */
 struct AddressSpace {
     MemoryRegion *root;
-    FlatView current_map;
+    int view_id;
     int ioeventfd_nb;
     MemoryRegionIoeventfd *ioeventfds;
 };
@@ -232,11 +233,6 @@ static void flatview_insert(FlatView *view, unsigned pos, FlatRange *range)
     ++view->nr;
 }
 
-static void flatview_destroy(FlatView *view)
-{
-    g_free(view->ranges);
-}
-
 static bool can_merge(FlatRange *r1, FlatRange *r2)
 {
     return int128_eq(addrrange_end(r1->addr), r2->addr.start)
@@ -594,8 +590,10 @@ static void address_space_update_ioeventfds(AddressSpace *as)
     MemoryRegionIoeventfd *ioeventfds = NULL;
     AddrRange tmp;
     unsigned i;
+    PhysMap *map = cur_map_get();
+    FlatView *view = &map->views[as->view_id];
 
-    FOR_EACH_FLAT_RANGE(fr, &as->current_map) {
+    FOR_EACH_FLAT_RANGE(fr, view) {
         for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
             tmp = addrrange_shift(fr->mr->ioeventfds[i].addr,
                                   int128_sub(fr->addr.start,
@@ -616,6 +614,7 @@ static void address_space_update_ioeventfds(AddressSpace *as)
     g_free(as->ioeventfds);
     as->ioeventfds = ioeventfds;
     as->ioeventfd_nb = ioeventfd_nb;
+    physmap_put(map);
 }
 
 static void address_space_update_topology_pass(AddressSpace *as,
@@ -681,21 +680,23 @@ static void address_space_update_topology_pass(AddressSpace *as,
 }
 
 
-static void address_space_update_topology(AddressSpace *as)
+static void address_space_update_topology(AddressSpace *as, PhysMap *prev,
+                                            PhysMap *next)
 {
-    FlatView old_view = as->current_map;
+    FlatView old_view = prev->views[as->view_id];
     FlatView new_view = generate_memory_topology(as->root);
 
     address_space_update_topology_pass(as, old_view, new_view, false);
     address_space_update_topology_pass(as, old_view, new_view, true);
+    next->views[as->view_id] = new_view;
 
-    as->current_map = new_view;
-    flatview_destroy(&old_view);
     address_space_update_ioeventfds(as);
 }
 
 static void memory_region_update_topology(MemoryRegion *mr)
 {
+    PhysMap *prev, *next;
+
     if (memory_region_transaction_depth) {
         memory_region_update_pending |= !mr || mr->enabled;
         return;
@@ -705,16 +706,20 @@ static void memory_region_update_topology(MemoryRegion *mr)
         return;
     }
 
-    MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
+    prev = cur_map_get();
+    /* allocate PhysMap next here */
+    next = alloc_next_map();
+    MEMORY_LISTENER_CALL_GLOBAL(begin, Forward, next);
 
     if (address_space_memory.root) {
-        address_space_update_topology(&address_space_memory);
+        address_space_update_topology(&address_space_memory, prev, next);
     }
     if (address_space_io.root) {
-        address_space_update_topology(&address_space_io);
+        address_space_update_topology(&address_space_io, prev, next);
     }
 
     MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
+    cur_map_update(next);
 
     memory_region_update_pending = false;
 }
@@ -1071,7 +1076,7 @@ void memory_region_put(MemoryRegion *mr)
 
     if (atomic_dec_and_test(&mr->ref)) {
         /* to fix, using call_rcu( ,release) */
-        mr->life_ops->put(mr);
+        physmap_reclaimer_enqueue(mr, (ReleaseHandler *)mr->life_ops->put);
     }
 }
 
@@ -1147,13 +1152,18 @@ void memory_region_set_dirty(MemoryRegion *mr, target_phys_addr_t addr,
 void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
 {
     FlatRange *fr;
+    FlatView *fview;
+    PhysMap *map;
 
-    FOR_EACH_FLAT_RANGE(fr, &address_space_memory.current_map) {
+    map = cur_map_get();
+    fview = &map->views[address_space_memory.view_id];
+    FOR_EACH_FLAT_RANGE(fr, fview) {
         if (fr->mr == mr) {
             MEMORY_LISTENER_UPDATE_REGION(fr, &address_space_memory,
                                           Forward, log_sync);
         }
     }
+    physmap_put(map);
 }
 
 void memory_region_set_readonly(MemoryRegion *mr, bool readonly)
@@ -1201,8 +1211,12 @@ static void memory_region_update_coalesced_range(MemoryRegion *mr)
     FlatRange *fr;
     CoalescedMemoryRange *cmr;
     AddrRange tmp;
+    FlatView *fview;
+    PhysMap *map;
 
-    FOR_EACH_FLAT_RANGE(fr, &address_space_memory.current_map) {
+    map = cur_map_get();
+    fview = &map->views[address_space_memory.view_id];
+    FOR_EACH_FLAT_RANGE(fr, fview) {
         if (fr->mr == mr) {
             qemu_unregister_coalesced_mmio(int128_get64(fr->addr.start),
                                            int128_get64(fr->addr.size));
@@ -1219,6 +1233,7 @@ static void memory_region_update_coalesced_range(MemoryRegion *mr)
             }
         }
     }
+    physmap_put(map);
 }
 
 void memory_region_set_coalescing(MemoryRegion *mr)
@@ -1458,29 +1473,49 @@ static int cmp_flatrange_addr(const void *addr_, const void *fr_)
     return 0;
 }
 
-static FlatRange *address_space_lookup(AddressSpace *as, AddrRange addr)
+static FlatRange *address_space_lookup(FlatView *view, AddrRange addr)
 {
-    return bsearch(&addr, as->current_map.ranges, as->current_map.nr,
+    return bsearch(&addr, view->ranges, view->nr,
                    sizeof(FlatRange), cmp_flatrange_addr);
 }
 
+/* dec the ref that memory_region_find() inc'ed */
+void memory_region_section_put(MemoryRegionSection *mrs)
+{
+    if (mrs->mr != NULL) {
+        memory_region_put(mrs->mr);
+    }
+}
+
+/* inc mr's ref; the caller must dec it */
 MemoryRegionSection memory_region_find(MemoryRegion *address_space,
                                        target_phys_addr_t addr, uint64_t size)
 {
+    PhysMap *map;
     AddressSpace *as = memory_region_to_address_space(address_space);
     AddrRange range = addrrange_make(int128_make64(addr),
                                      int128_make64(size));
-    FlatRange *fr = address_space_lookup(as, range);
+    FlatView *fview;
+
+    map = cur_map_get();
+
+    fview = &map->views[as->view_id];
+    FlatRange *fr = address_space_lookup(fview, range);
     MemoryRegionSection ret = { .mr = NULL, .size = 0 };
 
     if (!fr) {
+        physmap_put(map);
         return ret;
     }
 
-    while (fr > as->current_map.ranges
+    while (fr > fview->ranges
            && addrrange_intersects(fr[-1].addr, range)) {
         --fr;
     }
+    /* To fix: the caller must be inside an rcu read section, or we
+     * must inc fr->mr->ref here */
+    memory_region_get(fr->mr);
+    physmap_put(map);
 
     ret.mr = fr->mr;
     range = addrrange_intersection(range, fr->addr);
@@ -1497,10 +1532,13 @@ void memory_global_sync_dirty_bitmap(MemoryRegion *address_space)
 {
     AddressSpace *as = memory_region_to_address_space(address_space);
     FlatRange *fr;
+    PhysMap *map = cur_map_get();
+    FlatView *view = &map->views[as->view_id];
 
-    FOR_EACH_FLAT_RANGE(fr, &as->current_map) {
+    FOR_EACH_FLAT_RANGE(fr, view) {
         MEMORY_LISTENER_UPDATE_REGION(fr, as, Forward, log_sync);
     }
+    physmap_put(map);
 }
 
 void memory_global_dirty_log_start(void)
@@ -1519,6 +1557,8 @@ static void listener_add_address_space(MemoryListener *listener,
                                        AddressSpace *as)
 {
     FlatRange *fr;
+    PhysMap *map;
+    FlatView *view;
 
     if (listener->address_space_filter
         && listener->address_space_filter != as->root) {
@@ -1528,7 +1568,10 @@ static void listener_add_address_space(MemoryListener *listener,
     if (global_dirty_log) {
         listener->log_global_start(listener);
     }
-    FOR_EACH_FLAT_RANGE(fr, &as->current_map) {
+
+    map = cur_map_get();
+    view = &map->views[as->view_id];
+    FOR_EACH_FLAT_RANGE(fr, view) {
         MemoryRegionSection section = {
             .mr = fr->mr,
             .address_space = as->root,
@@ -1539,6 +1582,7 @@ static void listener_add_address_space(MemoryListener *listener,
         };
         listener->region_add(listener, &section);
     }
+    physmap_put(map);
 }
 
 void memory_listener_register(MemoryListener *listener, MemoryRegion *filter)
@@ -1570,12 +1614,14 @@ void memory_listener_unregister(MemoryListener *listener)
 void set_system_memory_map(MemoryRegion *mr)
 {
     address_space_memory.root = mr;
+    address_space_memory.view_id = 0;
     memory_region_update_topology(NULL);
 }
 
 void set_system_io_map(MemoryRegion *mr)
 {
     address_space_io.root = mr;
+    address_space_io.view_id = 1;
     memory_region_update_topology(NULL);
 }
 
diff --git a/memory.h b/memory.h
index 357edd8..18442d4 100644
--- a/memory.h
+++ b/memory.h
@@ -256,7 +256,7 @@ typedef struct MemoryListener MemoryListener;
  * Use with memory_listener_register() and memory_listener_unregister().
  */
 struct MemoryListener {
-    void (*begin)(MemoryListener *listener);
+    void (*begin)(MemoryListener *listener, PhysMap *next);
     void (*commit)(MemoryListener *listener);
     void (*region_add)(MemoryListener *listener, MemoryRegionSection *section);
     void (*region_del)(MemoryListener *listener, MemoryRegionSection *section);
@@ -829,6 +829,13 @@ void mtree_info(fprintf_function mon_printf, void *f);
 
 void memory_region_get(MemoryRegion *mr);
 void memory_region_put(MemoryRegion *mr);
+void physmap_reclaimer_enqueue(void *opaque, ReleaseHandler *release);
+void physmap_get(PhysMap *map);
+void physmap_put(PhysMap *map);
+PhysMap *cur_map_get(void);
+PhysMap *alloc_next_map(void);
+void cur_map_update(PhysMap *next);
+void physmap_init(void);
 #endif
 
 #endif
diff --git a/vl.c b/vl.c
index 1329c30..12af523 100644
--- a/vl.c
+++ b/vl.c
@@ -3346,6 +3346,7 @@ int main(int argc, char **argv, char **envp)
     if (ram_size == 0) {
         ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
     }
+    physmap_init();
 
     configure_accelerator();
 
diff --git a/xen-all.c b/xen-all.c
index 59f2323..41d82fd 100644
--- a/xen-all.c
+++ b/xen-all.c
@@ -452,7 +452,7 @@ static void xen_set_memory(struct MemoryListener *listener,
     }
 }
 
-static void xen_begin(MemoryListener *listener)
+static void xen_begin(MemoryListener *listener, PhysMap *next)
 {
 }
 
-- 
1.7.4.4
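
Restating the deferred-release path as a minimal sketch: when the last
reference to an mr is dropped while an old snapshot may still be in
flight, the release is queued rather than run immediately, and the
queue is drained once a previous PhysMap's refcount hits zero:

    /* memory_region_put(), on the final 1->0 transition */
    physmap_reclaimer_enqueue(mr, (ReleaseHandler *)mr->life_ops->put);

    /* phys_map_release(), for a map no reader still holds */
    reclaimer_worker(&physmap_reclaimer_list);   /* runs the queued puts */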


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [Qemu-devel] [PATCH 09/15] memory: prepare flatview and radix-tree for rcu style access
@ 2012-08-08  6:25   ` Liu Ping Fan
  0 siblings, 0 replies; 154+ messages in thread
From: Liu Ping Fan @ 2012-08-08  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Stefan Hajnoczi, Marcelo Tosatti, qemulist, Blue Swirl,
	Avi Kivity, Anthony Liguori, Jan Kiszka, Paolo Bonzini,
	Andreas Färber

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Flatview and radix view are all under the protection of pointer.
And this make sure the change of them seem to be atomic!

The mr accessed by radix-tree leaf or flatview will be reclaimed
after the prev PhysMap not in use any longer

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 exec.c      |  303 +++++++++++++++++++++++++++++++++++++++-------------------
 hw/vhost.c  |    2 +-
 hw/xen_pt.c |    2 +-
 kvm-all.c   |    2 +-
 memory.c    |   92 ++++++++++++++-----
 memory.h    |    9 ++-
 vl.c        |    1 +
 xen-all.c   |    2 +-
 8 files changed, 286 insertions(+), 127 deletions(-)

diff --git a/exec.c b/exec.c
index 01b91b0..97addb9 100644
--- a/exec.c
+++ b/exec.c
@@ -24,6 +24,7 @@
 #include <sys/mman.h>
 #endif
 
+#include "qemu/atomic.h"
 #include "qemu-common.h"
 #include "cpu.h"
 #include "tcg.h"
@@ -35,6 +36,8 @@
 #include "qemu-timer.h"
 #include "memory.h"
 #include "exec-memory.h"
+#include "qemu-thread.h"
+#include "qemu/reclaimer.h"
 #if defined(CONFIG_USER_ONLY)
 #include <qemu.h>
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
@@ -184,25 +187,17 @@ static void *l1_map[V_L1_SIZE];
 
 #if !defined(CONFIG_USER_ONLY)
 
-static MemoryRegionSection *phys_sections;
-static unsigned phys_sections_nb, phys_sections_nb_alloc;
 static uint16_t phys_section_unassigned;
 static uint16_t phys_section_notdirty;
 static uint16_t phys_section_rom;
 static uint16_t phys_section_watch;
 
-
-/* Simple allocator for PhysPageEntry nodes */
-static PhysPageEntry (*phys_map_nodes)[L2_SIZE];
-static unsigned phys_map_nodes_nb, phys_map_nodes_nb_alloc;
-
 #define PHYS_MAP_NODE_NIL (((uint16_t)~0) >> 1)
 
-/* This is a multi-level map on the physical address space.
-   The bottom level has pointers to MemoryRegionSections.  */
-static PhysPageEntry phys_map = { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
-
+static QemuMutex cur_map_lock;
+static PhysMap *cur_map;
 QemuMutex mem_map_lock;
+static PhysMap *next_map;
 
 static void io_mem_init(void);
 static void memory_map_init(void);
@@ -383,41 +378,38 @@ static inline PageDesc *page_find(tb_page_addr_t index)
 
 #if !defined(CONFIG_USER_ONLY)
 
-static void phys_map_node_reserve(unsigned nodes)
+static void phys_map_node_reserve(PhysMap *map, unsigned nodes)
 {
-    if (phys_map_nodes_nb + nodes > phys_map_nodes_nb_alloc) {
+    if (map->phys_map_nodes_nb + nodes > map->phys_map_nodes_nb_alloc) {
         typedef PhysPageEntry Node[L2_SIZE];
-        phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc * 2, 16);
-        phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc,
-                                      phys_map_nodes_nb + nodes);
-        phys_map_nodes = g_renew(Node, phys_map_nodes,
-                                 phys_map_nodes_nb_alloc);
+        map->phys_map_nodes_nb_alloc = MAX(map->phys_map_nodes_nb_alloc * 2,
+                                                                        16);
+        map->phys_map_nodes_nb_alloc = MAX(map->phys_map_nodes_nb_alloc,
+                                      map->phys_map_nodes_nb + nodes);
+        map->phys_map_nodes = g_renew(Node, map->phys_map_nodes,
+                                 map->phys_map_nodes_nb_alloc);
     }
 }
 
-static uint16_t phys_map_node_alloc(void)
+static uint16_t phys_map_node_alloc(PhysMap *map)
 {
     unsigned i;
     uint16_t ret;
 
-    ret = phys_map_nodes_nb++;
+    ret = map->phys_map_nodes_nb++;
     assert(ret != PHYS_MAP_NODE_NIL);
-    assert(ret != phys_map_nodes_nb_alloc);
+    assert(ret != map->phys_map_nodes_nb_alloc);
     for (i = 0; i < L2_SIZE; ++i) {
-        phys_map_nodes[ret][i].is_leaf = 0;
-        phys_map_nodes[ret][i].ptr = PHYS_MAP_NODE_NIL;
+        map->phys_map_nodes[ret][i].is_leaf = 0;
+        map->phys_map_nodes[ret][i].ptr = PHYS_MAP_NODE_NIL;
     }
     return ret;
 }
 
-static void phys_map_nodes_reset(void)
-{
-    phys_map_nodes_nb = 0;
-}
-
-
-static void phys_page_set_level(PhysPageEntry *lp, target_phys_addr_t *index,
-                                target_phys_addr_t *nb, uint16_t leaf,
+static void phys_page_set_level(PhysMap *map, PhysPageEntry *lp,
+                                target_phys_addr_t *index,
+                                target_phys_addr_t *nb,
+                                uint16_t leaf,
                                 int level)
 {
     PhysPageEntry *p;
@@ -425,8 +417,8 @@ static void phys_page_set_level(PhysPageEntry *lp, target_phys_addr_t *index,
     target_phys_addr_t step = (target_phys_addr_t)1 << (level * L2_BITS);
 
     if (!lp->is_leaf && lp->ptr == PHYS_MAP_NODE_NIL) {
-        lp->ptr = phys_map_node_alloc();
-        p = phys_map_nodes[lp->ptr];
+        lp->ptr = phys_map_node_alloc(map);
+        p = map->phys_map_nodes[lp->ptr];
         if (level == 0) {
             for (i = 0; i < L2_SIZE; i++) {
                 p[i].is_leaf = 1;
@@ -434,7 +426,7 @@ static void phys_page_set_level(PhysPageEntry *lp, target_phys_addr_t *index,
             }
         }
     } else {
-        p = phys_map_nodes[lp->ptr];
+        p = map->phys_map_nodes[lp->ptr];
     }
     lp = &p[(*index >> (level * L2_BITS)) & (L2_SIZE - 1)];
 
@@ -445,24 +437,27 @@ static void phys_page_set_level(PhysPageEntry *lp, target_phys_addr_t *index,
             *index += step;
             *nb -= step;
         } else {
-            phys_page_set_level(lp, index, nb, leaf, level - 1);
+            phys_page_set_level(map, lp, index, nb, leaf, level - 1);
         }
         ++lp;
     }
 }
 
-static void phys_page_set(target_phys_addr_t index, target_phys_addr_t nb,
-                          uint16_t leaf)
+static void phys_page_set(PhysMap *map, target_phys_addr_t index,
+                            target_phys_addr_t nb,
+                            uint16_t leaf)
 {
     /* Wildly overreserve - it doesn't matter much. */
-    phys_map_node_reserve(3 * P_L2_LEVELS);
+    phys_map_node_reserve(map, 3 * P_L2_LEVELS);
 
-    phys_page_set_level(&phys_map, &index, &nb, leaf, P_L2_LEVELS - 1);
+    /* update in the new tree */
+    phys_page_set_level(map, &map->root, &index, &nb, leaf, P_L2_LEVELS - 1);
 }
 
-MemoryRegionSection *phys_page_find(target_phys_addr_t index)
+static MemoryRegionSection *phys_page_find_internal(PhysMap *map,
+                           target_phys_addr_t index)
 {
-    PhysPageEntry lp = phys_map;
+    PhysPageEntry lp = map->root;
     PhysPageEntry *p;
     int i;
     uint16_t s_index = phys_section_unassigned;
@@ -471,13 +466,79 @@ MemoryRegionSection *phys_page_find(target_phys_addr_t index)
         if (lp.ptr == PHYS_MAP_NODE_NIL) {
             goto not_found;
         }
-        p = phys_map_nodes[lp.ptr];
+        p = map->phys_map_nodes[lp.ptr];
         lp = p[(index >> (i * L2_BITS)) & (L2_SIZE - 1)];
     }
 
     s_index = lp.ptr;
 not_found:
-    return &phys_sections[s_index];
+    return &map->phys_sections[s_index];
+}
+
+MemoryRegionSection *phys_page_find(target_phys_addr_t index)
+{
+    return phys_page_find_internal(cur_map, index);
+}
+
+void physmap_get(PhysMap *map)
+{
+    atomic_inc(&map->ref);
+}
+
+/* Until the RCU read side has finished, defer this reclaim */
+static ChunkHead physmap_reclaimer_list = { .lh_first = NULL };
+void physmap_reclaimer_enqueue(void *opaque, ReleaseHandler *release)
+{
+    reclaimer_enqueue(&physmap_reclaimer_list, opaque, release);
+}
+
+static void destroy_all_mappings(PhysMap *map);
+static void phys_map_release(PhysMap *map)
+{
+    /* emulate an RCU reclaimer for MemoryRegions */
+    reclaimer_worker(&physmap_reclaimer_list);
+
+    destroy_all_mappings(map);
+    g_free(map->phys_map_nodes);
+    g_free(map->phys_sections);
+    g_free(map->views[0].ranges);
+    g_free(map->views[1].ranges);
+    g_free(map);
+}
+
+void physmap_put(PhysMap *map)
+{
+    if (atomic_dec_and_test(&map->ref)) {
+        phys_map_release(map);
+    }
+}
+
+void cur_map_update(PhysMap *next)
+{
+    qemu_mutex_lock(&cur_map_lock);
+    physmap_put(cur_map);
+    cur_map = next;
+    smp_mb();
+    qemu_mutex_unlock(&cur_map_lock);
+}
+
+PhysMap *cur_map_get(void)
+{
+    PhysMap *ret;
+
+    qemu_mutex_lock(&cur_map_lock);
+    ret = cur_map;
+    physmap_get(ret);
+    smp_mb();
+    qemu_mutex_unlock(&cur_map_lock);
+    return ret;
+}
+
+PhysMap *alloc_next_map(void)
+{
+    PhysMap *next = g_malloc0(sizeof(PhysMap));
+    atomic_set(&next->ref, 1);
+    return next;
 }
 
 bool memory_region_is_unassigned(MemoryRegion *mr)
@@ -632,6 +693,7 @@ void cpu_exec_init_all(void)
     memory_map_init();
     io_mem_init();
     qemu_mutex_init(&mem_map_lock);
+    qemu_mutex_init(&cur_map_lock);
 #endif
 }
 
@@ -2161,17 +2223,18 @@ int page_unprotect(target_ulong address, uintptr_t pc, void *puc)
 
 #define SUBPAGE_IDX(addr) ((addr) & ~TARGET_PAGE_MASK)
 typedef struct subpage_t {
+    PhysMap *map;
     MemoryRegion iomem;
     target_phys_addr_t base;
     uint16_t sub_section[TARGET_PAGE_SIZE];
 } subpage_t;
 
-static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
-                             uint16_t section);
-static subpage_t *subpage_init(target_phys_addr_t base);
-static void destroy_page_desc(uint16_t section_index)
+static int subpage_register(PhysMap *map, subpage_t *mmio, uint32_t start,
+                            uint32_t end, uint16_t section);
+static subpage_t *subpage_init(PhysMap *map, target_phys_addr_t base);
+static void destroy_page_desc(PhysMap *map, uint16_t section_index)
 {
-    MemoryRegionSection *section = &phys_sections[section_index];
+    MemoryRegionSection *section = &map->phys_sections[section_index];
     MemoryRegion *mr = section->mr;
 
     if (mr->subpage) {
@@ -2181,7 +2244,7 @@ static void destroy_page_desc(uint16_t section_index)
     }
 }
 
-static void destroy_l2_mapping(PhysPageEntry *lp, unsigned level)
+static void destroy_l2_mapping(PhysMap *map, PhysPageEntry *lp, unsigned level)
 {
     unsigned i;
     PhysPageEntry *p;
@@ -2190,38 +2253,34 @@ static void destroy_l2_mapping(PhysPageEntry *lp, unsigned level)
         return;
     }
 
-    p = phys_map_nodes[lp->ptr];
+    p = map->phys_map_nodes[lp->ptr];
     for (i = 0; i < L2_SIZE; ++i) {
         if (!p[i].is_leaf) {
-            destroy_l2_mapping(&p[i], level - 1);
+            destroy_l2_mapping(map, &p[i], level - 1);
         } else {
-            destroy_page_desc(p[i].ptr);
+            destroy_page_desc(map, p[i].ptr);
         }
     }
     lp->is_leaf = 0;
     lp->ptr = PHYS_MAP_NODE_NIL;
 }
 
-static void destroy_all_mappings(void)
+static void destroy_all_mappings(PhysMap *map)
 {
-    destroy_l2_mapping(&phys_map, P_L2_LEVELS - 1);
-    phys_map_nodes_reset();
-}
+    PhysPageEntry *root = &map->root;
 
-static uint16_t phys_section_add(MemoryRegionSection *section)
-{
-    if (phys_sections_nb == phys_sections_nb_alloc) {
-        phys_sections_nb_alloc = MAX(phys_sections_nb_alloc * 2, 16);
-        phys_sections = g_renew(MemoryRegionSection, phys_sections,
-                                phys_sections_nb_alloc);
-    }
-    phys_sections[phys_sections_nb] = *section;
-    return phys_sections_nb++;
+    destroy_l2_mapping(map, root, P_L2_LEVELS - 1);
 }
 
-static void phys_sections_clear(void)
+static uint16_t phys_section_add(PhysMap *map, MemoryRegionSection *section)
 {
-    phys_sections_nb = 0;
+    if (map->phys_sections_nb == map->phys_sections_nb_alloc) {
+        map->phys_sections_nb_alloc = MAX(map->phys_sections_nb_alloc * 2, 16);
+        map->phys_sections = g_renew(MemoryRegionSection, map->phys_sections,
+                                map->phys_sections_nb_alloc);
+    }
+    map->phys_sections[map->phys_sections_nb] = *section;
+    return map->phys_sections_nb++;
 }
 
 /* register physical memory.
@@ -2232,12 +2291,13 @@ static void phys_sections_clear(void)
    start_addr and region_offset are rounded down to a page boundary
    before calculating this offset.  This should not be a problem unless
    the low bits of start_addr and region_offset differ.  */
-static void register_subpage(MemoryRegionSection *section)
+static void register_subpage(PhysMap *map, MemoryRegionSection *section)
 {
     subpage_t *subpage;
     target_phys_addr_t base = section->offset_within_address_space
         & TARGET_PAGE_MASK;
-    MemoryRegionSection *existing = phys_page_find(base >> TARGET_PAGE_BITS);
+    MemoryRegionSection *existing = phys_page_find_internal(map,
+                                            base >> TARGET_PAGE_BITS);
     MemoryRegionSection subsection = {
         .offset_within_address_space = base,
         .size = TARGET_PAGE_SIZE,
@@ -2247,30 +2307,30 @@ static void register_subpage(MemoryRegionSection *section)
     assert(existing->mr->subpage || existing->mr == &io_mem_unassigned);
 
     if (!(existing->mr->subpage)) {
-        subpage = subpage_init(base);
+        subpage = subpage_init(map, base);
         subsection.mr = &subpage->iomem;
-        phys_page_set(base >> TARGET_PAGE_BITS, 1,
-                      phys_section_add(&subsection));
+        phys_page_set(map, base >> TARGET_PAGE_BITS, 1,
+                      phys_section_add(map, &subsection));
     } else {
         subpage = container_of(existing->mr, subpage_t, iomem);
     }
     start = section->offset_within_address_space & ~TARGET_PAGE_MASK;
     end = start + section->size;
-    subpage_register(subpage, start, end, phys_section_add(section));
+    subpage_register(map, subpage, start, end, phys_section_add(map, section));
 }
 
 
-static void register_multipage(MemoryRegionSection *section)
+static void register_multipage(PhysMap *map, MemoryRegionSection *section)
 {
     target_phys_addr_t start_addr = section->offset_within_address_space;
     ram_addr_t size = section->size;
     target_phys_addr_t addr;
-    uint16_t section_index = phys_section_add(section);
+    uint16_t section_index = phys_section_add(map, section);
 
     assert(size);
 
     addr = start_addr;
-    phys_page_set(addr >> TARGET_PAGE_BITS, size >> TARGET_PAGE_BITS,
+    phys_page_set(map, addr >> TARGET_PAGE_BITS, size >> TARGET_PAGE_BITS,
                   section_index);
 }
 
@@ -2278,13 +2338,14 @@ void cpu_register_physical_memory_log(MemoryRegionSection *section,
                                       bool readonly)
 {
     MemoryRegionSection now = *section, remain = *section;
+    PhysMap *map = next_map;
 
     if ((now.offset_within_address_space & ~TARGET_PAGE_MASK)
         || (now.size < TARGET_PAGE_SIZE)) {
         now.size = MIN(TARGET_PAGE_ALIGN(now.offset_within_address_space)
                        - now.offset_within_address_space,
                        now.size);
-        register_subpage(&now);
+        register_subpage(map, &now);
         remain.size -= now.size;
         remain.offset_within_address_space += now.size;
         remain.offset_within_region += now.size;
@@ -2292,14 +2353,14 @@ void cpu_register_physical_memory_log(MemoryRegionSection *section,
     now = remain;
     now.size &= TARGET_PAGE_MASK;
     if (now.size) {
-        register_multipage(&now);
+        register_multipage(map, &now);
         remain.size -= now.size;
         remain.offset_within_address_space += now.size;
         remain.offset_within_region += now.size;
     }
     now = remain;
     if (now.size) {
-        register_subpage(&now);
+        register_subpage(map, &now);
     }
 }
 
@@ -3001,7 +3062,7 @@ static uint64_t subpage_read(void *opaque, target_phys_addr_t addr,
            mmio, len, addr, idx);
 #endif
 
-    section = &phys_sections[mmio->sub_section[idx]];
+    section = &mmio->map->phys_sections[mmio->sub_section[idx]];
     addr += mmio->base;
     addr -= section->offset_within_address_space;
     addr += section->offset_within_region;
@@ -3020,7 +3081,7 @@ static void subpage_write(void *opaque, target_phys_addr_t addr,
            __func__, mmio, len, addr, idx, value);
 #endif
 
-    section = &phys_sections[mmio->sub_section[idx]];
+    section = &mmio->map->phys_sections[mmio->sub_section[idx]];
     addr += mmio->base;
     addr -= section->offset_within_address_space;
     addr += section->offset_within_region;
@@ -3065,8 +3126,8 @@ static const MemoryRegionOps subpage_ram_ops = {
     .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
-static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
-                             uint16_t section)
+static int subpage_register(PhysMap *map, subpage_t *mmio, uint32_t start,
+                              uint32_t end, uint16_t section)
 {
     int idx, eidx;
 
@@ -3078,10 +3139,10 @@ static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
     printf("%s: %p start %08x end %08x idx %08x eidx %08x mem %ld\n", __func__,
            mmio, start, end, idx, eidx, memory);
 #endif
-    if (memory_region_is_ram(phys_sections[section].mr)) {
-        MemoryRegionSection new_section = phys_sections[section];
+    if (memory_region_is_ram(map->phys_sections[section].mr)) {
+        MemoryRegionSection new_section = map->phys_sections[section];
         new_section.mr = &io_mem_subpage_ram;
-        section = phys_section_add(&new_section);
+        section = phys_section_add(map, &new_section);
     }
     for (; idx <= eidx; idx++) {
         mmio->sub_section[idx] = section;
@@ -3090,12 +3151,13 @@ static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
     return 0;
 }
 
-static subpage_t *subpage_init(target_phys_addr_t base)
+static subpage_t *subpage_init(PhysMap *map, target_phys_addr_t base)
 {
     subpage_t *mmio;
 
     mmio = g_malloc0(sizeof(subpage_t));
 
+    mmio->map = map;
     mmio->base = base;
     memory_region_init_io(&mmio->iomem, &subpage_ops, mmio,
                           "subpage", TARGET_PAGE_SIZE);
@@ -3104,12 +3166,12 @@ static subpage_t *subpage_init(target_phys_addr_t base)
     printf("%s: %p base " TARGET_FMT_plx " len %08x %d\n", __func__,
            mmio, base, TARGET_PAGE_SIZE, subpage_memory);
 #endif
-    subpage_register(mmio, 0, TARGET_PAGE_SIZE-1, phys_section_unassigned);
+    subpage_register(map, mmio, 0, TARGET_PAGE_SIZE-1, phys_section_unassigned);
 
     return mmio;
 }
 
-static uint16_t dummy_section(MemoryRegion *mr)
+static uint16_t dummy_section(PhysMap *map, MemoryRegion *mr)
 {
     MemoryRegionSection section = {
         .mr = mr,
@@ -3118,7 +3180,7 @@ static uint16_t dummy_section(MemoryRegion *mr)
         .size = UINT64_MAX,
     };
 
-    return phys_section_add(&section);
+    return phys_section_add(map, &section);
 }
 
 MemoryRegion *iotlb_to_region(target_phys_addr_t index)
@@ -3140,15 +3202,32 @@ static void io_mem_init(void)
                           "watch", UINT64_MAX);
 }
 
-static void core_begin(MemoryListener *listener)
+#if 0
+static void physmap_init(void)
+{
+    FlatView v = { .ranges = NULL,
+                             .nr = 0,
+                             .nr_allocated = 0,
+    };
+
+    init_map.views[0] = v;
+    init_map.views[1] = v;
+    cur_map =  &init_map;
+}
+#endif
+
+static void core_begin(MemoryListener *listener, PhysMap *new_map)
 {
-    destroy_all_mappings();
-    phys_sections_clear();
-    phys_map.ptr = PHYS_MAP_NODE_NIL;
-    phys_section_unassigned = dummy_section(&io_mem_unassigned);
-    phys_section_notdirty = dummy_section(&io_mem_notdirty);
-    phys_section_rom = dummy_section(&io_mem_rom);
-    phys_section_watch = dummy_section(&io_mem_watch);
+
+    new_map->root.ptr = PHYS_MAP_NODE_NIL;
+    new_map->root.is_leaf = 0;
+
+    /* In all maps, these sections have the same index */
+    phys_section_unassigned = dummy_section(new_map, &io_mem_unassigned);
+    phys_section_notdirty = dummy_section(new_map, &io_mem_notdirty);
+    phys_section_rom = dummy_section(new_map, &io_mem_rom);
+    phys_section_watch = dummy_section(new_map, &io_mem_watch);
+    next_map = new_map;
 }
 
 static void core_commit(MemoryListener *listener)
@@ -3161,6 +3240,16 @@ static void core_commit(MemoryListener *listener)
     for(env = first_cpu; env != NULL; env = env->next_cpu) {
         tlb_flush(env, 1);
     }
+
+/* move into high layer
+    qemu_mutex_lock(&cur_map_lock);
+    if (cur_map != NULL) {
+        physmap_put(cur_map);
+    }
+    cur_map = next_map;
+    smp_mb();
+    qemu_mutex_unlock(&cur_map_lock);
+*/
 }
 
 static void core_region_add(MemoryListener *listener,
@@ -3217,7 +3306,7 @@ static void core_eventfd_del(MemoryListener *listener,
 {
 }
 
-static void io_begin(MemoryListener *listener)
+static void io_begin(MemoryListener *listener, PhysMap *next)
 {
 }
 
@@ -3329,6 +3418,20 @@ static void memory_map_init(void)
     memory_listener_register(&io_memory_listener, system_io);
 }
 
+void physmap_init(void)
+{
+    FlatView v = { .ranges = NULL, .nr = 0,
+                   .nr_allocated = 0 };
+    PhysMap *init_map = g_malloc0(sizeof(PhysMap));
+
+    atomic_set(&init_map->ref, 1);
+    init_map->root.ptr = PHYS_MAP_NODE_NIL;
+    init_map->root.is_leaf = 0;
+    init_map->views[0] = v;
+    init_map->views[1] = v;
+    cur_map = init_map;
+}
+
 MemoryRegion *get_system_memory(void)
 {
     return system_memory;
@@ -3391,6 +3494,7 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
     uint32_t val;
     target_phys_addr_t page;
     MemoryRegionSection *section;
+    PhysMap *cur = cur_map_get();
 
     while (len > 0) {
         page = addr & TARGET_PAGE_MASK;
@@ -3472,6 +3576,7 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
         buf += l;
         addr += l;
     }
+    physmap_put(cur);
 }
 
 /* used for ROM loading : can write in RAM and ROM */
diff --git a/hw/vhost.c b/hw/vhost.c
index 43664e7..df58345 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -438,7 +438,7 @@ static bool vhost_section(MemoryRegionSection *section)
         && memory_region_is_ram(section->mr);
 }
 
-static void vhost_begin(MemoryListener *listener)
+static void vhost_begin(MemoryListener *listener, PhysMap *next)
 {
 }
 
diff --git a/hw/xen_pt.c b/hw/xen_pt.c
index 3b6d186..fba8586 100644
--- a/hw/xen_pt.c
+++ b/hw/xen_pt.c
@@ -597,7 +597,7 @@ static void xen_pt_region_update(XenPCIPassthroughState *s,
     }
 }
 
-static void xen_pt_begin(MemoryListener *l)
+static void xen_pt_begin(MemoryListener *l, PhysMap *next)
 {
 }
 
diff --git a/kvm-all.c b/kvm-all.c
index f8e4328..bc42cab 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -693,7 +693,7 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, bool add)
     }
 }
 
-static void kvm_begin(MemoryListener *listener)
+static void kvm_begin(MemoryListener *listener, PhysMap *next)
 {
 }
 
diff --git a/memory.c b/memory.c
index c7f2cfd..54cdc7f 100644
--- a/memory.c
+++ b/memory.c
@@ -20,6 +20,7 @@
 #include "kvm.h"
 #include <assert.h>
 #include "hw/qdev.h"
+#include "qemu-thread.h"
 
 #define WANT_EXEC_OBSOLETE
 #include "exec-obsolete.h"
@@ -192,7 +193,7 @@ typedef struct AddressSpaceOps AddressSpaceOps;
 /* A system address space - I/O, memory, etc. */
 struct AddressSpace {
     MemoryRegion *root;
-    FlatView current_map;
+    int view_id;
     int ioeventfd_nb;
     MemoryRegionIoeventfd *ioeventfds;
 };
@@ -232,11 +233,6 @@ static void flatview_insert(FlatView *view, unsigned pos, FlatRange *range)
     ++view->nr;
 }
 
-static void flatview_destroy(FlatView *view)
-{
-    g_free(view->ranges);
-}
-
 static bool can_merge(FlatRange *r1, FlatRange *r2)
 {
     return int128_eq(addrrange_end(r1->addr), r2->addr.start)
@@ -594,8 +590,10 @@ static void address_space_update_ioeventfds(AddressSpace *as)
     MemoryRegionIoeventfd *ioeventfds = NULL;
     AddrRange tmp;
     unsigned i;
+    PhysMap *map = cur_map_get();
+    FlatView *view = &map->views[as->view_id];
 
-    FOR_EACH_FLAT_RANGE(fr, &as->current_map) {
+    FOR_EACH_FLAT_RANGE(fr, view) {
         for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
             tmp = addrrange_shift(fr->mr->ioeventfds[i].addr,
                                   int128_sub(fr->addr.start,
@@ -616,6 +614,7 @@ static void address_space_update_ioeventfds(AddressSpace *as)
     g_free(as->ioeventfds);
     as->ioeventfds = ioeventfds;
     as->ioeventfd_nb = ioeventfd_nb;
+    physmap_put(map);
 }
 
 static void address_space_update_topology_pass(AddressSpace *as,
@@ -681,21 +680,23 @@ static void address_space_update_topology_pass(AddressSpace *as,
 }
 
 
-static void address_space_update_topology(AddressSpace *as)
+static void address_space_update_topology(AddressSpace *as, PhysMap *prev,
+                                            PhysMap *next)
 {
-    FlatView old_view = as->current_map;
+    FlatView old_view = prev->views[as->view_id];
     FlatView new_view = generate_memory_topology(as->root);
 
     address_space_update_topology_pass(as, old_view, new_view, false);
     address_space_update_topology_pass(as, old_view, new_view, true);
+    next->views[as->view_id] = new_view;
 
-    as->current_map = new_view;
-    flatview_destroy(&old_view);
     address_space_update_ioeventfds(as);
 }
 
 static void memory_region_update_topology(MemoryRegion *mr)
 {
+    PhysMap *prev, *next;
+
     if (memory_region_transaction_depth) {
         memory_region_update_pending |= !mr || mr->enabled;
         return;
@@ -705,16 +706,20 @@ static void memory_region_update_topology(MemoryRegion *mr)
         return;
     }
 
-    MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
+    prev = cur_map_get();
+    /* allocate PhysMap next here */
+    next = alloc_next_map();
+    MEMORY_LISTENER_CALL_GLOBAL(begin, Forward, next);
 
     if (address_space_memory.root) {
-        address_space_update_topology(&address_space_memory);
+        address_space_update_topology(&address_space_memory, prev, next);
     }
     if (address_space_io.root) {
-        address_space_update_topology(&address_space_io);
+        address_space_update_topology(&address_space_io, prev, next);
     }
 
     MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
+    cur_map_update(next);
 
     memory_region_update_pending = false;
 }
@@ -1071,7 +1076,7 @@ void memory_region_put(MemoryRegion *mr)
 
     if (atomic_dec_and_test(&mr->ref)) {
         /* to fix, using call_rcu( ,release) */
-        mr->life_ops->put(mr);
+        physmap_reclaimer_enqueue(mr, (ReleaseHandler *)mr->life_ops->put);
     }
 }
 
@@ -1147,13 +1152,18 @@ void memory_region_set_dirty(MemoryRegion *mr, target_phys_addr_t addr,
 void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
 {
     FlatRange *fr;
+    FlatView *fview;
+    PhysMap *map;
 
-    FOR_EACH_FLAT_RANGE(fr, &address_space_memory.current_map) {
+    map = cur_map_get();
+    fview = &map->views[address_space_memory.view_id];
+    FOR_EACH_FLAT_RANGE(fr, fview) {
         if (fr->mr == mr) {
             MEMORY_LISTENER_UPDATE_REGION(fr, &address_space_memory,
                                           Forward, log_sync);
         }
     }
+    physmap_put(map);
 }
 
 void memory_region_set_readonly(MemoryRegion *mr, bool readonly)
@@ -1201,8 +1211,12 @@ static void memory_region_update_coalesced_range(MemoryRegion *mr)
     FlatRange *fr;
     CoalescedMemoryRange *cmr;
     AddrRange tmp;
+    FlatView *fview;
+    PhysMap *map;
 
-    FOR_EACH_FLAT_RANGE(fr, &address_space_memory.current_map) {
+    map = cur_map_get();
+    fview = &map->views[address_space_memory.view_id];
+    FOR_EACH_FLAT_RANGE(fr, fview) {
         if (fr->mr == mr) {
             qemu_unregister_coalesced_mmio(int128_get64(fr->addr.start),
                                            int128_get64(fr->addr.size));
@@ -1219,6 +1233,7 @@ static void memory_region_update_coalesced_range(MemoryRegion *mr)
             }
         }
     }
+    physmap_put(map);
 }
 
 void memory_region_set_coalescing(MemoryRegion *mr)
@@ -1458,29 +1473,49 @@ static int cmp_flatrange_addr(const void *addr_, const void *fr_)
     return 0;
 }
 
-static FlatRange *address_space_lookup(AddressSpace *as, AddrRange addr)
+static FlatRange *address_space_lookup(FlatView *view, AddrRange addr)
 {
-    return bsearch(&addr, as->current_map.ranges, as->current_map.nr,
+    return bsearch(&addr, view->ranges, view->nr,
                    sizeof(FlatRange), cmp_flatrange_addr);
 }
 
+/* dec the ref that was inc'ed by memory_region_find */
+void memory_region_section_put(MemoryRegionSection *mrs)
+{
+    if (mrs->mr != NULL) {
+        memory_region_put(mrs->mr);
+    }
+}
+
+/* inc mr's ref; the caller must dec it */
 MemoryRegionSection memory_region_find(MemoryRegion *address_space,
                                        target_phys_addr_t addr, uint64_t size)
 {
+    PhysMap *map;
     AddressSpace *as = memory_region_to_address_space(address_space);
     AddrRange range = addrrange_make(int128_make64(addr),
                                      int128_make64(size));
-    FlatRange *fr = address_space_lookup(as, range);
+    FlatView *fview;
+
+    map = cur_map_get();
+
+    fview = &map->views[as->view_id];
+    FlatRange *fr = address_space_lookup(fview, range);
     MemoryRegionSection ret = { .mr = NULL, .size = 0 };
 
     if (!fr) {
+        physmap_put(map);
         return ret;
     }
 
-    while (fr > as->current_map.ranges
+    while (fr > fview->ranges
            && addrrange_intersects(fr[-1].addr, range)) {
         --fr;
     }
+    /* To fix: the caller must be in an RCU read-side section, or we
+     * must inc fr->mr->ref here (as done below). */
+    memory_region_get(fr->mr);
+    physmap_put(map);
 
     ret.mr = fr->mr;
     range = addrrange_intersection(range, fr->addr);
@@ -1497,10 +1532,13 @@ void memory_global_sync_dirty_bitmap(MemoryRegion *address_space)
 {
     AddressSpace *as = memory_region_to_address_space(address_space);
     FlatRange *fr;
+    PhysMap *map = cur_map_get();
+    FlatView *view = &map->views[as->view_id];
 
-    FOR_EACH_FLAT_RANGE(fr, &as->current_map) {
+    FOR_EACH_FLAT_RANGE(fr, view) {
         MEMORY_LISTENER_UPDATE_REGION(fr, as, Forward, log_sync);
     }
+    physmap_put(map);
 }
 
 void memory_global_dirty_log_start(void)
@@ -1519,6 +1557,8 @@ static void listener_add_address_space(MemoryListener *listener,
                                        AddressSpace *as)
 {
     FlatRange *fr;
+    PhysMap *map;
+    FlatView *view;
 
     if (listener->address_space_filter
         && listener->address_space_filter != as->root) {
@@ -1528,7 +1568,10 @@ static void listener_add_address_space(MemoryListener *listener,
     if (global_dirty_log) {
         listener->log_global_start(listener);
     }
-    FOR_EACH_FLAT_RANGE(fr, &as->current_map) {
+
+    map = cur_map_get();
+    view = &map->views[as->view_id];
+    FOR_EACH_FLAT_RANGE(fr, view) {
         MemoryRegionSection section = {
             .mr = fr->mr,
             .address_space = as->root,
@@ -1539,6 +1582,7 @@ static void listener_add_address_space(MemoryListener *listener,
         };
         listener->region_add(listener, &section);
     }
+    physmap_put(map);
 }
 
 void memory_listener_register(MemoryListener *listener, MemoryRegion *filter)
@@ -1570,12 +1614,14 @@ void memory_listener_unregister(MemoryListener *listener)
 void set_system_memory_map(MemoryRegion *mr)
 {
     address_space_memory.root = mr;
+    address_space_memory.view_id = 0;
     memory_region_update_topology(NULL);
 }
 
 void set_system_io_map(MemoryRegion *mr)
 {
     address_space_io.root = mr;
+    address_space_io.view_id = 1;
     memory_region_update_topology(NULL);
 }
 
diff --git a/memory.h b/memory.h
index 357edd8..18442d4 100644
--- a/memory.h
+++ b/memory.h
@@ -256,7 +256,7 @@ typedef struct MemoryListener MemoryListener;
  * Use with memory_listener_register() and memory_listener_unregister().
  */
 struct MemoryListener {
-    void (*begin)(MemoryListener *listener);
+    void (*begin)(MemoryListener *listener, PhysMap *next);
     void (*commit)(MemoryListener *listener);
     void (*region_add)(MemoryListener *listener, MemoryRegionSection *section);
     void (*region_del)(MemoryListener *listener, MemoryRegionSection *section);
@@ -829,6 +829,13 @@ void mtree_info(fprintf_function mon_printf, void *f);
 
 void memory_region_get(MemoryRegion *mr);
 void memory_region_put(MemoryRegion *mr);
+void physmap_reclaimer_enqueue(void *opaque, ReleaseHandler *release);
+void physmap_get(PhysMap *map);
+void physmap_put(PhysMap *map);
+PhysMap *cur_map_get(void);
+PhysMap *alloc_next_map(void);
+void cur_map_update(PhysMap *next);
+void physmap_init(void);
 #endif
 
 #endif
diff --git a/vl.c b/vl.c
index 1329c30..12af523 100644
--- a/vl.c
+++ b/vl.c
@@ -3346,6 +3346,7 @@ int main(int argc, char **argv, char **envp)
     if (ram_size == 0) {
         ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
     }
+    physmap_init();
 
     configure_accelerator();
 
diff --git a/xen-all.c b/xen-all.c
index 59f2323..41d82fd 100644
--- a/xen-all.c
+++ b/xen-all.c
@@ -452,7 +452,7 @@ static void xen_set_memory(struct MemoryListener *listener,
     }
 }
 
-static void xen_begin(MemoryListener *listener)
+static void xen_begin(MemoryListener *listener, PhysMap *next)
 {
 }
 
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 10/15] memory: change tcg related code to using PhysMap
  2012-08-08  6:25 ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  6:25   ` Liu Ping Fan
  -1 siblings, 0 replies; 154+ messages in thread
From: Liu Ping Fan @ 2012-08-08  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Anthony Liguori, Avi Kivity, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber,
	qemulist

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Change the TCG code to use PhysMap.
This is separated from the previous patch for review purposes; it
should be merged into that one.
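
The conversion follows a single pattern: take a reference on the
current PhysMap before dereferencing map->phys_sections, and drop it
afterwards.  A hedged sketch of the shape each converted helper takes
(some_phys_store() is a made-up name for illustration):

    static void some_phys_store(target_phys_addr_t addr, uint32_t val)
    {
        MemoryRegionSection *section;
        PhysMap *map = cur_map_get();            /* pin the current map */

        section = phys_page_find(addr >> TARGET_PAGE_BITS);
        if (!memory_region_is_ram(section->mr) || section->readonly) {
            addr = memory_region_section_addr(section, addr);
            if (memory_region_is_ram(section->mr)) {
                /* index into this map's private phys_sections[] */
                section = &map->phys_sections[phys_section_rom];
            }
            io_mem_write(section->mr, addr, val, 4);
        } else {
            /* ... direct RAM store path, unchanged ... */
        }
        physmap_put(map);                        /* unpin */
    }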

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 exec.c |   27 +++++++++++++++++++++------
 1 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/exec.c b/exec.c
index 97addb9..8d0dea5 100644
--- a/exec.c
+++ b/exec.c
@@ -1923,6 +1923,7 @@ target_phys_addr_t memory_region_section_get_iotlb(CPUArchState *env,
 {
     target_phys_addr_t iotlb;
     CPUWatchpoint *wp;
+    PhysMap *map = cur_map_get();
 
     if (memory_region_is_ram(section->mr)) {
         /* Normal RAM.  */
@@ -1940,7 +1941,7 @@ target_phys_addr_t memory_region_section_get_iotlb(CPUArchState *env,
            and avoid full address decoding in every device.
            We can't use the high bits of pd for this because
            IO_MEM_ROMD uses these as a ram address.  */
-        iotlb = section - phys_sections;
+        iotlb = section - map->phys_sections;
         iotlb += memory_region_section_addr(section, paddr);
     }
 
@@ -1956,6 +1957,7 @@ target_phys_addr_t memory_region_section_get_iotlb(CPUArchState *env,
             }
         }
     }
+    physmap_put(map);
 
     return iotlb;
 }
@@ -3185,7 +3187,12 @@ static uint16_t dummy_section(PhysMap *map, MemoryRegion *mr)
 
 MemoryRegion *iotlb_to_region(target_phys_addr_t index)
 {
-    return phys_sections[index & ~TARGET_PAGE_MASK].mr;
+    MemoryRegion *ret;
+    PhysMap *map = cur_map_get();
+
+    ret = map->phys_sections[index & ~TARGET_PAGE_MASK].mr;
+    physmap_put(map);
+    return ret;
 }
 
 static void io_mem_init(void)
@@ -3946,13 +3953,14 @@ void stl_phys_notdirty(target_phys_addr_t addr, uint32_t val)
 {
     uint8_t *ptr;
     MemoryRegionSection *section;
+    PhysMap *map = cur_map_get();
 
     section = phys_page_find(addr >> TARGET_PAGE_BITS);
 
     if (!memory_region_is_ram(section->mr) || section->readonly) {
         addr = memory_region_section_addr(section, addr);
         if (memory_region_is_ram(section->mr)) {
-            section = &phys_sections[phys_section_rom];
+            section = &map->phys_sections[phys_section_rom];
         }
         io_mem_write(section->mr, addr, val, 4);
     } else {
@@ -3972,19 +3980,21 @@ void stl_phys_notdirty(target_phys_addr_t addr, uint32_t val)
             }
         }
     }
+    physmap_put(map);
 }
 
 void stq_phys_notdirty(target_phys_addr_t addr, uint64_t val)
 {
     uint8_t *ptr;
     MemoryRegionSection *section;
+    PhysMap *map = cur_map_get();
 
     section = phys_page_find(addr >> TARGET_PAGE_BITS);
 
     if (!memory_region_is_ram(section->mr) || section->readonly) {
         addr = memory_region_section_addr(section, addr);
         if (memory_region_is_ram(section->mr)) {
-            section = &phys_sections[phys_section_rom];
+            section = &map->phys_sections[phys_section_rom];
         }
 #ifdef TARGET_WORDS_BIGENDIAN
         io_mem_write(section->mr, addr, val >> 32, 4);
@@ -3999,6 +4009,7 @@ void stq_phys_notdirty(target_phys_addr_t addr, uint64_t val)
                                + memory_region_section_addr(section, addr));
         stq_p(ptr, val);
     }
+    physmap_put(map);
 }
 
 /* warning: addr must be aligned */
@@ -4008,12 +4019,13 @@ static inline void stl_phys_internal(target_phys_addr_t addr, uint32_t val,
     uint8_t *ptr;
     MemoryRegionSection *section;
 
+    PhysMap *map = cur_map_get();
     section = phys_page_find(addr >> TARGET_PAGE_BITS);
 
     if (!memory_region_is_ram(section->mr) || section->readonly) {
         addr = memory_region_section_addr(section, addr);
         if (memory_region_is_ram(section->mr)) {
-            section = &phys_sections[phys_section_rom];
+            section = &map->phys_sections[phys_section_rom];
         }
 #if defined(TARGET_WORDS_BIGENDIAN)
         if (endian == DEVICE_LITTLE_ENDIAN) {
@@ -4050,6 +4062,7 @@ static inline void stl_phys_internal(target_phys_addr_t addr, uint32_t val,
                 (0xff & ~CODE_DIRTY_FLAG));
         }
     }
+    physmap_put(map);
 }
 
 void stl_phys(target_phys_addr_t addr, uint32_t val)
@@ -4081,12 +4094,13 @@ static inline void stw_phys_internal(target_phys_addr_t addr, uint32_t val,
     uint8_t *ptr;
     MemoryRegionSection *section;
 
+    PhysMap *map = cur_map_get();
     section = phys_page_find(addr >> TARGET_PAGE_BITS);
 
     if (!memory_region_is_ram(section->mr) || section->readonly) {
         addr = memory_region_section_addr(section, addr);
         if (memory_region_is_ram(section->mr)) {
-            section = &phys_sections[phys_section_rom];
+            section = &map->phys_sections[phys_section_rom];
         }
 #if defined(TARGET_WORDS_BIGENDIAN)
         if (endian == DEVICE_LITTLE_ENDIAN) {
@@ -4123,6 +4137,7 @@ static inline void stw_phys_internal(target_phys_addr_t addr, uint32_t val,
                 (0xff & ~CODE_DIRTY_FLAG));
         }
     }
+    physmap_put(map);
 }
 
 void stw_phys(target_phys_addr_t addr, uint32_t val)
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [Qemu-devel] [PATCH 10/15] memory: change tcg related code to using PhysMap
@ 2012-08-08  6:25   ` Liu Ping Fan
  0 siblings, 0 replies; 154+ messages in thread
From: Liu Ping Fan @ 2012-08-08  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Stefan Hajnoczi, Marcelo Tosatti, qemulist, Blue Swirl,
	Avi Kivity, Anthony Liguori, Jan Kiszka, Paolo Bonzini,
	Andreas Färber

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Change the TCG code to use PhysMap.
This is separated from the previous patch for review purposes; it
should be merged into that one.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 exec.c |   27 +++++++++++++++++++++------
 1 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/exec.c b/exec.c
index 97addb9..8d0dea5 100644
--- a/exec.c
+++ b/exec.c
@@ -1923,6 +1923,7 @@ target_phys_addr_t memory_region_section_get_iotlb(CPUArchState *env,
 {
     target_phys_addr_t iotlb;
     CPUWatchpoint *wp;
+    PhysMap *map = cur_map_get();
 
     if (memory_region_is_ram(section->mr)) {
         /* Normal RAM.  */
@@ -1940,7 +1941,7 @@ target_phys_addr_t memory_region_section_get_iotlb(CPUArchState *env,
            and avoid full address decoding in every device.
            We can't use the high bits of pd for this because
            IO_MEM_ROMD uses these as a ram address.  */
-        iotlb = section - phys_sections;
+        iotlb = section - map->phys_sections;
         iotlb += memory_region_section_addr(section, paddr);
     }
 
@@ -1956,6 +1957,7 @@ target_phys_addr_t memory_region_section_get_iotlb(CPUArchState *env,
             }
         }
     }
+    physmap_put(map);
 
     return iotlb;
 }
@@ -3185,7 +3187,12 @@ static uint16_t dummy_section(PhysMap *map, MemoryRegion *mr)
 
 MemoryRegion *iotlb_to_region(target_phys_addr_t index)
 {
-    return phys_sections[index & ~TARGET_PAGE_MASK].mr;
+    MemoryRegion *ret;
+    PhysMap *map = cur_map_get();
+
+    ret = map->phys_sections[index & ~TARGET_PAGE_MASK].mr;
+    physmap_put(map);
+    return ret;
 }
 
 static void io_mem_init(void)
@@ -3946,13 +3953,14 @@ void stl_phys_notdirty(target_phys_addr_t addr, uint32_t val)
 {
     uint8_t *ptr;
     MemoryRegionSection *section;
+    PhysMap *map = cur_map_get();
 
     section = phys_page_find(addr >> TARGET_PAGE_BITS);
 
     if (!memory_region_is_ram(section->mr) || section->readonly) {
         addr = memory_region_section_addr(section, addr);
         if (memory_region_is_ram(section->mr)) {
-            section = &phys_sections[phys_section_rom];
+            section = &map->phys_sections[phys_section_rom];
         }
         io_mem_write(section->mr, addr, val, 4);
     } else {
@@ -3972,19 +3980,21 @@ void stl_phys_notdirty(target_phys_addr_t addr, uint32_t val)
             }
         }
     }
+    physmap_put(map);
 }
 
 void stq_phys_notdirty(target_phys_addr_t addr, uint64_t val)
 {
     uint8_t *ptr;
     MemoryRegionSection *section;
+    PhysMap *map = cur_map_get();
 
     section = phys_page_find(addr >> TARGET_PAGE_BITS);
 
     if (!memory_region_is_ram(section->mr) || section->readonly) {
         addr = memory_region_section_addr(section, addr);
         if (memory_region_is_ram(section->mr)) {
-            section = &phys_sections[phys_section_rom];
+            section = &map->phys_sections[phys_section_rom];
         }
 #ifdef TARGET_WORDS_BIGENDIAN
         io_mem_write(section->mr, addr, val >> 32, 4);
@@ -3999,6 +4009,7 @@ void stq_phys_notdirty(target_phys_addr_t addr, uint64_t val)
                                + memory_region_section_addr(section, addr));
         stq_p(ptr, val);
     }
+    physmap_put(map);
 }
 
 /* warning: addr must be aligned */
@@ -4008,12 +4019,13 @@ static inline void stl_phys_internal(target_phys_addr_t addr, uint32_t val,
     uint8_t *ptr;
     MemoryRegionSection *section;
 
+    PhysMap *map = cur_map_get();
     section = phys_page_find(addr >> TARGET_PAGE_BITS);
 
     if (!memory_region_is_ram(section->mr) || section->readonly) {
         addr = memory_region_section_addr(section, addr);
         if (memory_region_is_ram(section->mr)) {
-            section = &phys_sections[phys_section_rom];
+            section = &map->phys_sections[phys_section_rom];
         }
 #if defined(TARGET_WORDS_BIGENDIAN)
         if (endian == DEVICE_LITTLE_ENDIAN) {
@@ -4050,6 +4062,7 @@ static inline void stl_phys_internal(target_phys_addr_t addr, uint32_t val,
                 (0xff & ~CODE_DIRTY_FLAG));
         }
     }
+    physmap_put(map);
 }
 
 void stl_phys(target_phys_addr_t addr, uint32_t val)
@@ -4081,12 +4094,13 @@ static inline void stw_phys_internal(target_phys_addr_t addr, uint32_t val,
     uint8_t *ptr;
     MemoryRegionSection *section;
 
+    PhysMap *map = cur_map_get();
     section = phys_page_find(addr >> TARGET_PAGE_BITS);
 
     if (!memory_region_is_ram(section->mr) || section->readonly) {
         addr = memory_region_section_addr(section, addr);
         if (memory_region_is_ram(section->mr)) {
-            section = &phys_sections[phys_section_rom];
+            section = &map->phys_sections[phys_section_rom];
         }
 #if defined(TARGET_WORDS_BIGENDIAN)
         if (endian == DEVICE_LITTLE_ENDIAN) {
@@ -4123,6 +4137,7 @@ static inline void stw_phys_internal(target_phys_addr_t addr, uint32_t val,
                 (0xff & ~CODE_DIRTY_FLAG));
         }
     }
+    physmap_put(map);
 }
 
 void stw_phys(target_phys_addr_t addr, uint32_t val)
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 11/15] lock: introduce global lock for device tree
  2012-08-08  6:25 ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  6:25   ` Liu Ping Fan
  -1 siblings, 0 replies; 154+ messages in thread
From: Liu Ping Fan @ 2012-08-08  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Anthony Liguori, Avi Kivity, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber,
	qemulist

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 cpus.c      |   12 ++++++++++++
 main-loop.h |    3 +++
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/cpus.c b/cpus.c
index b182b3d..a734b36 100644
--- a/cpus.c
+++ b/cpus.c
@@ -611,6 +611,7 @@ static void qemu_tcg_init_cpu_signals(void)
 }
 #endif /* _WIN32 */
 
+QemuMutex qemu_device_tree_mutex;
 QemuMutex qemu_global_mutex;
 static QemuCond qemu_io_proceeded_cond;
 static bool iothread_requesting_mutex;
@@ -634,6 +635,7 @@ void qemu_init_cpu_loop(void)
     qemu_cond_init(&qemu_work_cond);
     qemu_cond_init(&qemu_io_proceeded_cond);
     qemu_mutex_init(&qemu_global_mutex);
+    qemu_mutex_init(&qemu_device_tree_mutex);
 
     qemu_thread_get_self(&io_thread);
 }
@@ -911,6 +913,16 @@ void qemu_mutex_unlock_iothread(void)
     qemu_mutex_unlock(&qemu_global_mutex);
 }
 
+void qemu_lock_devtree(void)
+{
+    qemu_mutex_lock(&qemu_device_tree_mutex);
+}
+
+void qemu_unlock_devtree(void)
+{
+    qemu_mutex_unlock(&qemu_device_tree_mutex);
+}
+
 static int all_vcpus_paused(void)
 {
     CPUArchState *penv = first_cpu;
diff --git a/main-loop.h b/main-loop.h
index dce1cd9..17e959a 100644
--- a/main-loop.h
+++ b/main-loop.h
@@ -353,6 +353,9 @@ void qemu_mutex_lock_iothread(void);
  */
 void qemu_mutex_unlock_iothread(void);
 
+void qemu_lock_devtree(void);
+void qemu_unlock_devtree(void);
+
 /* internal interfaces */
 
 void qemu_fd_register(int fd);
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [Qemu-devel] [PATCH 11/15] lock: introduce global lock for device tree
@ 2012-08-08  6:25   ` Liu Ping Fan
  0 siblings, 0 replies; 154+ messages in thread
From: Liu Ping Fan @ 2012-08-08  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Stefan Hajnoczi, Marcelo Tosatti, qemulist, Blue Swirl,
	Avi Kivity, Anthony Liguori, Jan Kiszka, Paolo Bonzini,
	Andreas Färber

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 cpus.c      |   12 ++++++++++++
 main-loop.h |    3 +++
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/cpus.c b/cpus.c
index b182b3d..a734b36 100644
--- a/cpus.c
+++ b/cpus.c
@@ -611,6 +611,7 @@ static void qemu_tcg_init_cpu_signals(void)
 }
 #endif /* _WIN32 */
 
+QemuMutex qemu_device_tree_mutex;
 QemuMutex qemu_global_mutex;
 static QemuCond qemu_io_proceeded_cond;
 static bool iothread_requesting_mutex;
@@ -634,6 +635,7 @@ void qemu_init_cpu_loop(void)
     qemu_cond_init(&qemu_work_cond);
     qemu_cond_init(&qemu_io_proceeded_cond);
     qemu_mutex_init(&qemu_global_mutex);
+    qemu_mutex_init(&qemu_device_tree_mutex);
 
     qemu_thread_get_self(&io_thread);
 }
@@ -911,6 +913,16 @@ void qemu_mutex_unlock_iothread(void)
     qemu_mutex_unlock(&qemu_global_mutex);
 }
 
+void qemu_lock_devtree(void)
+{
+    qemu_mutex_lock(&qemu_device_tree_mutex);
+}
+
+void qemu_unlock_devtree(void)
+{
+    qemu_mutex_unlock(&qemu_device_tree_mutex);
+}
+
 static int all_vcpus_paused(void)
 {
     CPUArchState *penv = first_cpu;
diff --git a/main-loop.h b/main-loop.h
index dce1cd9..17e959a 100644
--- a/main-loop.h
+++ b/main-loop.h
@@ -353,6 +353,9 @@ void qemu_mutex_lock_iothread(void);
  */
 void qemu_mutex_unlock_iothread(void);
 
+void qemu_lock_devtree(void);
+void qemu_unlock_devtree(void);
+
 /* internal interfaces */
 
 void qemu_fd_register(int fd);
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 12/15] qdev: using devtree lock to protect device's accessing
  2012-08-08  6:25 ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  6:25   ` Liu Ping Fan
  -1 siblings, 0 replies; 154+ messages in thread
From: Liu Ping Fan @ 2012-08-08  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Anthony Liguori, Avi Kivity, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber,
	qemulist

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

lock:
  qemu_device_tree_mutex

competitors:
  --device_del (destruction of the device is postponed until the
    unplug ack from the guest),
  --pci hot-unplug
  --iteration (qdev_reset_all)
  --device_add
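
All of these take the same qemu_device_tree_mutex for their whole
critical section, so a reset iteration can never observe a
half-removed device.  A sketch of the discipline this patch applies
(illustrative fragments; only the wrappers from the previous patch
are used):

    /* monitor context: look up and unplug under the lock */
    qemu_lock_devtree();
    dev = qdev_find_recursive(sysbus_get_default(), id);
    if (dev) {
        qdev_unplug(dev, errp);   /* request only; actual removal waits
                                     for the guest's ack */
    }
    qemu_unlock_devtree();

    /* reset iteration serializes against the above */
    qemu_lock_devtree();
    qdev_walk_children(dev, qdev_reset_one, qbus_reset_one, NULL);
    qemu_unlock_devtree();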

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 hw/pci-hotplug.c  |    4 ++++
 hw/qdev-monitor.c |   17 ++++++++++++++++-
 hw/qdev.c         |    2 ++
 3 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/hw/pci-hotplug.c b/hw/pci-hotplug.c
index e7fb780..33a9dfe 100644
--- a/hw/pci-hotplug.c
+++ b/hw/pci-hotplug.c
@@ -265,9 +265,11 @@ static int pci_device_hot_remove(Monitor *mon, const char *pci_addr)
         return -1;
     }
 
+    qemu_lock_devtree();
     d = pci_find_device(pci_find_root_bus(dom), bus, PCI_DEVFN(slot, 0));
     if (!d) {
         monitor_printf(mon, "slot %d empty\n", slot);
+        qemu_unlock_devtree();
         return -1;
     }
 
@@ -275,9 +277,11 @@ static int pci_device_hot_remove(Monitor *mon, const char *pci_addr)
     if (error_is_set(&local_err)) {
         monitor_printf(mon, "%s\n", error_get_pretty(local_err));
         error_free(local_err);
+        qemu_unlock_devtree();
         return -1;
     }
 
+    qemu_unlock_devtree();
     return 0;
 }
 
diff --git a/hw/qdev-monitor.c b/hw/qdev-monitor.c
index 7915b45..2d47fe0 100644
--- a/hw/qdev-monitor.c
+++ b/hw/qdev-monitor.c
@@ -429,14 +429,18 @@ DeviceState *qdev_device_add(QemuOpts *opts)
 
     /* find bus */
     path = qemu_opt_get(opts, "bus");
+
+    qemu_lock_devtree();
     if (path != NULL) {
         bus = qbus_find(path);
         if (!bus) {
+            qemu_unlock_devtree();
             return NULL;
         }
         if (strcmp(object_get_typename(OBJECT(bus)), k->bus_type) != 0) {
             qerror_report(QERR_BAD_BUS_FOR_DEVICE,
                           driver, object_get_typename(OBJECT(bus)));
+            qemu_unlock_devtree();
             return NULL;
         }
     } else {
@@ -444,11 +448,13 @@ DeviceState *qdev_device_add(QemuOpts *opts)
         if (!bus) {
             qerror_report(QERR_NO_BUS_FOR_DEVICE,
                           driver, k->bus_type);
+            qemu_unlock_devtree();
             return NULL;
         }
     }
     if (qdev_hotplug && !bus->allow_hotplug) {
         qerror_report(QERR_BUS_NO_HOTPLUG, bus->name);
+        qemu_unlock_devtree();
         return NULL;
     }
 
@@ -466,6 +472,7 @@ DeviceState *qdev_device_add(QemuOpts *opts)
     }
     if (qemu_opt_foreach(opts, set_property, qdev, 1) != 0) {
         qdev_free(qdev);
+        qemu_unlock_devtree();
         return NULL;
     }
     if (qdev->id) {
@@ -478,6 +485,8 @@ DeviceState *qdev_device_add(QemuOpts *opts)
                                   OBJECT(qdev), NULL);
         g_free(name);
     }        
+    qemu_unlock_devtree();
+
     if (qdev_init(qdev) < 0) {
         qerror_report(QERR_DEVICE_INIT_FAILED, driver);
         return NULL;
@@ -600,13 +609,19 @@ void qmp_device_del(const char *id, Error **errp)
 {
     DeviceState *dev;
 
+    /* protect against the unplug ack from the guest, which is where we
+     * really remove the device from the system
+     */
+    qemu_lock_devtree();
     dev = qdev_find_recursive(sysbus_get_default(), id);
     if (NULL == dev) {
         error_set(errp, QERR_DEVICE_NOT_FOUND, id);
+        qemu_unlock_devtree();
         return;
     }
-
+    /* Just remove it from the system, and drop the refcnt there */
     qdev_unplug(dev, errp);
+    qemu_unlock_devtree();
 }
 
 void qdev_machine_init(void)
diff --git a/hw/qdev.c b/hw/qdev.c
index af54467..17525fe 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -230,7 +230,9 @@ static int qbus_reset_one(BusState *bus, void *opaque)
 
 void qdev_reset_all(DeviceState *dev)
 {
+    qemu_lock_devtree();
     qdev_walk_children(dev, qdev_reset_one, qbus_reset_one, NULL);
+    qemu_unlock_devtree();
 }
 
 void qbus_reset_all_fn(void *opaque)
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [Qemu-devel] [PATCH 12/15] qdev: using devtree lock to protect device's accessing
@ 2012-08-08  6:25   ` Liu Ping Fan
  0 siblings, 0 replies; 154+ messages in thread
From: Liu Ping Fan @ 2012-08-08  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Stefan Hajnoczi, Marcelo Tosatti, qemulist, Blue Swirl,
	Avi Kivity, Anthony Liguori, Jan Kiszka, Paolo Bonzini,
	Andreas Färber

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

lock:
  qemu_device_tree_mutex

competitors:
  --device_del (destruction of the device is postponed until the
    unplug ack from the guest),
  --pci hot-unplug
  --iteration (qdev_reset_all)
  --device_add

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 hw/pci-hotplug.c  |    4 ++++
 hw/qdev-monitor.c |   17 ++++++++++++++++-
 hw/qdev.c         |    2 ++
 3 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/hw/pci-hotplug.c b/hw/pci-hotplug.c
index e7fb780..33a9dfe 100644
--- a/hw/pci-hotplug.c
+++ b/hw/pci-hotplug.c
@@ -265,9 +265,11 @@ static int pci_device_hot_remove(Monitor *mon, const char *pci_addr)
         return -1;
     }
 
+    qemu_lock_devtree();
     d = pci_find_device(pci_find_root_bus(dom), bus, PCI_DEVFN(slot, 0));
     if (!d) {
         monitor_printf(mon, "slot %d empty\n", slot);
+        qemu_unlock_devtree();
         return -1;
     }
 
@@ -275,9 +277,11 @@ static int pci_device_hot_remove(Monitor *mon, const char *pci_addr)
     if (error_is_set(&local_err)) {
         monitor_printf(mon, "%s\n", error_get_pretty(local_err));
         error_free(local_err);
+        qemu_unlock_devtree();
         return -1;
     }
 
+    qemu_unlock_devtree();
     return 0;
 }
 
diff --git a/hw/qdev-monitor.c b/hw/qdev-monitor.c
index 7915b45..2d47fe0 100644
--- a/hw/qdev-monitor.c
+++ b/hw/qdev-monitor.c
@@ -429,14 +429,18 @@ DeviceState *qdev_device_add(QemuOpts *opts)
 
     /* find bus */
     path = qemu_opt_get(opts, "bus");
+
+    qemu_lock_devtree();
     if (path != NULL) {
         bus = qbus_find(path);
         if (!bus) {
+            qemu_unlock_devtree();
             return NULL;
         }
         if (strcmp(object_get_typename(OBJECT(bus)), k->bus_type) != 0) {
             qerror_report(QERR_BAD_BUS_FOR_DEVICE,
                           driver, object_get_typename(OBJECT(bus)));
+            qemu_unlock_devtree();
             return NULL;
         }
     } else {
@@ -444,11 +448,13 @@ DeviceState *qdev_device_add(QemuOpts *opts)
         if (!bus) {
             qerror_report(QERR_NO_BUS_FOR_DEVICE,
                           driver, k->bus_type);
+            qemu_unlock_devtree();
             return NULL;
         }
     }
     if (qdev_hotplug && !bus->allow_hotplug) {
         qerror_report(QERR_BUS_NO_HOTPLUG, bus->name);
+        qemu_unlock_devtree();
         return NULL;
     }
 
@@ -466,6 +472,7 @@ DeviceState *qdev_device_add(QemuOpts *opts)
     }
     if (qemu_opt_foreach(opts, set_property, qdev, 1) != 0) {
         qdev_free(qdev);
+        qemu_unlock_devtree();
         return NULL;
     }
     if (qdev->id) {
@@ -478,6 +485,8 @@ DeviceState *qdev_device_add(QemuOpts *opts)
                                   OBJECT(qdev), NULL);
         g_free(name);
     }        
+    qemu_unlock_devtree();
+
     if (qdev_init(qdev) < 0) {
         qerror_report(QERR_DEVICE_INIT_FAILED, driver);
         return NULL;
@@ -600,13 +609,19 @@ void qmp_device_del(const char *id, Error **errp)
 {
     DeviceState *dev;
 
+    /* protect against the unplug ack from the guest, which is where we
+     * really remove the device from the system
+     */
+    qemu_lock_devtree();
     dev = qdev_find_recursive(sysbus_get_default(), id);
     if (NULL == dev) {
         error_set(errp, QERR_DEVICE_NOT_FOUND, id);
+        qemu_unlock_devtree();
         return;
     }
-
+    /* Just remove it from the system, and drop the refcnt there */
     qdev_unplug(dev, errp);
+    qemu_unlock_devtree();
 }
 
 void qdev_machine_init(void)
diff --git a/hw/qdev.c b/hw/qdev.c
index af54467..17525fe 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -230,7 +230,9 @@ static int qbus_reset_one(BusState *bus, void *opaque)
 
 void qdev_reset_all(DeviceState *dev)
 {
+    qemu_lock_devtree();
     qdev_walk_children(dev, qdev_reset_one, qbus_reset_one, NULL);
+    qemu_unlock_devtree();
 }
 
 void qbus_reset_all_fn(void *opaque)
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 13/15] hotplug: introduce qdev_unplug_complete() to remove device from views
  2012-08-08  6:25 ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  6:25   ` Liu Ping Fan
  -1 siblings, 0 replies; 154+ messages in thread
From: Liu Ping Fan @ 2012-08-08  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Anthony Liguori, Avi Kivity, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber,
	qemulist

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

When the guest confirms the removal of a device, we should:
--unmap it from the MemoryRegion view
--isolate it from the device tree view
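
A sketch of the resulting two-phase flow (illustration only;
guest_ack_handler() is a made-up name standing in for e.g. the ACPI
eject path changed below):

    /* phase 1, monitor context: only request the unplug; the device
     * stays visible in both views until the guest acks */
    qdev_unplug(dev, errp);

    /* phase 2, on the guest's ack: */
    static void guest_ack_handler(DeviceState *dev)
    {
        qdev_unplug_complete(dev, NULL);
        /* which in turn:
         *   qdev_unmap(dev)         -- drops the MemoryRegion view
         *   qdev_unset_parent(dev)  -- drops the device tree view,
         *                              under the devtree lock
         *   object_unref(dev)       -- drops the refcount; the device
         *                              is freed once the last user
         *                              (e.g. an in-flight dispatch)
         *                              puts its reference */
    }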

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 hw/acpi_piix4.c |    4 ++--
 hw/pci.c        |   13 ++++++++++++-
 hw/pci.h        |    2 ++
 hw/qdev.c       |   28 ++++++++++++++++++++++++++++
 hw/qdev.h       |    3 ++-
 5 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index 0aace60..c209ff7 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -305,8 +305,8 @@ static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots)
             if (pc->no_hotplug) {
                 slot_free = false;
             } else {
-                object_unparent(OBJECT(dev));
-                qdev_free(qdev);
+                /* refcnt will be decreased */
+                qdev_unplug_complete(qdev, NULL);
             }
         }
     }
diff --git a/hw/pci.c b/hw/pci.c
index 99a4304..2095abf 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -856,12 +856,22 @@ static int pci_unregister_device(DeviceState *dev)
     if (ret)
         return ret;
 
-    pci_unregister_io_regions(pci_dev);
     pci_del_option_rom(pci_dev);
     do_pci_unregister_device(pci_dev);
     return 0;
 }
 
+static void pci_unmap_device(DeviceState *dev)
+{
+    PCIDevice *pci_dev = PCI_DEVICE(dev);
+    PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pci_dev);
+
+    pci_unregister_io_regions(pci_dev);
+    if (pc->unmap) {
+        pc->unmap(pci_dev);
+    }
+}
+
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
                       uint8_t type, MemoryRegion *memory)
 {
@@ -2022,6 +2032,7 @@ static void pci_device_class_init(ObjectClass *klass, void *data)
     DeviceClass *k = DEVICE_CLASS(klass);
     k->init = pci_qdev_init;
     k->unplug = pci_unplug_device;
+    k->unmap = pci_unmap_device;
     k->exit = pci_unregister_device;
     k->bus_type = TYPE_PCI_BUS;
     k->props = pci_props;
diff --git a/hw/pci.h b/hw/pci.h
index 79d38fd..1c5b909 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -145,6 +145,8 @@ typedef struct PCIDeviceClass {
     DeviceClass parent_class;
 
     int (*init)(PCIDevice *dev);
+    void (*unmap)(PCIDevice *dev);
+
     PCIUnregisterFunc *exit;
     PCIConfigReadFunc *config_read;
     PCIConfigWriteFunc *config_write;
diff --git a/hw/qdev.c b/hw/qdev.c
index 17525fe..530eabe 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -104,6 +104,14 @@ void qdev_set_parent_bus(DeviceState *dev, BusState *bus)
     bus_add_child(bus, dev);
 }
 
+static void qdev_unset_parent(DeviceState *dev)
+{
+    BusState *b = dev->parent_bus;
+
+    object_unparent(OBJECT(dev));
+    bus_remove_child(b, dev);
+}
+
 /* Create a new device.  This only initializes the device state structure
    and allows properties to be set.  qdev_init should be called to
    initialize the actual device emulation.  */
@@ -194,6 +202,26 @@ void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
     dev->alias_required_for_version = required_for_version;
 }
 
+static int qdev_unmap(DeviceState *dev)
+{
+    DeviceClass *dc =  DEVICE_GET_CLASS(dev);
+    if (dc->unmap) {
+        dc->unmap(dev);
+    }
+    return 0;
+}
+
+void qdev_unplug_complete(DeviceState *dev, Error **errp)
+{
+    /* isolate from mem view */
+    qdev_unmap(dev);
+    qemu_lock_devtree();
+    /* isolate from device tree */
+    qdev_unset_parent(dev);
+    qemu_unlock_devtree();
+    object_unref(OBJECT(dev));
+}
+
 void qdev_unplug(DeviceState *dev, Error **errp)
 {
     DeviceClass *dc = DEVICE_GET_CLASS(dev);
diff --git a/hw/qdev.h b/hw/qdev.h
index f4683dc..705635a 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -47,7 +47,7 @@ typedef struct DeviceClass {
 
     /* callbacks */
     void (*reset)(DeviceState *dev);
-
+    void (*unmap)(DeviceState *dev);
     /* device state */
     const VMStateDescription *vmsd;
 
@@ -162,6 +162,7 @@ void qdev_init_nofail(DeviceState *dev);
 void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
                                  int required_for_version);
 void qdev_unplug(DeviceState *dev, Error **errp);
+void qdev_unplug_complete(DeviceState *dev, Error **errp);
 void qdev_free(DeviceState *dev);
 int qdev_simple_unplug_cb(DeviceState *dev);
 void qdev_machine_creation_done(void);
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 14/15] qom: object_unref call reclaimer
  2012-08-08  6:25 ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  6:25   ` Liu Ping Fan
  -1 siblings, 0 replies; 154+ messages in thread
From: Liu Ping Fan @ 2012-08-08  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Anthony Liguori, Avi Kivity, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber,
	qemulist

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

An iohandler/bh/timer may still use a DeviceState after its refcnt reaches 0,
so postpone reclaiming until they are done with it.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 qom/object.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/qom/object.c b/qom/object.c
index 822bdb7..1452b1b 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -23,6 +23,8 @@
 #include "qbool.h"
 #include "qint.h"
 #include "qstring.h"
+#include "hw/qdev.h"
+#include "qemu/reclaimer.h"
 
 #define MAX_INTERFACES 32
 
@@ -646,7 +648,12 @@ void object_unref(Object *obj)
 {
     g_assert(atomic_read(&obj->ref) > 0);
     if (atomic_dec_and_test(&obj->ref)) {
-        object_finalize(obj);
+        /* fixme, maybe introduce obj->finalize to make this more elegant */
+        if (object_dynamic_cast(obj, TYPE_DEVICE) != NULL) {
+            qemu_reclaimer_enqueue(obj, object_finalize);
+        } else {
+            object_finalize(obj);
+        }
     }
 }
 
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* [PATCH 15/15] e1000: using new interface--unmap to unplug
  2012-08-08  6:25 ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  6:25   ` Liu Ping Fan
  -1 siblings, 0 replies; 154+ messages in thread
From: Liu Ping Fan @ 2012-08-08  6:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Anthony Liguori, Avi Kivity, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber,
	qemulist

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 hw/e1000.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index 4573f13..fa71455 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -1192,6 +1192,13 @@ e1000_cleanup(VLANClientState *nc)
     s->nic = NULL;
 }
 
+static void
+pci_e1000_unmap(PCIDevice *p)
+{
+    /* DO NOT FREE anything until refcnt reaches 0 */
+    /* isolate from memory view */
+}
+
 static int
 pci_e1000_uninit(PCIDevice *dev)
 {
@@ -1275,6 +1282,7 @@ static void e1000_class_init(ObjectClass *klass, void *data)
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
     k->init = pci_e1000_init;
+    k->unmap = pci_e1000_unmap;
     k->exit = pci_e1000_uninit;
     k->romfile = "pxe-e1000.rom";
     k->vendor_id = PCI_VENDOR_ID_INTEL;
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 154+ messages in thread

* Re: [PATCH 01/15] atomic: introduce atomic operations
  2012-08-08  6:25   ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  8:55     ` Paolo Bonzini
  -1 siblings, 0 replies; 154+ messages in thread
From: Paolo Bonzini @ 2012-08-08  8:55 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: qemu-devel, kvm, Anthony Liguori, Avi Kivity, Jan Kiszka,
	Marcelo Tosatti, Stefan Hajnoczi, Blue Swirl,
	Andreas Färber

On 08/08/2012 08:25, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> If out of global lock, we will be challenged by SMP in low level,
> so need atomic ops.
> 
> This file is heavily copied from kernel.

Then it cannot be GPLv2 _or later_.  Please use the version that I
pointed you to.

Paolo

> Currently, only x86 atomic ops
> included, and will be extended for other arch for future.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  include/qemu/atomic.h |  161 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 161 insertions(+), 0 deletions(-)
>  create mode 100644 include/qemu/atomic.h
> 
> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
> new file mode 100644
> index 0000000..8e1fc3e
> --- /dev/null
> +++ b/include/qemu/atomic.h
> @@ -0,0 +1,161 @@
> +/*
> + * Simple interface for atomic operations.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */



^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 01/15] atomic: introduce atomic operations
  2012-08-08  6:25   ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  9:02     ` Avi Kivity
  -1 siblings, 0 replies; 154+ messages in thread
From: Avi Kivity @ 2012-08-08  9:02 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: qemu-devel, kvm, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber

On 08/08/2012 09:25 AM, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> If out of global lock, we will be challenged by SMP in low level,
> so need atomic ops.
> 
> This file is heavily copied from kernel. Currently, only x86 atomic ops
> included, and will be extended for other arch for future.
> 

I propose we use gcc builtins.  We get automatic architecture support,
and tuning for newer processors if the user so chooses.

http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html

In May 2031 we can switch to C11 atomics.
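
(For illustration, a minimal sketch of the builtin-based variant -- Atomic is
the struct from the patch, and the __sync_* builtins are the gcc primitives
documented at the link above:)

  static inline void atomic_inc(Atomic *v)
  {
      __sync_fetch_and_add(&v->counter, 1);
  }

  static inline int atomic_dec_and_test(Atomic *v)
  {
      /* __sync_sub_and_fetch returns the new value */
      return __sync_sub_and_fetch(&v->counter, 1) == 0;
  }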

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 03/15] qom: introduce reclaimer to release obj
  2012-08-08  6:25   ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  9:05     ` Avi Kivity
  -1 siblings, 0 replies; 154+ messages in thread
From: Avi Kivity @ 2012-08-08  9:05 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: qemu-devel, kvm, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber

On 08/08/2012 09:25 AM, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> Collect unused object and release them at caller demand.
> 

Please explain the motivation for this patch.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [Qemu-devel] [PATCH 01/15] atomic: introduce atomic operations
  2012-08-08  9:02     ` [Qemu-devel] " Avi Kivity
@ 2012-08-08  9:05       ` 陳韋任 (Wei-Ren Chen)
  -1 siblings, 0 replies; 154+ messages in thread
From: 陳韋任 (Wei-Ren Chen) @ 2012-08-08  9:05 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Liu Ping Fan, kvm, Jan Kiszka, Marcelo Tosatti, qemu-devel,
	Blue Swirl, Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini,
	Andreas Färber

> I propose we use gcc builtins.  We get automatic architecture support,
> and tuning for newer processors if the user so chooses.
> 
> http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
> 
> In May 2031 we can switch to C11 atomics.
         ^^^^
  Maybe 2013?

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 03/15] qom: introduce reclaimer to release obj
  2012-08-08  9:05     ` [Qemu-devel] " Avi Kivity
@ 2012-08-08  9:07       ` Paolo Bonzini
  -1 siblings, 0 replies; 154+ messages in thread
From: Paolo Bonzini @ 2012-08-08  9:07 UTC (permalink / raw)
  To: Avi Kivity
  Cc: kvm, Jan Kiszka, Marcelo Tosatti, Liu Ping Fan, qemu-devel,
	Blue Swirl, Anthony Liguori, Stefan Hajnoczi,
	Andreas Färber

On 08/08/2012 11:05, Avi Kivity wrote:
>> > From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> > 
>> > Collect unused object and release them at caller demand.
>> > 
> Please explain the motivation for this patch.

It's poor man's RCU, I think?

Paolo

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 04/15] memory: MemoryRegion topology must be stable when updating
  2012-08-08  6:25   ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  9:13     ` Avi Kivity
  -1 siblings, 0 replies; 154+ messages in thread
From: Avi Kivity @ 2012-08-08  9:13 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: kvm, Jan Kiszka, Marcelo Tosatti, qemu-devel, Blue Swirl,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini,
	Andreas Färber

On 08/08/2012 09:25 AM, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> Using mem_map_lock to protect among updaters. So we can get the intact
> snapshot of mem topology -- FlatView & radix-tree.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  exec.c   |    3 +++
>  memory.c |   22 ++++++++++++++++++++++
>  memory.h |    2 ++
>  3 files changed, 27 insertions(+), 0 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 8244d54..0e29ef9 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -210,6 +210,8 @@ static unsigned phys_map_nodes_nb, phys_map_nodes_nb_alloc;
>     The bottom level has pointers to MemoryRegionSections.  */
>  static PhysPageEntry phys_map = { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
>  
> +QemuMutex mem_map_lock;
> +
>  static void io_mem_init(void);
>  static void memory_map_init(void);
>  
> @@ -637,6 +639,7 @@ void cpu_exec_init_all(void)
>  #if !defined(CONFIG_USER_ONLY)
>      memory_map_init();
>      io_mem_init();
> +    qemu_mutex_init(&mem_map_lock);
>  #endif
>  }
>  
> diff --git a/memory.c b/memory.c
> index aab4a31..5986532 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -761,7 +761,9 @@ void memory_region_transaction_commit(void)
>      assert(memory_region_transaction_depth);
>      --memory_region_transaction_depth;
>      if (!memory_region_transaction_depth && memory_region_update_pending) {
> +        qemu_mutex_lock(&mem_map_lock);
>          memory_region_update_topology(NULL);
> +        qemu_mutex_unlock(&mem_map_lock);
>      }
>  }

Seems to me that nothing in memory.c can be susceptible to races.  It must
already be called under the big qemu lock, and with the exception of
mutators (memory_region_set_*), changes aren't directly visible.

I think it's sufficient to take the mem_map_lock at the beginning of
core_begin() and drop it at the end of core_commit().  That means all
updates of volatile state, phys_map, are protected.
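
(For illustration, a minimal sketch of that suggestion; the callback
signatures are assumed from the MemoryListener interface of the time:)

  /* every region_add/region_del update runs between these two callbacks,
   * so all phys_map updates happen with mem_map_lock held */
  static void core_begin(MemoryListener *listener)
  {
      qemu_mutex_lock(&mem_map_lock);
  }

  static void core_commit(MemoryListener *listener)
  {
      qemu_mutex_unlock(&mem_map_lock);
  }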



-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 03/15] qom: introduce reclaimer to release obj
  2012-08-08  9:07       ` [Qemu-devel] " Paolo Bonzini
@ 2012-08-08  9:15         ` Avi Kivity
  -1 siblings, 0 replies; 154+ messages in thread
From: Avi Kivity @ 2012-08-08  9:15 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Liu Ping Fan, qemu-devel, kvm, Anthony Liguori, Jan Kiszka,
	Marcelo Tosatti, Stefan Hajnoczi, Blue Swirl,
	Andreas Färber

On 08/08/2012 12:07 PM, Paolo Bonzini wrote:
> Il 08/08/2012 11:05, Avi Kivity ha scritto:
>>> > From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>> > 
>>> > Collect unused object and release them at caller demand.
>>> > 
>> Please explain the motivation for this patch.
> 
> It's poor man's RCU, I think?

I thought that it was to defer destructors (finalizers) to a more
suitable context.  But why is the unref context unsuitable?

I don't see how it relates to RCU; where is the C and the U?

Anyway the list eagerly awaits the explanation.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 01/15] atomic: introduce atomic operations
  2012-08-08  9:05       ` 陳韋任 (Wei-Ren Chen)
@ 2012-08-08  9:15         ` Avi Kivity
  -1 siblings, 0 replies; 154+ messages in thread
From: Avi Kivity @ 2012-08-08  9:15 UTC (permalink / raw)
  To: "陳韋任 (Wei-Ren Chen)"
  Cc: kvm, Jan Kiszka, Marcelo Tosatti, Liu Ping Fan, qemu-devel,
	Blue Swirl, Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini,
	Andreas Färber

On 08/08/2012 12:05 PM, 陳韋任 (Wei-Ren Chen) wrote:
>> I propose we use gcc builtins.  We get automatic architecture support,
>> and tuning for newer processors if the user so chooses.
>> 
>> http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
>> 
>> In May 2031 we can switch to C11 atomics.
>          ^^^^
>   Maybe 2013?
> 

Maybe in 2013 we'll get a compiler that supports C11...

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 05/15] memory: introduce life_ops to MemoryRegion
  2012-08-08  6:25   ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  9:18     ` Avi Kivity
  -1 siblings, 0 replies; 154+ messages in thread
From: Avi Kivity @ 2012-08-08  9:18 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: qemu-devel, kvm, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber

On 08/08/2012 09:25 AM, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> The types of object referred to by a MemoryRegion are variable, e.g.
> another mr, a DeviceState, or other struct defined by drivers.
> So the refer/unrefer handling may differ between drivers.
> 
> Using these ops, we can manage the backend object.
> 

Seems to be a needless abstraction - we already have lifetime management
for objects.

I suggested previously to replace the opaque parameter with an Object,
and use Object's refcounting.  That's a lot of work, but IMO it is worth it,
as the opaques are dangerous to leave lying around.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 06/15] memory: use refcnt to manage MemoryRegion
  2012-08-08  6:25   ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  9:20     ` Avi Kivity
  -1 siblings, 0 replies; 154+ messages in thread
From: Avi Kivity @ 2012-08-08  9:20 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: qemu-devel, kvm, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber

On 08/08/2012 09:25 AM, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> Using a refcnt for mr, so we can separate mr's life cycle management
> from the referred object.
>   When mr->ref goes 0->1, inc the referred object.
>   When mr->ref goes 1->0, dec the referred object.
> 
> The referred object can be a DeviceState, another mr, or another opaque.

Please explain the motivation more fully.

Usually a MemoryRegion will be embedded within some DeviceState, or its
lifecycle will be managed by the DeviceState.  So long as we keep the
DeviceState alive all associated MemoryRegions should be alive as well.
 Why not do this directly?
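
(A sketch of the direct approach; the owner back-pointer and the
memory_region_ref/unref helpers are hypothetical, for illustration only:)

  void memory_region_ref(MemoryRegion *mr)
  {
      if (mr->owner) {                      /* hypothetical field */
          object_ref(OBJECT(mr->owner));    /* pin the owning DeviceState */
      }
  }

  void memory_region_unref(MemoryRegion *mr)
  {
      if (mr->owner) {
          object_unref(OBJECT(mr->owner));
      }
  }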


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [Qemu-devel] [PATCH 01/15] atomic: introduce atomic operations
  2012-08-08  6:25   ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  9:21     ` Peter Maydell
  -1 siblings, 0 replies; 154+ messages in thread
From: Peter Maydell @ 2012-08-08  9:21 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: qemu-devel, kvm, Stefan Hajnoczi, Marcelo Tosatti, Blue Swirl,
	Avi Kivity, Anthony Liguori, Jan Kiszka, Paolo Bonzini,
	Andreas Färber

On 8 August 2012 07:25, Liu Ping Fan <qemulist@gmail.com> wrote:
> +static inline void atomic_sub(int i, Atomic *v)
> +{
> +    asm volatile("lock; subl %1,%0"
> +             : "+m" (v->counter)
> +             : "ir" (i));
> +}

NAK. We don't want random inline assembly implementations of locking
primitives in QEMU; they are way too hard to keep working with all the
possible host architectures we support. I spent some time a while back
getting rid of the (variously busted) versions we had previously.

If you absolutely must use atomic ops, use the gcc builtins. For
preference, stick to higher level and less error-prone abstractions.

-- PMM

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 08/15] memory: introduce PhysMap to present snapshot of topology
  2012-08-08  6:25   ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  9:27     ` Avi Kivity
  -1 siblings, 0 replies; 154+ messages in thread
From: Avi Kivity @ 2012-08-08  9:27 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: kvm, Jan Kiszka, Marcelo Tosatti, qemu-devel, Blue Swirl,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini,
	Andreas Färber

On 08/08/2012 09:25 AM, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> PhysMap contains the flatview and radix-tree view; they are snapshots
> of the system topology and should be consistent. With PhysMap, we can
> swap the pointer when updating and make the update atomic.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  exec.c   |    8 --------
>  memory.c |   33 ---------------------------------
>  memory.h |   62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 60 insertions(+), 43 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 0e29ef9..01b91b0 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -156,8 +156,6 @@ typedef struct PageDesc {
>  #endif
>  
>  /* Size of the L2 (and L3, etc) page tables.  */
> -#define L2_BITS 10
> -#define L2_SIZE (1 << L2_BITS)
>  
>  #define P_L2_LEVELS \
>      (((TARGET_PHYS_ADDR_SPACE_BITS - TARGET_PAGE_BITS - 1) / L2_BITS) + 1)
> @@ -185,7 +183,6 @@ uintptr_t qemu_host_page_mask;
>  static void *l1_map[V_L1_SIZE];
>  
>  #if !defined(CONFIG_USER_ONLY)
> -typedef struct PhysPageEntry PhysPageEntry;


This (and the other stuff you're moving) is private memory internals.
It should be moved to memory-internals.h (currently named exec-obsolete.h).



-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 12/15] qdev: using devtree lock to protect device's accessing
  2012-08-08  6:25   ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  9:33     ` Peter Maydell
  -1 siblings, 0 replies; 154+ messages in thread
From: Peter Maydell @ 2012-08-08  9:33 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: kvm, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel, Blue Swirl,
	Avi Kivity, Anthony Liguori, Jan Kiszka, Paolo Bonzini,
	Andreas Färber

On 8 August 2012 07:25, Liu Ping Fan <qemulist@gmail.com> wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>
> lock:
>   qemu_device_tree_mutex

Looking at where it's used, this doesn't seem to have anything to do
with device trees (i.e. dtb, see www.devicetree.org): poorly named lock?

-- PMM

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 03/15] qom: introduce reclaimer to release obj
  2012-08-08  6:25   ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  9:35     ` Paolo Bonzini
  -1 siblings, 0 replies; 154+ messages in thread
From: Paolo Bonzini @ 2012-08-08  9:35 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: kvm, Jan Kiszka, Marcelo Tosatti, qemu-devel, Blue Swirl,
	Avi Kivity, Anthony Liguori, Stefan Hajnoczi,
	Andreas Färber

On 08/08/2012 08:25, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> Collect unused object and release them at caller demand.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  include/qemu/reclaimer.h |   28 ++++++++++++++++++++++
>  main-loop.c              |    5 ++++
>  qemu-tool.c              |    5 ++++
>  qom/Makefile.objs        |    2 +-
>  qom/reclaimer.c          |   58 ++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 97 insertions(+), 1 deletions(-)
>  create mode 100644 include/qemu/reclaimer.h
>  create mode 100644 qom/reclaimer.c
> 
> diff --git a/include/qemu/reclaimer.h b/include/qemu/reclaimer.h
> new file mode 100644
> index 0000000..9307e93
> --- /dev/null
> +++ b/include/qemu/reclaimer.h
> @@ -0,0 +1,28 @@
> +/*
> + * QEMU reclaimer
> + *
> + * Copyright IBM, Corp. 2012
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_RECLAIMER
> +#define QEMU_RECLAIMER
> +
> +typedef void ReleaseHandler(void *opaque);
> +typedef struct Chunk {
> +    QLIST_ENTRY(Chunk) list;
> +    void *opaque;
> +    ReleaseHandler *release;
> +} Chunk;
> +
> +typedef struct ChunkHead {
> +        struct Chunk *lh_first;
> +} ChunkHead;
> +
> +void reclaimer_enqueue(ChunkHead *head, void *opaque, ReleaseHandler *release);
> +void reclaimer_worker(ChunkHead *head);
> +void qemu_reclaimer_enqueue(void *opaque, ReleaseHandler *release);
> +void qemu_reclaimer(void);

So "enqueue" is call_rcu and qemu_reclaimer marks a quiescent state +
empties the pending call_rcu.

But what's the difference between the two pairs of APIs?

> +#endif
> diff --git a/main-loop.c b/main-loop.c
> index eb3b6e6..be9d095 100644
> --- a/main-loop.c
> +++ b/main-loop.c
> @@ -26,6 +26,7 @@
>  #include "qemu-timer.h"
>  #include "slirp/slirp.h"
>  #include "main-loop.h"
> +#include "qemu/reclaimer.h"
>  
>  #ifndef _WIN32
>  
> @@ -505,5 +506,9 @@ int main_loop_wait(int nonblocking)
>         them.  */
>      qemu_bh_poll();
>  
> +    /* ref to device from iohandler/bh/timer do not obey the rules, so delay
> +     * reclaiming until now.
> +     */
> +    qemu_reclaimer();
>      return ret;
>  }
> diff --git a/qemu-tool.c b/qemu-tool.c
> index 318c5fc..f5fe319 100644
> --- a/qemu-tool.c
> +++ b/qemu-tool.c
> @@ -21,6 +21,7 @@
>  #include "main-loop.h"
>  #include "qemu_socket.h"
>  #include "slirp/libslirp.h"
> +#include "qemu/reclaimer.h"
>  
>  #include <sys/time.h>
>  
> @@ -75,6 +76,10 @@ void qemu_mutex_unlock_iothread(void)
>  {
>  }
>  
> +void qemu_reclaimer(void)
> +{
> +}
> +
>  int use_icount;
>  
>  void qemu_clock_warp(QEMUClock *clock)
> diff --git a/qom/Makefile.objs b/qom/Makefile.objs
> index 5ef060a..a579261 100644
> --- a/qom/Makefile.objs
> +++ b/qom/Makefile.objs
> @@ -1,4 +1,4 @@
> -qom-obj-y = object.o container.o qom-qobject.o
> +qom-obj-y = object.o container.o qom-qobject.o reclaimer.o
>  qom-obj-twice-y = cpu.o
>  common-obj-y = $(qom-obj-twice-y)
>  user-obj-y = $(qom-obj-twice-y)
> diff --git a/qom/reclaimer.c b/qom/reclaimer.c
> new file mode 100644
> index 0000000..6cb53e3
> --- /dev/null
> +++ b/qom/reclaimer.c
> @@ -0,0 +1,58 @@
> +/*
> + * QEMU reclaimer
> + *
> + * Copyright IBM, Corp. 2012
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu-common.h"
> +#include "qemu-thread.h"
> +#include "main-loop.h"
> +#include "qemu-queue.h"
> +#include "qemu/reclaimer.h"
> +
> +static struct QemuMutex reclaimer_lock;
> +static QLIST_HEAD(rcl, Chunk) reclaimer_list;
> +
> +void reclaimer_enqueue(ChunkHead *head, void *opaque, ReleaseHandler *release)
> +{
> +    Chunk *r = g_malloc0(sizeof(Chunk));
> +    r->opaque = opaque;
> +    r->release = release;
> +    QLIST_INSERT_HEAD_RCU(head, r, list);
> +}

No lock?

> +void reclaimer_worker(ChunkHead *head)
> +{
> +    Chunk *cur, *next;
> +
> +    QLIST_FOREACH_SAFE(cur, head, list, next) {
> +        QLIST_REMOVE(cur, list);
> +        cur->release(cur->opaque);
> +        g_free(cur);
> +    }

QLIST_REMOVE needs a lock too, so using the lockless
QLIST_INSERT_HEAD_RCU is not necessary.

> +}
> +
> +void qemu_reclaimer_enqueue(void *opaque, ReleaseHandler *release)
> +{
> +    Chunk *r = g_malloc0(sizeof(Chunk));
> +    r->opaque = opaque;
> +    r->release = release;
> +    qemu_mutex_lock(&reclaimer_lock);
> +    QLIST_INSERT_HEAD_RCU(&reclaimer_list, r, list);
> +    qemu_mutex_unlock(&reclaimer_lock);
> +}
> +
> +
> +void qemu_reclaimer(void)
> +{
> +    Chunk *cur, *next;
> +
> +    QLIST_FOREACH_SAFE(cur, &reclaimer_list, list, next) {
> +        QLIST_REMOVE(cur, list);
> +        cur->release(cur->opaque);
> +        g_free(cur);
> +    }

Same here.

> +}
> 
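
(A sketch of the plain-mutex variant suggested above, assuming
reclaimer_lock protects both the insert and the remove paths:)

  void reclaimer_enqueue(ChunkHead *head, void *opaque, ReleaseHandler *release)
  {
      Chunk *r = g_malloc0(sizeof(Chunk));

      r->opaque = opaque;
      r->release = release;
      qemu_mutex_lock(&reclaimer_lock);
      QLIST_INSERT_HEAD(head, r, list);   /* plain insert, no _RCU needed */
      qemu_mutex_unlock(&reclaimer_lock);
  }

  void reclaimer_worker(ChunkHead *head)
  {
      Chunk *cur, *next;

      qemu_mutex_lock(&reclaimer_lock);
      QLIST_FOREACH_SAFE(cur, head, list, next) {
          QLIST_REMOVE(cur, list);
          /* a real version might drain to a local list and release unlocked */
          cur->release(cur->opaque);
          g_free(cur);
      }
      qemu_mutex_unlock(&reclaimer_lock);
  }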

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 14/15] qom: object_unref call reclaimer
  2012-08-08  6:25   ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  9:40     ` Paolo Bonzini
  -1 siblings, 0 replies; 154+ messages in thread
From: Paolo Bonzini @ 2012-08-08  9:40 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: qemu-devel, kvm, Anthony Liguori, Avi Kivity, Jan Kiszka,
	Marcelo Tosatti, Stefan Hajnoczi, Blue Swirl,
	Andreas Färber

On 08/08/2012 08:25, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> An iohandler/bh/timer may still use a DeviceState after its refcnt reaches 0,
> so postpone reclaiming until they are done with it.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  qom/object.c |    9 ++++++++-
>  1 files changed, 8 insertions(+), 1 deletions(-)
> 
> diff --git a/qom/object.c b/qom/object.c
> index 822bdb7..1452b1b 100644
> --- a/qom/object.c
> +++ b/qom/object.c
> @@ -23,6 +23,8 @@
>  #include "qbool.h"
>  #include "qint.h"
>  #include "qstring.h"
> +#include "hw/qdev.h"
> +#include "qemu/reclaimer.h"
>  
>  #define MAX_INTERFACES 32
>  
> @@ -646,7 +648,12 @@ void object_unref(Object *obj)
>  {
>      g_assert(atomic_read(&obj->ref) > 0);
>      if (atomic_dec_and_test(&obj->ref)) {
> -        object_finalize(obj);
> +        /* fixme, maybe introduce obj->finalize to make this more elegant */
> +        if (object_dynamic_cast(obj, TYPE_DEVICE) != NULL) {
> +            qemu_reclaimer_enqueue(obj, object_finalize);
> +        } else {
> +            object_finalize(obj);
> +        }
>      }
>  }
>  
> 

Just do this unconditionally.
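
(That is, a sketch:)

  void object_unref(Object *obj)
  {
      g_assert(atomic_read(&obj->ref) > 0);
      if (atomic_dec_and_test(&obj->ref)) {
          /* defer every finalization; no TYPE_DEVICE special case */
          qemu_reclaimer_enqueue(obj, object_finalize);
      }
  }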

Paolo


^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 09/15] memory: prepare flatview and radix-tree for rcu style access
  2012-08-08  6:25   ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  9:41     ` Avi Kivity
  -1 siblings, 0 replies; 154+ messages in thread
From: Avi Kivity @ 2012-08-08  9:41 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: kvm, Jan Kiszka, Marcelo Tosatti, qemu-devel, Blue Swirl,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini,
	Andreas Färber

On 08/08/2012 09:25 AM, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> Flatview and radix view are all under the protection of pointer.
> And this make sure the change of them seem to be atomic!
> 
> The mr accessed by radix-tree leaf or flatview will be reclaimed
> after the prev PhysMap not in use any longer
> 

IMO this cleverness should come much later.  Let's first take care of
dropping the big qemu lock, then make switching memory maps more efficient.

The initial paths could look like:

  lookup:
     take mem_map_lock
     lookup
     take ref
     drop mem_map_lock

  update:
     take mem_map_lock (in core_begin)
     do updates
     drop mem_map_lock

Later we can replace mem_map_lock with either a rwlock or (real) rcu.


>  
>  #if !defined(CONFIG_USER_ONLY)
>  
> -static void phys_map_node_reserve(unsigned nodes)
> +static void phys_map_node_reserve(PhysMap *map, unsigned nodes)
>  {
> -    if (phys_map_nodes_nb + nodes > phys_map_nodes_nb_alloc) {
> +    if (map->phys_map_nodes_nb + nodes > map->phys_map_nodes_nb_alloc) {
>          typedef PhysPageEntry Node[L2_SIZE];
> -        phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc * 2, 16);
> -        phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc,
> -                                      phys_map_nodes_nb + nodes);
> -        phys_map_nodes = g_renew(Node, phys_map_nodes,
> -                                 phys_map_nodes_nb_alloc);
> +        map->phys_map_nodes_nb_alloc = MAX(map->phys_map_nodes_nb_alloc * 2,
> +                                                                        16);
> +        map->phys_map_nodes_nb_alloc = MAX(map->phys_map_nodes_nb_alloc,
> +                                      map->phys_map_nodes_nb + nodes);
> +        map->phys_map_nodes = g_renew(Node, map->phys_map_nodes,
> +                                 map->phys_map_nodes_nb_alloc);
>      }
>  }

Please have a patch that just adds the map parameter to all these
functions.  This makes the later patch, that adds the copy, easier to read.

> +
> +void cur_map_update(PhysMap *next)
> +{
> +    qemu_mutex_lock(&cur_map_lock);
> +    physmap_put(cur_map);
> +    cur_map = next;
> +    smp_mb();
> +    qemu_mutex_unlock(&cur_map_lock);
> +}

IMO this can be mem_map_lock.

If we take my previous suggestion:

  lookup:
     take mem_map_lock
     lookup
     take ref
     drop mem_map_lock

  update:
     take mem_map_lock (in core_begin)
     do updates
     drop mem_map_lock

And update it to


  update:
     prepare next_map (in core_begin)
     do updates
     take mem_map_lock (in core_commit)
     switch maps
     drop mem_map_lock
     free old map


Note the lookup path copies the MemoryRegionSection instead of
referencing it.  Thus we can destroy the old map without worrying; the
only pointers will point to MemoryRegions, which will be protected by
the refcounts on their Objects.

This can be easily switched to rcu:

  update:
     prepare next_map (in core_begin)
     do updates
     switch maps - rcu_assign_pointer
     call_rcu(free old map) (or synchronize_rcu; free old maps)

Again, this should be done after the simplistic patch that enables
parallel lookup but keeps just one map.
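
In C, the map switch could look roughly like this (sketch only; it
assumes kernel-style RCU primitives and an rcu_head field in PhysMap,
neither of which exists in QEMU yet):

    static void physmap_reclaim(struct rcu_head *head)
    {
        PhysMap *old = container_of(head, PhysMap, rcu);

        physmap_put(old);                     /* drop the map's reference */
    }

    void cur_map_update(PhysMap *next)
    {
        PhysMap *old = cur_map;

        rcu_assign_pointer(cur_map, next);    /* publish the new map */
        call_rcu(&old->rcu, physmap_reclaim); /* reclaim after readers drain */
    }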



-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 11/15] lock: introduce global lock for device tree
  2012-08-08  6:25   ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  9:41     ` Paolo Bonzini
  -1 siblings, 0 replies; 154+ messages in thread
From: Paolo Bonzini @ 2012-08-08  9:41 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: kvm, Jan Kiszka, Marcelo Tosatti, qemu-devel, Blue Swirl,
	Avi Kivity, Anthony Liguori, Stefan Hajnoczi,
	Andreas Färber

On 08/08/2012 08:25, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  cpus.c      |   12 ++++++++++++
>  main-loop.h |    3 +++
>  2 files changed, 15 insertions(+), 0 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index b182b3d..a734b36 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -611,6 +611,7 @@ static void qemu_tcg_init_cpu_signals(void)
>  }
>  #endif /* _WIN32 */
>  
> +QemuMutex qemu_device_tree_mutex;
>  QemuMutex qemu_global_mutex;
>  static QemuCond qemu_io_proceeded_cond;
>  static bool iothread_requesting_mutex;
> @@ -634,6 +635,7 @@ void qemu_init_cpu_loop(void)
>      qemu_cond_init(&qemu_work_cond);
>      qemu_cond_init(&qemu_io_proceeded_cond);
>      qemu_mutex_init(&qemu_global_mutex);
> +    qemu_mutex_init(&qemu_device_tree_mutex);
>  
>      qemu_thread_get_self(&io_thread);
>  }
> @@ -911,6 +913,16 @@ void qemu_mutex_unlock_iothread(void)
>      qemu_mutex_unlock(&qemu_global_mutex);
>  }
>  
> +void qemu_lock_devtree(void)
> +{
> +    qemu_mutex_lock(&qemu_device_tree_mutex);
> +}
> +
> +void qemu_unlock_devtree(void)
> +{
> +    qemu_mutex_unlock(&qemu_device_tree_mutex);
> +}

We don't need the wrappers.  They are there for the big lock just
because TCG needs extra work for iothread_requesting_mutex.
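
I.e. callers can simply take the mutex directly:

    qemu_mutex_lock(&qemu_device_tree_mutex);
    /* ... walk or modify the device tree ... */
    qemu_mutex_unlock(&qemu_device_tree_mutex);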

Paolo

>  static int all_vcpus_paused(void)
>  {
>      CPUArchState *penv = first_cpu;
> diff --git a/main-loop.h b/main-loop.h
> index dce1cd9..17e959a 100644
> --- a/main-loop.h
> +++ b/main-loop.h
> @@ -353,6 +353,9 @@ void qemu_mutex_lock_iothread(void);
>   */
>  void qemu_mutex_unlock_iothread(void);
>  
> +void qemu_lock_devtree(void);
> +void qemu_unlock_devtree(void);
> +
>  /* internal interfaces */
>  
>  void qemu_fd_register(int fd);
> 

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 11/15] lock: introduce global lock for device tree
  2012-08-08  6:25   ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  9:42     ` Avi Kivity
  -1 siblings, 0 replies; 154+ messages in thread
From: Avi Kivity @ 2012-08-08  9:42 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: qemu-devel, kvm, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber

On 08/08/2012 09:25 AM, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 

Please explain the motivation.  AFAICT, the big qemu lock is sufficient.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 13/15] hotplug: introduce qdev_unplug_complete() to remove device from views
  2012-08-08  6:25   ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  9:52     ` Paolo Bonzini
  -1 siblings, 0 replies; 154+ messages in thread
From: Paolo Bonzini @ 2012-08-08  9:52 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: kvm, Jan Kiszka, Marcelo Tosatti, qemu-devel, Blue Swirl,
	Avi Kivity, Anthony Liguori, Stefan Hajnoczi,
	Andreas Färber

On 08/08/2012 08:25, Liu Ping Fan wrote:
> +void qdev_unplug_complete(DeviceState *dev, Error **errp)
> +{
> +    /* isolate from mem view */
> +    qdev_unmap(dev);
> +    qemu_lock_devtree();
> +    /* isolate from device tree */
> +    qdev_unset_parent(dev);
> +    qemu_unlock_devtree();
> +    object_unref(OBJECT(dev));

Rather than deferring the free, you should defer the unref.  Otherwise
the following can happen when you have "real" RCU access to the memory
map on the read-side:

    VCPU thread                    I/O thread
=====================================================================
    get MMIO request
    rcu_read_lock()
    walk memory map
                                   qdev_unmap()
                                   lock_devtree()
                                   ...
                                   unlock_devtree
                                   unref dev -> refcnt=0, free enqueued
    ref()
    rcu_read_unlock()
                                   free()
    <dangling pointer!>

If you defer the unref, you have instead

    VCPU thread                    I/O thread
=====================================================================
    get MMIO request
    rcu_read_lock()
    walk memory map
                                   qdev_unmap()
                                   lock_devtree()
                                   ...
                                   unlock_devtree
                                   unref is enqueued
    ref() -> refcnt = 2
    rcu_read_unlock()
                                   unref() -> refcnt=1
    unref() -> refcnt = 1

So this also makes patch 14 unnecessary.
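
In code, the difference is roughly this (sketch; defer_unref() is a
hypothetical helper that runs object_unref() only after current readers
are done, e.g. via the reclaimer or an RCU callback):

    void qdev_unplug_complete(DeviceState *dev, Error **errp)
    {
        qdev_unmap(dev);            /* isolate from mem view */
        qemu_lock_devtree();
        qdev_unset_parent(dev);     /* isolate from device tree */
        qemu_unlock_devtree();
        defer_unref(OBJECT(dev));   /* defer the unref, not the free */
    }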

Paolo

> +}

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 15/15] e1000: using new interface--unmap to unplug
  2012-08-08  6:25   ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08  9:56     ` Paolo Bonzini
  -1 siblings, 0 replies; 154+ messages in thread
From: Paolo Bonzini @ 2012-08-08  9:56 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: qemu-devel, kvm, Anthony Liguori, Avi Kivity, Jan Kiszka,
	Marcelo Tosatti, Stefan Hajnoczi, Blue Swirl,
	Andreas Färber

On 08/08/2012 08:25, Liu Ping Fan wrote:
>  
> +static void
> +pci_e1000_unmap(PCIDevice *p)
> +{
> +    /* DO NOT FREE anything!until refcnt=0 */
> +    /* isolate from memory view */
> +}

At least you need to call the superclass method.
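
The usual QOM idiom is to save the inherited implementation in
class_init and chain to it, e.g. (sketch; a per-class parent pointer
would be cleaner than a file-scope one, and the unmap hook itself is
only being introduced by this series):

    static void (*parent_unmap)(PCIDevice *d);

    static void
    pci_e1000_unmap(PCIDevice *d)
    {
        /* e1000-specific isolation from the memory view goes here */
        if (parent_unmap) {
            parent_unmap(d);        /* chain to the superclass method */
        }
    }

    static void e1000_class_init(ObjectClass *klass, void *data)
    {
        PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);

        parent_unmap = k->unmap;    /* value inherited from the parent class */
        k->unmap = pci_e1000_unmap;
    }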

Paolo

>  static int
>  pci_e1000_uninit(PCIDevice *dev)
>  {
> @@ -1275,6 +1282,7 @@ static void e1000_class_init(ObjectClass *klass, void *data)
>      PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>  
>      k->init = pci_e1000_init;
> +    k->unmap = pci_e1000_unmap;
>      k->exit = pci_e1000_uninit;
>      k->romfile = "pxe-e1000.rom";
>      k->vendor_id = PCI_VENDOR_ID_INTEL;



^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 13/15] hotplug: introduce qdev_unplug_complete() to remove device from views
  2012-08-08  9:52     ` [Qemu-devel] " Paolo Bonzini
@ 2012-08-08 10:07       ` Avi Kivity
  -1 siblings, 0 replies; 154+ messages in thread
From: Avi Kivity @ 2012-08-08 10:07 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, Jan Kiszka, Marcelo Tosatti, Liu Ping Fan, qemu-devel,
	Blue Swirl, Anthony Liguori, Stefan Hajnoczi,
	Andreas Färber

On 08/08/2012 12:52 PM, Paolo Bonzini wrote:
> On 08/08/2012 08:25, Liu Ping Fan wrote:
>> +void qdev_unplug_complete(DeviceState *dev, Error **errp)
>> +{
>> +    /* isolate from mem view */
>> +    qdev_unmap(dev);
>> +    qemu_lock_devtree();
>> +    /* isolate from device tree */
>> +    qdev_unset_parent(dev);
>> +    qemu_unlock_devtree();
>> +    object_unref(OBJECT(dev));
> 
> Rather than deferring the free, you should defer the unref.  Otherwise
> the following can happen when you have "real" RCU access to the memory
> map on the read-side:
> 
>     VCPU thread                    I/O thread
> =====================================================================
>     get MMIO request
>     rcu_read_lock()
>     walk memory map
>                                    qdev_unmap()
>                                    lock_devtree()
>                                    ...
>                                    unlock_devtree
>                                    unref dev -> refcnt=0, free enqueued
>     ref()
>     rcu_read_unlock()
>                                    free()
>     <dangling pointer!>

The unref should either follow a synchronize_rcu() or be done in a
call_rcu() callback (deferring the unref).  IMO synchronize_rcu() is
sufficient here; unplug is a truly slow path, especially on real hardware.
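
That is, roughly (sketch, assuming a synchronize_rcu() implementation
is available):

    qdev_unmap(dev);              /* isolate from mem view */
    qemu_lock_devtree();
    qdev_unset_parent(dev);       /* isolate from device tree */
    qemu_unlock_devtree();
    synchronize_rcu();            /* wait for in-flight map walkers */
    object_unref(OBJECT(dev));    /* safe: no reader can still take a ref */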


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [Qemu-devel] [PATCH 01/15] atomic: introduce atomic operations
  2012-08-08  9:21     ` Peter Maydell
@ 2012-08-08 13:09       ` Stefan Hajnoczi
  -1 siblings, 0 replies; 154+ messages in thread
From: Stefan Hajnoczi @ 2012-08-08 13:09 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Liu Ping Fan, qemu-devel, kvm, Marcelo Tosatti, Blue Swirl,
	Avi Kivity, Anthony Liguori, Jan Kiszka, Paolo Bonzini,
	Andreas Färber

On Wed, Aug 8, 2012 at 10:21 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 8 August 2012 07:25, Liu Ping Fan <qemulist@gmail.com> wrote:
>> +static inline void atomic_sub(int i, Atomic *v)
>> +{
>> +    asm volatile("lock; subl %1,%0"
>> +             : "+m" (v->counter)
>> +             : "ir" (i));
>> +}
>
> NAK. We don't want random inline assembly implementations of locking
> primitives in QEMU, they are way too hard to keep working with all the
> possible host architectures we support. I spent some time a while back
> getting rid of the (variously busted) versions we had previously.
>
> If you absolutely must use atomic ops, use the gcc builtins. For
> preference, stick to higher level and less error-prone abstractions.

We're spoilt for choice here:

1. Atomic built-ins from gcc
2. glib atomics

No need to roll our own or copy the implementation from the kernel.
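
For a refcount, either flavor is a one-liner (rough illustration):

    /* 1. gcc builtins */
    int ref = 1;
    __sync_fetch_and_add(&ref, 1);              /* increment */
    if (__sync_sub_and_fetch(&ref, 1) == 0) {   /* decrement and test */
        /* last reference gone */
    }

    /* 2. glib */
    gint gref = 1;
    g_atomic_int_inc(&gref);
    if (g_atomic_int_dec_and_test(&gref)) {
        /* last reference gone */
    }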

Stefan

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [Qemu-devel] [PATCH 01/15] atomic: introduce atomic operations
  2012-08-08 13:09       ` Stefan Hajnoczi
@ 2012-08-08 13:18         ` Paolo Bonzini
  -1 siblings, 0 replies; 154+ messages in thread
From: Paolo Bonzini @ 2012-08-08 13:18 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Peter Maydell, Liu Ping Fan, qemu-devel, kvm, Marcelo Tosatti,
	Blue Swirl, Avi Kivity, Anthony Liguori, Jan Kiszka,
	Andreas Färber

On 08/08/2012 15:09, Stefan Hajnoczi wrote:
>> > NAK. We don't want random inline assembly implementations of locking
>> > primitives in QEMU, they are way too hard to keep working with all the
>> > possible host architectures we support. I spent some time a while back
>> > getting rid of the (variously busted) versions we had previously.
>> >
>> > If you absolutely must use atomic ops, use the gcc builtins. For
>> > preference, stick to higher level and less error-prone abstractions.
> We're spoilt for choice here:
> 
> 1. Atomic built-ins from gcc
> 2. glib atomics
> 
> No need to roll our own or copy the implementation from the kernel.

To some extent we need to because:

1. GCC atomics look ugly, :) do not provide rmb/wmb, and in some
versions of GCC mb is known to be (wrongly) a no-op.

2. glib atomics do not provide mb/rmb/wmb either, and
g_atomic_int_get/g_atomic_int_set are inefficient: they add barriers
everywhere, while it is clearer if you put barriers manually, and you
often do not need barriers on the get side.  glib atomics also do not
provide xchg.

I agree however that a small wrapper around GCC atomics is much better
than assembly.  Assembly can be limited to the memory barriers (where we
already have it anyway).
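
A sketch of such a wrapper (the per-host barrier strength is the part
that needs care):

    #define barrier()   asm volatile("" ::: "memory")

    #define atomic_fetch_inc(p)  __sync_fetch_and_add(p, 1)
    #define atomic_fetch_dec(p)  __sync_fetch_and_add(p, -1)
    /* note: only an acquire barrier, per the GCC documentation */
    #define atomic_xchg(p, v)    __sync_lock_test_and_set(p, v)

    #define smp_mb()    __sync_synchronize()
    #if defined(__i386__) || defined(__x86_64__)
    /* x86 keeps loads ordered with loads and stores with stores,
       so a compiler barrier suffices */
    #define smp_rmb()   barrier()
    #define smp_wmb()   barrier()
    #else
    #define smp_rmb()   __sync_synchronize()
    #define smp_wmb()   __sync_synchronize()
    #endif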

Paolo

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 01/15] atomic: introduce atomic operations
  2012-08-08 13:18         ` Paolo Bonzini
@ 2012-08-08 13:32           ` Peter Maydell
  -1 siblings, 0 replies; 154+ messages in thread
From: Peter Maydell @ 2012-08-08 13:32 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, Stefan Hajnoczi, Marcelo Tosatti, Liu Ping Fan, qemu-devel,
	Blue Swirl, Avi Kivity, Anthony Liguori, Jan Kiszka,
	Andreas Färber

On 8 August 2012 14:18, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 08/08/2012 15:09, Stefan Hajnoczi wrote:
>> No need to roll our own or copy the implementation from the kernel.
>
> To some extent we need to because:
>
> 1. GCC atomics look ugly, :) do not provide rmb/wmb, and in some
> versions of GCC mb is known to be (wrongly) a no-op.
>
> 2. glib atomics do not provide mb/rmb/wmb either, and
> g_atomic_int_get/g_atomic_int_set are inefficient: they add barriers
> everywhere, while it is clearer if you put barriers manually, and you
> often do not need barriers in the get side.  glib atomics also do not
> provide xchg.

These are arguments in favour of "don't try to use atomic ops" --
if serious large projects like GCC and glib can't produce working
efficient implementations for all target architectures, what chance
do we have?

-- PMM

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 01/15] atomic: introduce atomic operations
  2012-08-08 13:32           ` [Qemu-devel] " Peter Maydell
@ 2012-08-08 13:49             ` Paolo Bonzini
  -1 siblings, 0 replies; 154+ messages in thread
From: Paolo Bonzini @ 2012-08-08 13:49 UTC (permalink / raw)
  To: Peter Maydell
  Cc: kvm, Stefan Hajnoczi, Marcelo Tosatti, Liu Ping Fan, qemu-devel,
	Blue Swirl, Avi Kivity, Anthony Liguori, Jan Kiszka,
	Andreas Färber

On 08/08/2012 15:32, Peter Maydell wrote:
>> > 1. GCC atomics look ugly, :) do not provide rmb/wmb, and in some
>> > versions of GCC mb is known to be (wrongly) a no-op.
>> >
>> > 2. glib atomics do not provide mb/rmb/wmb either, and
>> > g_atomic_int_get/g_atomic_int_set are inefficient: they add barriers
>> > everywhere, while it is clearer if you put barriers manually, and you
>> > often do not need barriers in the get side.  glib atomics also do not
>> > provide xchg.
> These are arguments in favour of "don't try to use atomic ops" --
> if serious large projects like GCC and glib can't produce working
> efficient implementations for all target architectures, what chance
> do we have?

Well, maybe... but the flaws in both GCC and glib are small in size
(even though large in importance, at least for us) and we can work
around them easily.  mb/rmb/wmb is essentially the small set of atomic
operations that we're already using.

Paolo


^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 01/15] atomic: introduce atomic operations
  2012-08-08 13:49             ` [Qemu-devel] " Paolo Bonzini
@ 2012-08-08 14:00               ` Avi Kivity
  -1 siblings, 0 replies; 154+ messages in thread
From: Avi Kivity @ 2012-08-08 14:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, kvm, Stefan Hajnoczi, Marcelo Tosatti,
	Liu Ping Fan, qemu-devel, Blue Swirl, Anthony Liguori,
	Jan Kiszka, Andreas Färber

On 08/08/2012 04:49 PM, Paolo Bonzini wrote:
> On 08/08/2012 15:32, Peter Maydell wrote:
>>> > 1. GCC atomics look ugly, :) do not provide rmb/wmb, and in some
>>> > versions of GCC mb is known to be (wrongly) a no-op.
>>> >
>>> > 2. glib atomics do not provide mb/rmb/wmb either, and
>>> > g_atomic_int_get/g_atomic_int_set are inefficient: they add barriers
>>> > everywhere, while it is clearer if you put barriers manually, and you
>>> > often do not need barriers in the get side.  glib atomics also do not
>>> > provide xchg.
>> These are arguments in favour of "don't try to use atomic ops" --
>> if serious large projects like GCC and glib can't produce working
>> efficient implementations for all target architectures, what chance
>> do we have?
> 
> Well, maybe... but the flaws in both GCC and glib are small in size
> (even though large in importance, at least for us) and we can work
> around them easily.  mb/rmb/wmb is essentially the small set of atomic
> operations that we're already using.

We can easily define rmb()/wmb() to be __sync_synchronize() as a
default, and only override them where it matters.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 04/15] memory: MemoryRegion topology must be stable when updating
  2012-08-08  6:25   ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08 19:17     ` Blue Swirl
  -1 siblings, 0 replies; 154+ messages in thread
From: Blue Swirl @ 2012-08-08 19:17 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: kvm, Jan Kiszka, Marcelo Tosatti, qemu-devel, Avi Kivity,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini,
	Andreas Färber

On Wed, Aug 8, 2012 at 6:25 AM, Liu Ping Fan <qemulist@gmail.com> wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>
> Using mem_map_lock to protect among updaters. So we can get the intact
> snapshot of mem topology -- FlatView & radix-tree.
>
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  exec.c   |    3 +++
>  memory.c |   22 ++++++++++++++++++++++
>  memory.h |    2 ++
>  3 files changed, 27 insertions(+), 0 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index 8244d54..0e29ef9 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -210,6 +210,8 @@ static unsigned phys_map_nodes_nb, phys_map_nodes_nb_alloc;
>     The bottom level has pointers to MemoryRegionSections.  */
>  static PhysPageEntry phys_map = { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
>
> +QemuMutex mem_map_lock;
> +
>  static void io_mem_init(void);
>  static void memory_map_init(void);
>
> @@ -637,6 +639,7 @@ void cpu_exec_init_all(void)
>  #if !defined(CONFIG_USER_ONLY)
>      memory_map_init();
>      io_mem_init();
> +    qemu_mutex_init(&mem_map_lock);

I'd move this and the mutex to memory.c since there are no other uses.
The mutex could be static then.
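
I.e. something like (sketch; memory_init() stands for whatever one-time
setup hook memory.c ends up with):

    /* memory.c */
    static QemuMutex mem_map_lock;

    void memory_init(void)
    {
        qemu_mutex_init(&mem_map_lock);
    }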

>  #endif
>  }
>
> diff --git a/memory.c b/memory.c
> index aab4a31..5986532 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -761,7 +761,9 @@ void memory_region_transaction_commit(void)
>      assert(memory_region_transaction_depth);
>      --memory_region_transaction_depth;
>      if (!memory_region_transaction_depth && memory_region_update_pending) {
> +        qemu_mutex_lock(&mem_map_lock);
>          memory_region_update_topology(NULL);
> +        qemu_mutex_unlock(&mem_map_lock);
>      }
>  }
>
> @@ -1069,8 +1071,10 @@ void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
>  {
>      uint8_t mask = 1 << client;
>
> +    qemu_mutex_lock(&mem_map_lock);
>      mr->dirty_log_mask = (mr->dirty_log_mask & ~mask) | (log * mask);
>      memory_region_update_topology(mr);
> +    qemu_mutex_unlock(&mem_map_lock);
>  }
>
>  bool memory_region_get_dirty(MemoryRegion *mr, target_phys_addr_t addr,
> @@ -1103,8 +1107,10 @@ void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
>  void memory_region_set_readonly(MemoryRegion *mr, bool readonly)
>  {
>      if (mr->readonly != readonly) {
> +        qemu_mutex_lock(&mem_map_lock);
>          mr->readonly = readonly;
>          memory_region_update_topology(mr);
> +        qemu_mutex_unlock(&mem_map_lock);
>      }
>  }
>
> @@ -1112,7 +1118,9 @@ void memory_region_rom_device_set_readable(MemoryRegion *mr, bool readable)
>  {
>      if (mr->readable != readable) {
>          mr->readable = readable;
> +        qemu_mutex_lock(&mem_map_lock);
>          memory_region_update_topology(mr);
> +        qemu_mutex_unlock(&mem_map_lock);
>      }
>  }
>
> @@ -1206,6 +1214,7 @@ void memory_region_add_eventfd(MemoryRegion *mr,
>      };
>      unsigned i;
>
> +    qemu_mutex_lock(&mem_map_lock);
>      for (i = 0; i < mr->ioeventfd_nb; ++i) {
>          if (memory_region_ioeventfd_before(mrfd, mr->ioeventfds[i])) {
>              break;
> @@ -1218,6 +1227,7 @@ void memory_region_add_eventfd(MemoryRegion *mr,
>              sizeof(*mr->ioeventfds) * (mr->ioeventfd_nb-1 - i));
>      mr->ioeventfds[i] = mrfd;
>      memory_region_update_topology(mr);
> +    qemu_mutex_unlock(&mem_map_lock);
>  }
>
>  void memory_region_del_eventfd(MemoryRegion *mr,
> @@ -1236,6 +1246,7 @@ void memory_region_del_eventfd(MemoryRegion *mr,
>      };
>      unsigned i;
>
> +    qemu_mutex_lock(&mem_map_lock);
>      for (i = 0; i < mr->ioeventfd_nb; ++i) {
>          if (memory_region_ioeventfd_equal(mrfd, mr->ioeventfds[i])) {
>              break;
> @@ -1248,6 +1259,7 @@ void memory_region_del_eventfd(MemoryRegion *mr,
>      mr->ioeventfds = g_realloc(mr->ioeventfds,
>                                    sizeof(*mr->ioeventfds)*mr->ioeventfd_nb + 1);
>      memory_region_update_topology(mr);
> +    qemu_mutex_unlock(&mem_map_lock);
>  }
>
>  static void memory_region_add_subregion_common(MemoryRegion *mr,
> @@ -1259,6 +1271,8 @@ static void memory_region_add_subregion_common(MemoryRegion *mr,
>      assert(!subregion->parent);
>      subregion->parent = mr;
>      subregion->addr = offset;
> +
> +    qemu_mutex_lock(&mem_map_lock);
>      QTAILQ_FOREACH(other, &mr->subregions, subregions_link) {
>          if (subregion->may_overlap || other->may_overlap) {
>              continue;
> @@ -1289,6 +1303,7 @@ static void memory_region_add_subregion_common(MemoryRegion *mr,
>      QTAILQ_INSERT_TAIL(&mr->subregions, subregion, subregions_link);
>  done:
>      memory_region_update_topology(mr);
> +    qemu_mutex_unlock(&mem_map_lock);
>  }
>
>
> @@ -1316,8 +1331,11 @@ void memory_region_del_subregion(MemoryRegion *mr,
>  {
>      assert(subregion->parent == mr);
>      subregion->parent = NULL;
> +
> +    qemu_mutex_lock(&mem_map_lock);
>      QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link);
>      memory_region_update_topology(mr);
> +    qemu_mutex_unlock(&mem_map_lock);
>  }
>
>  void memory_region_set_enabled(MemoryRegion *mr, bool enabled)
> @@ -1325,8 +1343,10 @@ void memory_region_set_enabled(MemoryRegion *mr, bool enabled)
>      if (enabled == mr->enabled) {
>          return;
>      }
> +    qemu_mutex_lock(&mem_map_lock);
>      mr->enabled = enabled;
>      memory_region_update_topology(NULL);
> +    qemu_mutex_unlock(&mem_map_lock);
>  }
>
>  void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t addr)
> @@ -1361,7 +1381,9 @@ void memory_region_set_alias_offset(MemoryRegion *mr, target_phys_addr_t offset)
>          return;
>      }
>
> +    qemu_mutex_lock(&mem_map_lock);
>      memory_region_update_topology(mr);
> +    qemu_mutex_unlock(&mem_map_lock);
>  }
>
>  ram_addr_t memory_region_get_ram_addr(MemoryRegion *mr)
> diff --git a/memory.h b/memory.h
> index 740c48e..fe6aefa 100644
> --- a/memory.h
> +++ b/memory.h
> @@ -25,6 +25,7 @@
>  #include "iorange.h"
>  #include "ioport.h"
>  #include "int128.h"
> +#include "qemu-thread.h"
>
>  typedef struct MemoryRegionOps MemoryRegionOps;
>  typedef struct MemoryRegion MemoryRegion;
> @@ -207,6 +208,7 @@ struct MemoryListener {
>      QTAILQ_ENTRY(MemoryListener) link;
>  };
>
> +extern QemuMutex mem_map_lock;
>  /**
>   * memory_region_init: Initialize a memory region
>   *
> --
> 1.7.4.4
>

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 08/15] memory: introduce PhysMap to present snapshot of toploygy
  2012-08-08  6:25   ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08 19:18     ` Blue Swirl
  -1 siblings, 0 replies; 154+ messages in thread
From: Blue Swirl @ 2012-08-08 19:18 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: kvm, Jan Kiszka, Marcelo Tosatti, qemu-devel, Avi Kivity,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini,
	Andreas Färber

On Wed, Aug 8, 2012 at 6:25 AM, Liu Ping Fan <qemulist@gmail.com> wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>
> PhysMap contain the flatview and radix-tree view, they are snapshot
> of system topology and should be consistent. With PhysMap, we can
> swap the pointer when updating and achieve the atomic.
>
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  exec.c   |    8 --------
>  memory.c |   33 ---------------------------------
>  memory.h |   62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 60 insertions(+), 43 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index 0e29ef9..01b91b0 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -156,8 +156,6 @@ typedef struct PageDesc {
>  #endif
>
>  /* Size of the L2 (and L3, etc) page tables.  */

Please copy this comment to the header file.
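
I.e. the moved block in the header would read:

    /* Size of the L2 (and L3, etc) page tables.  */
    #define L2_BITS 10
    #define L2_SIZE (1 << L2_BITS)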

> -#define L2_BITS 10
> -#define L2_SIZE (1 << L2_BITS)
>
>  #define P_L2_LEVELS \
>      (((TARGET_PHYS_ADDR_SPACE_BITS - TARGET_PAGE_BITS - 1) / L2_BITS) + 1)
> @@ -185,7 +183,6 @@ uintptr_t qemu_host_page_mask;
>  static void *l1_map[V_L1_SIZE];
>
>  #if !defined(CONFIG_USER_ONLY)
> -typedef struct PhysPageEntry PhysPageEntry;
>
>  static MemoryRegionSection *phys_sections;
>  static unsigned phys_sections_nb, phys_sections_nb_alloc;
> @@ -194,11 +191,6 @@ static uint16_t phys_section_notdirty;
>  static uint16_t phys_section_rom;
>  static uint16_t phys_section_watch;
>
> -struct PhysPageEntry {
> -    uint16_t is_leaf : 1;
> -     /* index into phys_sections (is_leaf) or phys_map_nodes (!is_leaf) */
> -    uint16_t ptr : 15;
> -};
>
>  /* Simple allocator for PhysPageEntry nodes */
>  static PhysPageEntry (*phys_map_nodes)[L2_SIZE];
> diff --git a/memory.c b/memory.c
> index 2eaa2fc..c7f2cfd 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -31,17 +31,6 @@ static bool global_dirty_log = false;
>  static QTAILQ_HEAD(memory_listeners, MemoryListener) memory_listeners
>      = QTAILQ_HEAD_INITIALIZER(memory_listeners);
>
> -typedef struct AddrRange AddrRange;
> -
> -/*
> - * Note using signed integers limits us to physical addresses at most
> - * 63 bits wide.  They are needed for negative offsetting in aliases
> - * (large MemoryRegion::alias_offset).
> - */
> -struct AddrRange {
> -    Int128 start;
> -    Int128 size;
> -};
>
>  static AddrRange addrrange_make(Int128 start, Int128 size)
>  {
> @@ -197,28 +186,6 @@ static bool memory_region_ioeventfd_equal(MemoryRegionIoeventfd a,
>          && !memory_region_ioeventfd_before(b, a);
>  }
>
> -typedef struct FlatRange FlatRange;
> -typedef struct FlatView FlatView;
> -
> -/* Range of memory in the global map.  Addresses are absolute. */
> -struct FlatRange {
> -    MemoryRegion *mr;
> -    target_phys_addr_t offset_in_region;
> -    AddrRange addr;
> -    uint8_t dirty_log_mask;
> -    bool readable;
> -    bool readonly;
> -};
> -
> -/* Flattened global view of current active memory hierarchy.  Kept in sorted
> - * order.
> - */
> -struct FlatView {
> -    FlatRange *ranges;
> -    unsigned nr;
> -    unsigned nr_allocated;
> -};
> -
>  typedef struct AddressSpace AddressSpace;
>  typedef struct AddressSpaceOps AddressSpaceOps;
>
> diff --git a/memory.h b/memory.h
> index 740f018..357edd8 100644
> --- a/memory.h
> +++ b/memory.h
> @@ -29,12 +29,72 @@
>  #include "qemu-thread.h"
>  #include "qemu/reclaimer.h"
>
> +typedef struct AddrRange AddrRange;
> +typedef struct FlatRange FlatRange;
> +typedef struct FlatView FlatView;
> +typedef struct PhysPageEntry PhysPageEntry;
> +typedef struct PhysMap PhysMap;
> +typedef struct MemoryRegionSection MemoryRegionSection;
>  typedef struct MemoryRegionOps MemoryRegionOps;
>  typedef struct MemoryRegionLifeOps MemoryRegionLifeOps;
>  typedef struct MemoryRegion MemoryRegion;
>  typedef struct MemoryRegionPortio MemoryRegionPortio;
>  typedef struct MemoryRegionMmio MemoryRegionMmio;
>
> +/*
> + * Note using signed integers limits us to physical addresses at most
> + * 63 bits wide.  They are needed for negative offsetting in aliases
> + * (large MemoryRegion::alias_offset).
> + */
> +struct AddrRange {
> +    Int128 start;
> +    Int128 size;
> +};
> +
> +/* Range of memory in the global map.  Addresses are absolute. */
> +struct FlatRange {
> +    MemoryRegion *mr;
> +    target_phys_addr_t offset_in_region;
> +    AddrRange addr;
> +    uint8_t dirty_log_mask;
> +    bool readable;
> +    bool readonly;
> +};
> +
> +/* Flattened global view of current active memory hierarchy.  Kept in sorted
> + * order.
> + */
> +struct FlatView {
> +    FlatRange *ranges;
> +    unsigned nr;
> +    unsigned nr_allocated;
> +};
> +
> +struct PhysPageEntry {
> +    uint16_t is_leaf:1;
> +     /* index into phys_sections (is_leaf) or phys_map_nodes (!is_leaf) */
> +    uint16_t ptr:15;
> +};
> +
> +#define L2_BITS 10
> +#define L2_SIZE (1 << L2_BITS)
> +/* This is a multi-level map on the physical address space.
> +   The bottom level has pointers to MemoryRegionSections.  */
> +struct PhysMap {
> +    Atomic ref;
> +    PhysPageEntry root;
> +    PhysPageEntry (*phys_map_nodes)[L2_SIZE];
> +    unsigned phys_map_nodes_nb;
> +    unsigned phys_map_nodes_nb_alloc;
> +
> +    MemoryRegionSection *phys_sections;
> +    unsigned phys_sections_nb;
> +    unsigned phys_sections_nb_alloc;
> +
> +    /* FlatView */
> +    FlatView views[2];
> +};
> +
>  /* Must match *_DIRTY_FLAGS in cpu-all.h.  To be replaced with dynamic
>   * registration.
>   */
> @@ -167,8 +227,6 @@ struct MemoryRegionPortio {
>
>  #define PORTIO_END_OF_LIST() { }
>
> -typedef struct MemoryRegionSection MemoryRegionSection;
> -
>  /**
>   * MemoryRegionSection: describes a fragment of a #MemoryRegion
>   *
> --
> 1.7.4.4
>

^ permalink raw reply	[flat|nested] 154+ messages in thread
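
The PhysMap introduced above bundles the radix tree, the phys_sections
table and both flat views into a single reference-counted snapshot, so the
whole topology can be replaced with one pointer swap. A minimal sketch of
the reader pattern the following patch builds on (it mirrors the
cpu_physical_memory_rw change there; the function name is hypothetical and
the declarations come from exec.c/memory.h):

    static MemoryRegion *lookup_region_example(target_phys_addr_t addr)
    {
        PhysMap *map = cur_map_get();      /* ref++ under cur_map_lock   */
        MemoryRegionSection *section =
            phys_page_find(addr >> TARGET_PAGE_BITS);
        MemoryRegion *mr = section->mr;    /* stable while map is pinned */
        physmap_put(map);                  /* ref--; freed on last put   */
        return mr;                         /* using mr past this point
                                              needs its own reference    */
    }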

* Re: [PATCH 09/15] memory: prepare flatview and radix-tree for rcu style access
  2012-08-08  6:25   ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-08 19:23     ` Blue Swirl
  -1 siblings, 0 replies; 154+ messages in thread
From: Blue Swirl @ 2012-08-08 19:23 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: qemu-devel, kvm, Anthony Liguori, Avi Kivity, Jan Kiszka,
	Marcelo Tosatti, Stefan Hajnoczi, Paolo Bonzini,
	Andreas Färber

On Wed, Aug 8, 2012 at 6:25 AM, Liu Ping Fan <qemulist@gmail.com> wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>
> The flat view and the radix-tree view are both reached through a single
> pointer, so swapping that pointer makes updates to them appear atomic.
>
> A MemoryRegion referenced by a radix-tree leaf or by the flat view is
> reclaimed only once the previous PhysMap is no longer in use.
>
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  exec.c      |  303 +++++++++++++++++++++++++++++++++++++++-------------------
>  hw/vhost.c  |    2 +-
>  hw/xen_pt.c |    2 +-
>  kvm-all.c   |    2 +-
>  memory.c    |   92 ++++++++++++++-----
>  memory.h    |    9 ++-
>  vl.c        |    1 +
>  xen-all.c   |    2 +-
>  8 files changed, 286 insertions(+), 127 deletions(-)
>
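
A writer never mutates the published map: it builds a fresh PhysMap and
swaps it in, while readers keep a reference on whichever snapshot they
started with. A condensed sketch of the writer side, using only helpers
added below (illustrative; the real flow runs through the memory
listeners):

    static void publish_new_topology_example(void)
    {
        PhysMap *next = alloc_next_map();  /* ref == 1, private so far  */
        /* ... rebuild the radix tree and both flat views into next ... */
        cur_map_update(next);              /* atomic publish; the old map
                                              lives on until its last
                                              physmap_put()              */
    }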
> diff --git a/exec.c b/exec.c
> index 01b91b0..97addb9 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -24,6 +24,7 @@
>  #include <sys/mman.h>
>  #endif
>
> +#include "qemu/atomic.h"
>  #include "qemu-common.h"
>  #include "cpu.h"
>  #include "tcg.h"
> @@ -35,6 +36,8 @@
>  #include "qemu-timer.h"
>  #include "memory.h"
>  #include "exec-memory.h"
> +#include "qemu-thread.h"
> +#include "qemu/reclaimer.h"
>  #if defined(CONFIG_USER_ONLY)
>  #include <qemu.h>
>  #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
> @@ -184,25 +187,17 @@ static void *l1_map[V_L1_SIZE];
>
>  #if !defined(CONFIG_USER_ONLY)
>
> -static MemoryRegionSection *phys_sections;
> -static unsigned phys_sections_nb, phys_sections_nb_alloc;
>  static uint16_t phys_section_unassigned;
>  static uint16_t phys_section_notdirty;
>  static uint16_t phys_section_rom;
>  static uint16_t phys_section_watch;
>
> -
> -/* Simple allocator for PhysPageEntry nodes */
> -static PhysPageEntry (*phys_map_nodes)[L2_SIZE];
> -static unsigned phys_map_nodes_nb, phys_map_nodes_nb_alloc;
> -
>  #define PHYS_MAP_NODE_NIL (((uint16_t)~0) >> 1)
>
> -/* This is a multi-level map on the physical address space.
> -   The bottom level has pointers to MemoryRegionSections.  */
> -static PhysPageEntry phys_map = { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
> -
> +static QemuMutex cur_map_lock;
> +static PhysMap *cur_map;
>  QemuMutex mem_map_lock;
> +static PhysMap *next_map;
>
>  static void io_mem_init(void);
>  static void memory_map_init(void);
> @@ -383,41 +378,38 @@ static inline PageDesc *page_find(tb_page_addr_t index)
>
>  #if !defined(CONFIG_USER_ONLY)
>
> -static void phys_map_node_reserve(unsigned nodes)
> +static void phys_map_node_reserve(PhysMap *map, unsigned nodes)
>  {
> -    if (phys_map_nodes_nb + nodes > phys_map_nodes_nb_alloc) {
> +    if (map->phys_map_nodes_nb + nodes > map->phys_map_nodes_nb_alloc) {
>          typedef PhysPageEntry Node[L2_SIZE];
> -        phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc * 2, 16);
> -        phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc,
> -                                      phys_map_nodes_nb + nodes);
> -        phys_map_nodes = g_renew(Node, phys_map_nodes,
> -                                 phys_map_nodes_nb_alloc);
> +        map->phys_map_nodes_nb_alloc = MAX(map->phys_map_nodes_nb_alloc * 2,
> +                                                                        16);
> +        map->phys_map_nodes_nb_alloc = MAX(map->phys_map_nodes_nb_alloc,
> +                                      map->phys_map_nodes_nb + nodes);
> +        map->phys_map_nodes = g_renew(Node, map->phys_map_nodes,
> +                                 map->phys_map_nodes_nb_alloc);
>      }
>  }
>
> -static uint16_t phys_map_node_alloc(void)
> +static uint16_t phys_map_node_alloc(PhysMap *map)
>  {
>      unsigned i;
>      uint16_t ret;
>
> -    ret = phys_map_nodes_nb++;
> +    ret = map->phys_map_nodes_nb++;
>      assert(ret != PHYS_MAP_NODE_NIL);
> -    assert(ret != phys_map_nodes_nb_alloc);
> +    assert(ret != map->phys_map_nodes_nb_alloc);
>      for (i = 0; i < L2_SIZE; ++i) {
> -        phys_map_nodes[ret][i].is_leaf = 0;
> -        phys_map_nodes[ret][i].ptr = PHYS_MAP_NODE_NIL;
> +        map->phys_map_nodes[ret][i].is_leaf = 0;
> +        map->phys_map_nodes[ret][i].ptr = PHYS_MAP_NODE_NIL;
>      }
>      return ret;
>  }
>
> -static void phys_map_nodes_reset(void)
> -{
> -    phys_map_nodes_nb = 0;
> -}
> -
> -
> -static void phys_page_set_level(PhysPageEntry *lp, target_phys_addr_t *index,
> -                                target_phys_addr_t *nb, uint16_t leaf,
> +static void phys_page_set_level(PhysMap *map, PhysPageEntry *lp,
> +                                target_phys_addr_t *index,
> +                                target_phys_addr_t *nb,
> +                                uint16_t leaf,
>                                  int level)
>  {
>      PhysPageEntry *p;
> @@ -425,8 +417,8 @@ static void phys_page_set_level(PhysPageEntry *lp, target_phys_addr_t *index,
>      target_phys_addr_t step = (target_phys_addr_t)1 << (level * L2_BITS);
>
>      if (!lp->is_leaf && lp->ptr == PHYS_MAP_NODE_NIL) {
> -        lp->ptr = phys_map_node_alloc();
> -        p = phys_map_nodes[lp->ptr];
> +        lp->ptr = phys_map_node_alloc(map);
> +        p = map->phys_map_nodes[lp->ptr];
>          if (level == 0) {
>              for (i = 0; i < L2_SIZE; i++) {
>                  p[i].is_leaf = 1;
> @@ -434,7 +426,7 @@ static void phys_page_set_level(PhysPageEntry *lp, target_phys_addr_t *index,
>              }
>          }
>      } else {
> -        p = phys_map_nodes[lp->ptr];
> +        p = map->phys_map_nodes[lp->ptr];
>      }
>      lp = &p[(*index >> (level * L2_BITS)) & (L2_SIZE - 1)];
>
> @@ -445,24 +437,27 @@ static void phys_page_set_level(PhysPageEntry *lp, target_phys_addr_t *index,
>              *index += step;
>              *nb -= step;
>          } else {
> -            phys_page_set_level(lp, index, nb, leaf, level - 1);
> +            phys_page_set_level(map, lp, index, nb, leaf, level - 1);
>          }
>          ++lp;
>      }
>  }
>
> -static void phys_page_set(target_phys_addr_t index, target_phys_addr_t nb,
> -                          uint16_t leaf)
> +static void phys_page_set(PhysMap *map, target_phys_addr_t index,
> +                            target_phys_addr_t nb,
> +                            uint16_t leaf)
>  {
>      /* Wildly overreserve - it doesn't matter much. */
> -    phys_map_node_reserve(3 * P_L2_LEVELS);
> +    phys_map_node_reserve(map, 3 * P_L2_LEVELS);
>
> -    phys_page_set_level(&phys_map, &index, &nb, leaf, P_L2_LEVELS - 1);
> +    /* update in new tree*/
> +    phys_page_set_level(map, &map->root, &index, &nb, leaf, P_L2_LEVELS - 1);
>  }
>
> -MemoryRegionSection *phys_page_find(target_phys_addr_t index)
> +static MemoryRegionSection *phys_page_find_internal(PhysMap *map,
> +                           target_phys_addr_t index)
>  {
> -    PhysPageEntry lp = phys_map;
> +    PhysPageEntry lp = map->root;
>      PhysPageEntry *p;
>      int i;
>      uint16_t s_index = phys_section_unassigned;
> @@ -471,13 +466,79 @@ MemoryRegionSection *phys_page_find(target_phys_addr_t index)
>          if (lp.ptr == PHYS_MAP_NODE_NIL) {
>              goto not_found;
>          }
> -        p = phys_map_nodes[lp.ptr];
> +        p = map->phys_map_nodes[lp.ptr];
>          lp = p[(index >> (i * L2_BITS)) & (L2_SIZE - 1)];
>      }
>
>      s_index = lp.ptr;
>  not_found:
> -    return &phys_sections[s_index];
> +    return &map->phys_sections[s_index];
> +}
> +
> +MemoryRegionSection *phys_page_find(target_phys_addr_t index)
> +{
> +    return phys_page_find_internal(cur_map, index);
> +}
> +
> +void physmap_get(PhysMap *map)
> +{
> +    atomic_inc(&map->ref);
> +}
> +
> +/* Untill rcu read side finished, do this reclaim */

Until

> +static ChunkHead physmap_reclaimer_list = { .lh_first = NULL };

Please insert a blank line here.

> +void physmap_reclaimer_enqueue(void *opaque, ReleaseHandler *release)
> +{
> +    reclaimer_enqueue(&physmap_reclaimer_list, opaque, release);
> +}
> +
> +static void destroy_all_mappings(PhysMap *map);

Prototypes belong to the top of the file.

> +static void phys_map_release(PhysMap *map)
> +{
> +    /* emulate for rcu reclaimer for mr */
> +    reclaimer_worker(&physmap_reclaimer_list);
> +
> +    destroy_all_mappings(map);
> +    g_free(map->phys_map_nodes);
> +    g_free(map->phys_sections);
> +    g_free(map->views[0].ranges);
> +    g_free(map->views[1].ranges);
> +    g_free(map);
> +}
> +
> +void physmap_put(PhysMap *map)
> +{
> +    if (atomic_dec_and_test(&map->ref)) {
> +        phys_map_release(map);
> +    }
> +}
> +
> +void cur_map_update(PhysMap *next)
> +{
> +    qemu_mutex_lock(&cur_map_lock);
> +    physmap_put(cur_map);
> +    cur_map = next;
> +    smp_mb();
> +    qemu_mutex_unlock(&cur_map_lock);
> +}
> +
> +PhysMap *cur_map_get(void)
> +{
> +    PhysMap *ret;
> +
> +    qemu_mutex_lock(&cur_map_lock);
> +    ret = cur_map;
> +    physmap_get(ret);
> +    smp_mb();
> +    qemu_mutex_unlock(&cur_map_lock);
> +    return ret;
> +}
> +
> +PhysMap *alloc_next_map(void)
> +{
> +    PhysMap *next = g_malloc0(sizeof(PhysMap));
> +    atomic_set(&next->ref, 1);
> +    return next;
>  }
>
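
Because there is no real RCU yet, reclamation is deferred through the
list above: memory_region_put() (in the memory.c hunk below) enqueues the
region instead of freeing it, and the final physmap_put() on the old map
drains the queue via phys_map_release(). The two verbatim pieces, side by
side (for orientation only):

    /* memory.c: memory_region_put() defers the release */
    physmap_reclaimer_enqueue(mr, (ReleaseHandler *)mr->life_ops->put);

    /* exec.c: phys_map_release() runs the deferred releases */
    reclaimer_worker(&physmap_reclaimer_list);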
>  bool memory_region_is_unassigned(MemoryRegion *mr)
> @@ -632,6 +693,7 @@ void cpu_exec_init_all(void)
>      memory_map_init();
>      io_mem_init();
>      qemu_mutex_init(&mem_map_lock);
> +    qemu_mutex_init(&cur_map_lock);
>  #endif
>  }
>
> @@ -2161,17 +2223,18 @@ int page_unprotect(target_ulong address, uintptr_t pc, void *puc)
>
>  #define SUBPAGE_IDX(addr) ((addr) & ~TARGET_PAGE_MASK)
>  typedef struct subpage_t {
> +    PhysMap *map;
>      MemoryRegion iomem;
>      target_phys_addr_t base;
>      uint16_t sub_section[TARGET_PAGE_SIZE];
>  } subpage_t;
>
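
Each subpage now carries a back-pointer to the PhysMap it was created in,
so subpage accesses resolve section indexes against that snapshot rather
than a global table, e.g. in the subpage_read/subpage_write hunks below:

    section = &mmio->map->phys_sections[mmio->sub_section[idx]];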
> -static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
> -                             uint16_t section);
> -static subpage_t *subpage_init(target_phys_addr_t base);
> -static void destroy_page_desc(uint16_t section_index)
> +static int subpage_register(PhysMap *map, subpage_t *mmio, uint32_t start,
> +                            uint32_t end, uint16_t section);
> +static subpage_t *subpage_init(PhysMap *map, target_phys_addr_t base);
> +static void destroy_page_desc(PhysMap *map, uint16_t section_index)
>  {
> -    MemoryRegionSection *section = &phys_sections[section_index];
> +    MemoryRegionSection *section = &map->phys_sections[section_index];
>      MemoryRegion *mr = section->mr;
>
>      if (mr->subpage) {
> @@ -2181,7 +2244,7 @@ static void destroy_page_desc(uint16_t section_index)
>      }
>  }
>
> -static void destroy_l2_mapping(PhysPageEntry *lp, unsigned level)
> +static void destroy_l2_mapping(PhysMap *map, PhysPageEntry *lp, unsigned level)
>  {
>      unsigned i;
>      PhysPageEntry *p;
> @@ -2190,38 +2253,34 @@ static void destroy_l2_mapping(PhysPageEntry *lp, unsigned level)
>          return;
>      }
>
> -    p = phys_map_nodes[lp->ptr];
> +    p = map->phys_map_nodes[lp->ptr];
>      for (i = 0; i < L2_SIZE; ++i) {
>          if (!p[i].is_leaf) {
> -            destroy_l2_mapping(&p[i], level - 1);
> +            destroy_l2_mapping(map, &p[i], level - 1);
>          } else {
> -            destroy_page_desc(p[i].ptr);
> +            destroy_page_desc(map, p[i].ptr);
>          }
>      }
>      lp->is_leaf = 0;
>      lp->ptr = PHYS_MAP_NODE_NIL;
>  }
>
> -static void destroy_all_mappings(void)
> +static void destroy_all_mappings(PhysMap *map)
>  {
> -    destroy_l2_mapping(&phys_map, P_L2_LEVELS - 1);
> -    phys_map_nodes_reset();
> -}
> +    PhysPageEntry *root = &map->root;
>
> -static uint16_t phys_section_add(MemoryRegionSection *section)
> -{
> -    if (phys_sections_nb == phys_sections_nb_alloc) {
> -        phys_sections_nb_alloc = MAX(phys_sections_nb_alloc * 2, 16);
> -        phys_sections = g_renew(MemoryRegionSection, phys_sections,
> -                                phys_sections_nb_alloc);
> -    }
> -    phys_sections[phys_sections_nb] = *section;
> -    return phys_sections_nb++;
> +    destroy_l2_mapping(map, root, P_L2_LEVELS - 1);
>  }
>
> -static void phys_sections_clear(void)
> +static uint16_t phys_section_add(PhysMap *map, MemoryRegionSection *section)
>  {
> -    phys_sections_nb = 0;
> +    if (map->phys_sections_nb == map->phys_sections_nb_alloc) {
> +        map->phys_sections_nb_alloc = MAX(map->phys_sections_nb_alloc * 2, 16);
> +        map->phys_sections = g_renew(MemoryRegionSection, map->phys_sections,
> +                                map->phys_sections_nb_alloc);
> +    }
> +    map->phys_sections[map->phys_sections_nb] = *section;
> +    return map->phys_sections_nb++;
>  }
>
>  /* register physical memory.
> @@ -2232,12 +2291,13 @@ static void phys_sections_clear(void)
>     start_addr and region_offset are rounded down to a page boundary
>     before calculating this offset.  This should not be a problem unless
>     the low bits of start_addr and region_offset differ.  */
> -static void register_subpage(MemoryRegionSection *section)
> +static void register_subpage(PhysMap *map, MemoryRegionSection *section)
>  {
>      subpage_t *subpage;
>      target_phys_addr_t base = section->offset_within_address_space
>          & TARGET_PAGE_MASK;
> -    MemoryRegionSection *existing = phys_page_find(base >> TARGET_PAGE_BITS);
> +    MemoryRegionSection *existing = phys_page_find_internal(map,
> +                                            base >> TARGET_PAGE_BITS);
>      MemoryRegionSection subsection = {
>          .offset_within_address_space = base,
>          .size = TARGET_PAGE_SIZE,
> @@ -2247,30 +2307,30 @@ static void register_subpage(MemoryRegionSection *section)
>      assert(existing->mr->subpage || existing->mr == &io_mem_unassigned);
>
>      if (!(existing->mr->subpage)) {
> -        subpage = subpage_init(base);
> +        subpage = subpage_init(map, base);
>          subsection.mr = &subpage->iomem;
> -        phys_page_set(base >> TARGET_PAGE_BITS, 1,
> -                      phys_section_add(&subsection));
> +        phys_page_set(map, base >> TARGET_PAGE_BITS, 1,
> +                      phys_section_add(map, &subsection));
>      } else {
>          subpage = container_of(existing->mr, subpage_t, iomem);
>      }
>      start = section->offset_within_address_space & ~TARGET_PAGE_MASK;
>      end = start + section->size;
> -    subpage_register(subpage, start, end, phys_section_add(section));
> +    subpage_register(map, subpage, start, end, phys_section_add(map, section));
>  }
>
>
> -static void register_multipage(MemoryRegionSection *section)
> +static void register_multipage(PhysMap *map, MemoryRegionSection *section)
>  {
>      target_phys_addr_t start_addr = section->offset_within_address_space;
>      ram_addr_t size = section->size;
>      target_phys_addr_t addr;
> -    uint16_t section_index = phys_section_add(section);
> +    uint16_t section_index = phys_section_add(map, section);
>
>      assert(size);
>
>      addr = start_addr;
> -    phys_page_set(addr >> TARGET_PAGE_BITS, size >> TARGET_PAGE_BITS,
> +    phys_page_set(map, addr >> TARGET_PAGE_BITS, size >> TARGET_PAGE_BITS,
>                    section_index);
>  }
>
> @@ -2278,13 +2338,14 @@ void cpu_register_physical_memory_log(MemoryRegionSection *section,
>                                        bool readonly)
>  {
>      MemoryRegionSection now = *section, remain = *section;
> +    PhysMap *map = next_map;
>
>      if ((now.offset_within_address_space & ~TARGET_PAGE_MASK)
>          || (now.size < TARGET_PAGE_SIZE)) {
>          now.size = MIN(TARGET_PAGE_ALIGN(now.offset_within_address_space)
>                         - now.offset_within_address_space,
>                         now.size);
> -        register_subpage(&now);
> +        register_subpage(map, &now);
>          remain.size -= now.size;
>          remain.offset_within_address_space += now.size;
>          remain.offset_within_region += now.size;
> @@ -2292,14 +2353,14 @@ void cpu_register_physical_memory_log(MemoryRegionSection *section,
>      now = remain;
>      now.size &= TARGET_PAGE_MASK;
>      if (now.size) {
> -        register_multipage(&now);
> +        register_multipage(map, &now);
>          remain.size -= now.size;
>          remain.offset_within_address_space += now.size;
>          remain.offset_within_region += now.size;
>      }
>      now = remain;
>      if (now.size) {
> -        register_subpage(&now);
> +        register_subpage(map, &now);
>      }
>  }
>
> @@ -3001,7 +3062,7 @@ static uint64_t subpage_read(void *opaque, target_phys_addr_t addr,
>             mmio, len, addr, idx);
>  #endif
>
> -    section = &phys_sections[mmio->sub_section[idx]];
> +    section = &mmio->map->phys_sections[mmio->sub_section[idx]];
>      addr += mmio->base;
>      addr -= section->offset_within_address_space;
>      addr += section->offset_within_region;
> @@ -3020,7 +3081,7 @@ static void subpage_write(void *opaque, target_phys_addr_t addr,
>             __func__, mmio, len, addr, idx, value);
>  #endif
>
> -    section = &phys_sections[mmio->sub_section[idx]];
> +    section = &mmio->map->phys_sections[mmio->sub_section[idx]];
>      addr += mmio->base;
>      addr -= section->offset_within_address_space;
>      addr += section->offset_within_region;
> @@ -3065,8 +3126,8 @@ static const MemoryRegionOps subpage_ram_ops = {
>      .endianness = DEVICE_NATIVE_ENDIAN,
>  };
>
> -static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
> -                             uint16_t section)
> +static int subpage_register(PhysMap *map, subpage_t *mmio, uint32_t start,
> +                              uint32_t end, uint16_t section)
>  {
>      int idx, eidx;
>
> @@ -3078,10 +3139,10 @@ static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
>      printf("%s: %p start %08x end %08x idx %08x eidx %08x mem %ld\n", __func__,
>             mmio, start, end, idx, eidx, memory);
>  #endif
> -    if (memory_region_is_ram(phys_sections[section].mr)) {
> -        MemoryRegionSection new_section = phys_sections[section];
> +    if (memory_region_is_ram(map->phys_sections[section].mr)) {
> +        MemoryRegionSection new_section = map->phys_sections[section];
>          new_section.mr = &io_mem_subpage_ram;
> -        section = phys_section_add(&new_section);
> +        section = phys_section_add(map, &new_section);
>      }
>      for (; idx <= eidx; idx++) {
>          mmio->sub_section[idx] = section;
> @@ -3090,12 +3151,13 @@ static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
>      return 0;
>  }
>
> -static subpage_t *subpage_init(target_phys_addr_t base)
> +static subpage_t *subpage_init(PhysMap *map, target_phys_addr_t base)
>  {
>      subpage_t *mmio;
>
>      mmio = g_malloc0(sizeof(subpage_t));
>
> +    mmio->map = map;
>      mmio->base = base;
>      memory_region_init_io(&mmio->iomem, &subpage_ops, mmio,
>                            "subpage", TARGET_PAGE_SIZE);
> @@ -3104,12 +3166,12 @@ static subpage_t *subpage_init(target_phys_addr_t base)
>      printf("%s: %p base " TARGET_FMT_plx " len %08x %d\n", __func__,
>             mmio, base, TARGET_PAGE_SIZE, subpage_memory);
>  #endif
> -    subpage_register(mmio, 0, TARGET_PAGE_SIZE-1, phys_section_unassigned);
> +    subpage_register(map, mmio, 0, TARGET_PAGE_SIZE-1, phys_section_unassigned);
>
>      return mmio;
>  }
>
> -static uint16_t dummy_section(MemoryRegion *mr)
> +static uint16_t dummy_section(PhysMap *map, MemoryRegion *mr)
>  {
>      MemoryRegionSection section = {
>          .mr = mr,
> @@ -3118,7 +3180,7 @@ static uint16_t dummy_section(MemoryRegion *mr)
>          .size = UINT64_MAX,
>      };
>
> -    return phys_section_add(&section);
> +    return phys_section_add(map, &section);
>  }
>
>  MemoryRegion *iotlb_to_region(target_phys_addr_t index)
> @@ -3140,15 +3202,32 @@ static void io_mem_init(void)
>                            "watch", UINT64_MAX);
>  }
>
> -static void core_begin(MemoryListener *listener)
> +#if 0
> +static void physmap_init(void)
> +{
> +    FlatView v = { .ranges = NULL,
> +                             .nr = 0,
> +                             .nr_allocated = 0,
> +    };
> +
> +    init_map.views[0] = v;
> +    init_map.views[1] = v;
> +    cur_map =  &init_map;
> +}
> +#endif

Please delete.

> +
> +static void core_begin(MemoryListener *listener, PhysMap *new_map)
>  {
> -    destroy_all_mappings();
> -    phys_sections_clear();
> -    phys_map.ptr = PHYS_MAP_NODE_NIL;
> -    phys_section_unassigned = dummy_section(&io_mem_unassigned);
> -    phys_section_notdirty = dummy_section(&io_mem_notdirty);
> -    phys_section_rom = dummy_section(&io_mem_rom);
> -    phys_section_watch = dummy_section(&io_mem_watch);
> +
> +    new_map->root.ptr = PHYS_MAP_NODE_NIL;
> +    new_map->root.is_leaf = 0;
> +
> +    /* In all the map, these sections have the same index */
> +    phys_section_unassigned = dummy_section(new_map, &io_mem_unassigned);
> +    phys_section_notdirty = dummy_section(new_map, &io_mem_notdirty);
> +    phys_section_rom = dummy_section(new_map, &io_mem_rom);
> +    phys_section_watch = dummy_section(new_map, &io_mem_watch);
> +    next_map = new_map;
>  }
>
>  static void core_commit(MemoryListener *listener)
> @@ -3161,6 +3240,16 @@ static void core_commit(MemoryListener *listener)
>      for(env = first_cpu; env != NULL; env = env->next_cpu) {
>          tlb_flush(env, 1);
>      }
> +
> +/* move into high layer
> +    qemu_mutex_lock(&cur_map_lock);
> +    if (cur_map != NULL) {
> +        physmap_put(cur_map);
> +    }
> +    cur_map = next_map;
> +    smp_mb();
> +    qemu_mutex_unlock(&cur_map_lock);
> +*/

Also commented out code should be deleted.

>  }
>
>  static void core_region_add(MemoryListener *listener,
> @@ -3217,7 +3306,7 @@ static void core_eventfd_del(MemoryListener *listener,
>  {
>  }
>
> -static void io_begin(MemoryListener *listener)
> +static void io_begin(MemoryListener *listener, PhysMap *next)
>  {
>  }
>
> @@ -3329,6 +3418,20 @@ static void memory_map_init(void)
>      memory_listener_register(&io_memory_listener, system_io);
>  }
>
> +void physmap_init(void)
> +{
> +    FlatView v = { .ranges = NULL, .nr = 0, .nr_allocated = 0,
> +                           };
> +    PhysMap *init_map = g_malloc0(sizeof(PhysMap));
> +
> +    atomic_set(&init_map->ref, 1);
> +    init_map->root.ptr = PHYS_MAP_NODE_NIL;
> +    init_map->root.is_leaf = 0;
> +    init_map->views[0] = v;
> +    init_map->views[1] = v;
> +    cur_map = init_map;
> +}
> +
>  MemoryRegion *get_system_memory(void)
>  {
>      return system_memory;
> @@ -3391,6 +3494,7 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>      uint32_t val;
>      target_phys_addr_t page;
>      MemoryRegionSection *section;
> +    PhysMap *cur = cur_map_get();
>
>      while (len > 0) {
>          page = addr & TARGET_PAGE_MASK;
> @@ -3472,6 +3576,7 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>          buf += l;
>          addr += l;
>      }
> +    physmap_put(cur);
>  }
>
>  /* used for ROM loading : can write in RAM and ROM */
> diff --git a/hw/vhost.c b/hw/vhost.c
> index 43664e7..df58345 100644
> --- a/hw/vhost.c
> +++ b/hw/vhost.c
> @@ -438,7 +438,7 @@ static bool vhost_section(MemoryRegionSection *section)
>          && memory_region_is_ram(section->mr);
>  }
>
> -static void vhost_begin(MemoryListener *listener)
> +static void vhost_begin(MemoryListener *listener, PhysMap *next)
>  {
>  }
>
> diff --git a/hw/xen_pt.c b/hw/xen_pt.c
> index 3b6d186..fba8586 100644
> --- a/hw/xen_pt.c
> +++ b/hw/xen_pt.c
> @@ -597,7 +597,7 @@ static void xen_pt_region_update(XenPCIPassthroughState *s,
>      }
>  }
>
> -static void xen_pt_begin(MemoryListener *l)
> +static void xen_pt_begin(MemoryListener *l, PhysMap *next)
>  {
>  }
>
> diff --git a/kvm-all.c b/kvm-all.c
> index f8e4328..bc42cab 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -693,7 +693,7 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, bool add)
>      }
>  }
>
> -static void kvm_begin(MemoryListener *listener)
> +static void kvm_begin(MemoryListener *listener, PhysMap *next)
>  {
>  }
>
> diff --git a/memory.c b/memory.c
> index c7f2cfd..54cdc7f 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -20,6 +20,7 @@
>  #include "kvm.h"
>  #include <assert.h>
>  #include "hw/qdev.h"
> +#include "qemu-thread.h"
>
>  #define WANT_EXEC_OBSOLETE
>  #include "exec-obsolete.h"
> @@ -192,7 +193,7 @@ typedef struct AddressSpaceOps AddressSpaceOps;
>  /* A system address space - I/O, memory, etc. */
>  struct AddressSpace {
>      MemoryRegion *root;
> -    FlatView current_map;
> +    int view_id;
>      int ioeventfd_nb;
>      MemoryRegionIoeventfd *ioeventfds;
>  };
> @@ -232,11 +233,6 @@ static void flatview_insert(FlatView *view, unsigned pos, FlatRange *range)
>      ++view->nr;
>  }
>
> -static void flatview_destroy(FlatView *view)
> -{
> -    g_free(view->ranges);
> -}
> -
>  static bool can_merge(FlatRange *r1, FlatRange *r2)
>  {
>      return int128_eq(addrrange_end(r1->addr), r2->addr.start)
> @@ -594,8 +590,10 @@ static void address_space_update_ioeventfds(AddressSpace *as)
>      MemoryRegionIoeventfd *ioeventfds = NULL;
>      AddrRange tmp;
>      unsigned i;
> +    PhysMap *map = cur_map_get();
> +    FlatView *view = &map->views[as->view_id];
>
> -    FOR_EACH_FLAT_RANGE(fr, &as->current_map) {
> +    FOR_EACH_FLAT_RANGE(fr, view) {
>          for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
>              tmp = addrrange_shift(fr->mr->ioeventfds[i].addr,
>                                    int128_sub(fr->addr.start,
> @@ -616,6 +614,7 @@ static void address_space_update_ioeventfds(AddressSpace *as)
>      g_free(as->ioeventfds);
>      as->ioeventfds = ioeventfds;
>      as->ioeventfd_nb = ioeventfd_nb;
> +    physmap_put(map);
>  }
>
>  static void address_space_update_topology_pass(AddressSpace *as,
> @@ -681,21 +680,23 @@ static void address_space_update_topology_pass(AddressSpace *as,
>  }
>
>
> -static void address_space_update_topology(AddressSpace *as)
> +static void address_space_update_topology(AddressSpace *as, PhysMap *prev,
> +                                            PhysMap *next)
>  {
> -    FlatView old_view = as->current_map;
> +    FlatView old_view = prev->views[as->view_id];
>      FlatView new_view = generate_memory_topology(as->root);
>
>      address_space_update_topology_pass(as, old_view, new_view, false);
>      address_space_update_topology_pass(as, old_view, new_view, true);
> +    next->views[as->view_id] = new_view;
>
> -    as->current_map = new_view;
> -    flatview_destroy(&old_view);
>      address_space_update_ioeventfds(as);
>  }
>
>  static void memory_region_update_topology(MemoryRegion *mr)
>  {
> +    PhysMap *prev, *next;
> +
>      if (memory_region_transaction_depth) {
>          memory_region_update_pending |= !mr || mr->enabled;
>          return;
> @@ -705,16 +706,20 @@ static void memory_region_update_topology(MemoryRegion *mr)
>          return;
>      }
>
> -    MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
> +     prev = cur_map_get();
> +    /* allocate PhysMap next here */
> +    next = alloc_next_map();
> +    MEMORY_LISTENER_CALL_GLOBAL(begin, Forward, next);
>
>      if (address_space_memory.root) {
> -        address_space_update_topology(&address_space_memory);
> +        address_space_update_topology(&address_space_memory, prev, next);
>      }
>      if (address_space_io.root) {
> -        address_space_update_topology(&address_space_io);
> +        address_space_update_topology(&address_space_io, prev, next);
>      }
>
>      MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
> +    cur_map_update(next);
>
>      memory_region_update_pending = false;
>  }
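
Note the bracketing above: the old snapshot is pinned with cur_map_get()
before the rebuild, and the new one is published with cur_map_update()
afterwards. A fully balanced sequence would also drop that pin (sketch;
the final put is hypothetical and does not appear in the hunk as posted):

    PhysMap *prev = cur_map_get();     /* pin the map being replaced    */
    PhysMap *next = alloc_next_map();
    /* ... listener begin/commit cycle rebuilds the views into next ... */
    cur_map_update(next);              /* swap in next, put the old
                                          published reference            */
    physmap_put(prev);                 /* hypothetical: balances the
                                          cur_map_get() pin above        */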
> @@ -1071,7 +1076,7 @@ void memory_region_put(MemoryRegion *mr)
>
>      if (atomic_dec_and_test(&mr->ref)) {
>          /* to fix, using call_rcu( ,release) */
> -        mr->life_ops->put(mr);
> +        physmap_reclaimer_enqueue(mr, (ReleaseHandler *)mr->life_ops->put);
>      }
>  }
>
> @@ -1147,13 +1152,18 @@ void memory_region_set_dirty(MemoryRegion *mr, target_phys_addr_t addr,
>  void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
>  {
>      FlatRange *fr;
> +    FlatView *fview;
> +    PhysMap *map;
>
> -    FOR_EACH_FLAT_RANGE(fr, &address_space_memory.current_map) {
> +    map = cur_map_get();
> +    fview = &map->views[address_space_memory.view_id];
> +    FOR_EACH_FLAT_RANGE(fr, fview) {
>          if (fr->mr == mr) {
>              MEMORY_LISTENER_UPDATE_REGION(fr, &address_space_memory,
>                                            Forward, log_sync);
>          }
>      }
> +    physmap_put(map);
>  }
>
>  void memory_region_set_readonly(MemoryRegion *mr, bool readonly)
> @@ -1201,8 +1211,12 @@ static void memory_region_update_coalesced_range(MemoryRegion *mr)
>      FlatRange *fr;
>      CoalescedMemoryRange *cmr;
>      AddrRange tmp;
> +    FlatView *fview;
> +    PhysMap *map;
>
> -    FOR_EACH_FLAT_RANGE(fr, &address_space_memory.current_map) {
> +    map = cur_map_get();
> +    fview = &map->views[address_space_memory.view_id];
> +    FOR_EACH_FLAT_RANGE(fr, fview) {
>          if (fr->mr == mr) {
>              qemu_unregister_coalesced_mmio(int128_get64(fr->addr.start),
>                                             int128_get64(fr->addr.size));
> @@ -1219,6 +1233,7 @@ static void memory_region_update_coalesced_range(MemoryRegion *mr)
>              }
>          }
>      }
> +    physmap_put(map);
>  }
>
>  void memory_region_set_coalescing(MemoryRegion *mr)
> @@ -1458,29 +1473,49 @@ static int cmp_flatrange_addr(const void *addr_, const void *fr_)
>      return 0;
>  }
>
> -static FlatRange *address_space_lookup(AddressSpace *as, AddrRange addr)
> +static FlatRange *address_space_lookup(FlatView *view, AddrRange addr)
>  {
> -    return bsearch(&addr, as->current_map.ranges, as->current_map.nr,
> +    return bsearch(&addr, view->ranges, view->nr,
>                     sizeof(FlatRange), cmp_flatrange_addr);
>  }
>
> +/* dec the ref, which inc by memory_region_find*/
> +void memory_region_section_put(MemoryRegionSection *mrs)
> +{
> +    if (mrs->mr != NULL) {
> +        memory_region_put(mrs->mr);
> +    }
> +}
> +
> +/* inc mr's ref. Caller need dec mr's ref */
>  MemoryRegionSection memory_region_find(MemoryRegion *address_space,
>                                         target_phys_addr_t addr, uint64_t size)
>  {
> +    PhysMap *map;
>      AddressSpace *as = memory_region_to_address_space(address_space);
>      AddrRange range = addrrange_make(int128_make64(addr),
>                                       int128_make64(size));
> -    FlatRange *fr = address_space_lookup(as, range);
> +    FlatView *fview;
> +
> +    map = cur_map_get();
> +
> +    fview = &map->views[as->view_id];
> +    FlatRange *fr = address_space_lookup(fview, range);
>      MemoryRegionSection ret = { .mr = NULL, .size = 0 };
>
>      if (!fr) {
> +        physmap_put(map);
>          return ret;
>      }
>
> -    while (fr > as->current_map.ranges
> +    while (fr > fview->ranges
>             && addrrange_intersects(fr[-1].addr, range)) {
>          --fr;
>      }
> +    /* To fix, the caller must in rcu, or we must inc fr->mr->ref here
> +     */
> +    memory_region_get(fr->mr);
> +    physmap_put(map);
>
>      ret.mr = fr->mr;
>      range = addrrange_intersection(range, fr->addr);
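
With memory_region_find() now taking a reference on the region it
returns, callers pair it with memory_region_section_put(). A hypothetical
caller honouring the new convention (sketch; memory_region_section_put()
accepts a NULL mr, so the put is unconditional):

    static bool addr_is_ram_example(target_phys_addr_t addr)
    {
        MemoryRegionSection mrs = memory_region_find(get_system_memory(),
                                                     addr, 1);
        bool ram = mrs.mr && memory_region_is_ram(mrs.mr);
        memory_region_section_put(&mrs);   /* drop the ref find took */
        return ram;
    }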
> @@ -1497,10 +1532,13 @@ void memory_global_sync_dirty_bitmap(MemoryRegion *address_space)
>  {
>      AddressSpace *as = memory_region_to_address_space(address_space);
>      FlatRange *fr;
> +    PhysMap *map = cur_map_get();
> +    FlatView *view = &map->views[as->view_id];
>
> -    FOR_EACH_FLAT_RANGE(fr, &as->current_map) {
> +    FOR_EACH_FLAT_RANGE(fr, view) {
>          MEMORY_LISTENER_UPDATE_REGION(fr, as, Forward, log_sync);
>      }
> +    physmap_put(map);
>  }
>
>  void memory_global_dirty_log_start(void)
> @@ -1519,6 +1557,8 @@ static void listener_add_address_space(MemoryListener *listener,
>                                         AddressSpace *as)
>  {
>      FlatRange *fr;
> +    PhysMap *map;
> +    FlatView *view;
>
>      if (listener->address_space_filter
>          && listener->address_space_filter != as->root) {
> @@ -1528,7 +1568,10 @@ static void listener_add_address_space(MemoryListener *listener,
>      if (global_dirty_log) {
>          listener->log_global_start(listener);
>      }
> -    FOR_EACH_FLAT_RANGE(fr, &as->current_map) {
> +
> +    map = cur_map_get();
> +    view = &map->views[as->view_id];
> +    FOR_EACH_FLAT_RANGE(fr, view) {
>          MemoryRegionSection section = {
>              .mr = fr->mr,
>              .address_space = as->root,
> @@ -1539,6 +1582,7 @@ static void listener_add_address_space(MemoryListener *listener,
>          };
>          listener->region_add(listener, &section);
>      }
> +    physmap_put(map);
>  }
>
>  void memory_listener_register(MemoryListener *listener, MemoryRegion *filter)
> @@ -1570,12 +1614,14 @@ void memory_listener_unregister(MemoryListener *listener)
>  void set_system_memory_map(MemoryRegion *mr)
>  {
>      address_space_memory.root = mr;
> +    address_space_memory.view_id = 0;
>      memory_region_update_topology(NULL);
>  }
>
>  void set_system_io_map(MemoryRegion *mr)
>  {
>      address_space_io.root = mr;
> +    address_space_io.view_id = 1;
>      memory_region_update_topology(NULL);
>  }
>
> diff --git a/memory.h b/memory.h
> index 357edd8..18442d4 100644
> --- a/memory.h
> +++ b/memory.h
> @@ -256,7 +256,7 @@ typedef struct MemoryListener MemoryListener;
>   * Use with memory_listener_register() and memory_listener_unregister().
>   */
>  struct MemoryListener {
> -    void (*begin)(MemoryListener *listener);
> +    void (*begin)(MemoryListener *listener, PhysMap *next);
>      void (*commit)(MemoryListener *listener);
>      void (*region_add)(MemoryListener *listener, MemoryRegionSection *section);
>      void (*region_del)(MemoryListener *listener, MemoryRegionSection *section);
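
The begin() hook gains the PhysMap being built; listeners that do not
track topology snapshots just ignore it, as the kvm/xen/vhost stubs in
this patch do. A no-op implementation is simply (sketch):

    static void my_begin(MemoryListener *listener, PhysMap *next)
    {
        /* nothing to do: this listener ignores PhysMap snapshots */
    }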
> @@ -829,6 +829,13 @@ void mtree_info(fprintf_function mon_printf, void *f);
>
>  void memory_region_get(MemoryRegion *mr);
>  void memory_region_put(MemoryRegion *mr);
> +void physmap_reclaimer_enqueue(void *opaque, ReleaseHandler *release);
> +void physmap_get(PhysMap *map);
> +void physmap_put(PhysMap *map);
> +PhysMap *cur_map_get(void);
> +PhysMap *alloc_next_map(void);
> +void cur_map_update(PhysMap *next);
> +void physmap_init(void);
>  #endif
>
>  #endif
> diff --git a/vl.c b/vl.c
> index 1329c30..12af523 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -3346,6 +3346,7 @@ int main(int argc, char **argv, char **envp)
>      if (ram_size == 0) {
>          ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
>      }
> +    physmap_init();
>
>      configure_accelerator();
>
> diff --git a/xen-all.c b/xen-all.c
> index 59f2323..41d82fd 100644
> --- a/xen-all.c
> +++ b/xen-all.c
> @@ -452,7 +452,7 @@ static void xen_set_memory(struct MemoryListener *listener,
>      }
>  }
>
> -static void xen_begin(MemoryListener *listener)
> +static void xen_begin(MemoryListener *listener, PhysMap *next)
>  {
>  }
>
> --
> 1.7.4.4
>

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [Qemu-devel] [PATCH 09/15] memory: prepare flatview and radix-tree for rcu style access
@ 2012-08-08 19:23     ` Blue Swirl
  0 siblings, 0 replies; 154+ messages in thread
From: Blue Swirl @ 2012-08-08 19:23 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: kvm, Jan Kiszka, Marcelo Tosatti, qemu-devel, Avi Kivity,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini,
	Andreas Färber

On Wed, Aug 8, 2012 at 6:25 AM, Liu Ping Fan <qemulist@gmail.com> wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>
> The flat view and the radix-tree view are both reached through a single
> pointer, so swapping that pointer makes updates to them appear atomic.
>
> A MemoryRegion referenced by a radix-tree leaf or by the flat view is
> reclaimed only once the previous PhysMap is no longer in use.
>
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  exec.c      |  303 +++++++++++++++++++++++++++++++++++++++-------------------
>  hw/vhost.c  |    2 +-
>  hw/xen_pt.c |    2 +-
>  kvm-all.c   |    2 +-
>  memory.c    |   92 ++++++++++++++-----
>  memory.h    |    9 ++-
>  vl.c        |    1 +
>  xen-all.c   |    2 +-
>  8 files changed, 286 insertions(+), 127 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index 01b91b0..97addb9 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -24,6 +24,7 @@
>  #include <sys/mman.h>
>  #endif
>
> +#include "qemu/atomic.h"
>  #include "qemu-common.h"
>  #include "cpu.h"
>  #include "tcg.h"
> @@ -35,6 +36,8 @@
>  #include "qemu-timer.h"
>  #include "memory.h"
>  #include "exec-memory.h"
> +#include "qemu-thread.h"
> +#include "qemu/reclaimer.h"
>  #if defined(CONFIG_USER_ONLY)
>  #include <qemu.h>
>  #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
> @@ -184,25 +187,17 @@ static void *l1_map[V_L1_SIZE];
>
>  #if !defined(CONFIG_USER_ONLY)
>
> -static MemoryRegionSection *phys_sections;
> -static unsigned phys_sections_nb, phys_sections_nb_alloc;
>  static uint16_t phys_section_unassigned;
>  static uint16_t phys_section_notdirty;
>  static uint16_t phys_section_rom;
>  static uint16_t phys_section_watch;
>
> -
> -/* Simple allocator for PhysPageEntry nodes */
> -static PhysPageEntry (*phys_map_nodes)[L2_SIZE];
> -static unsigned phys_map_nodes_nb, phys_map_nodes_nb_alloc;
> -
>  #define PHYS_MAP_NODE_NIL (((uint16_t)~0) >> 1)
>
> -/* This is a multi-level map on the physical address space.
> -   The bottom level has pointers to MemoryRegionSections.  */
> -static PhysPageEntry phys_map = { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
> -
> +static QemuMutex cur_map_lock;
> +static PhysMap *cur_map;
>  QemuMutex mem_map_lock;
> +static PhysMap *next_map;
>
>  static void io_mem_init(void);
>  static void memory_map_init(void);
> @@ -383,41 +378,38 @@ static inline PageDesc *page_find(tb_page_addr_t index)
>
>  #if !defined(CONFIG_USER_ONLY)
>
> -static void phys_map_node_reserve(unsigned nodes)
> +static void phys_map_node_reserve(PhysMap *map, unsigned nodes)
>  {
> -    if (phys_map_nodes_nb + nodes > phys_map_nodes_nb_alloc) {
> +    if (map->phys_map_nodes_nb + nodes > map->phys_map_nodes_nb_alloc) {
>          typedef PhysPageEntry Node[L2_SIZE];
> -        phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc * 2, 16);
> -        phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc,
> -                                      phys_map_nodes_nb + nodes);
> -        phys_map_nodes = g_renew(Node, phys_map_nodes,
> -                                 phys_map_nodes_nb_alloc);
> +        map->phys_map_nodes_nb_alloc = MAX(map->phys_map_nodes_nb_alloc * 2,
> +                                                                        16);
> +        map->phys_map_nodes_nb_alloc = MAX(map->phys_map_nodes_nb_alloc,
> +                                      map->phys_map_nodes_nb + nodes);
> +        map->phys_map_nodes = g_renew(Node, map->phys_map_nodes,
> +                                 map->phys_map_nodes_nb_alloc);
>      }
>  }
>
> -static uint16_t phys_map_node_alloc(void)
> +static uint16_t phys_map_node_alloc(PhysMap *map)
>  {
>      unsigned i;
>      uint16_t ret;
>
> -    ret = phys_map_nodes_nb++;
> +    ret = map->phys_map_nodes_nb++;
>      assert(ret != PHYS_MAP_NODE_NIL);
> -    assert(ret != phys_map_nodes_nb_alloc);
> +    assert(ret != map->phys_map_nodes_nb_alloc);
>      for (i = 0; i < L2_SIZE; ++i) {
> -        phys_map_nodes[ret][i].is_leaf = 0;
> -        phys_map_nodes[ret][i].ptr = PHYS_MAP_NODE_NIL;
> +        map->phys_map_nodes[ret][i].is_leaf = 0;
> +        map->phys_map_nodes[ret][i].ptr = PHYS_MAP_NODE_NIL;
>      }
>      return ret;
>  }
>
> -static void phys_map_nodes_reset(void)
> -{
> -    phys_map_nodes_nb = 0;
> -}
> -
> -
> -static void phys_page_set_level(PhysPageEntry *lp, target_phys_addr_t *index,
> -                                target_phys_addr_t *nb, uint16_t leaf,
> +static void phys_page_set_level(PhysMap *map, PhysPageEntry *lp,
> +                                target_phys_addr_t *index,
> +                                target_phys_addr_t *nb,
> +                                uint16_t leaf,
>                                  int level)
>  {
>      PhysPageEntry *p;
> @@ -425,8 +417,8 @@ static void phys_page_set_level(PhysPageEntry *lp, target_phys_addr_t *index,
>      target_phys_addr_t step = (target_phys_addr_t)1 << (level * L2_BITS);
>
>      if (!lp->is_leaf && lp->ptr == PHYS_MAP_NODE_NIL) {
> -        lp->ptr = phys_map_node_alloc();
> -        p = phys_map_nodes[lp->ptr];
> +        lp->ptr = phys_map_node_alloc(map);
> +        p = map->phys_map_nodes[lp->ptr];
>          if (level == 0) {
>              for (i = 0; i < L2_SIZE; i++) {
>                  p[i].is_leaf = 1;
> @@ -434,7 +426,7 @@ static void phys_page_set_level(PhysPageEntry *lp, target_phys_addr_t *index,
>              }
>          }
>      } else {
> -        p = phys_map_nodes[lp->ptr];
> +        p = map->phys_map_nodes[lp->ptr];
>      }
>      lp = &p[(*index >> (level * L2_BITS)) & (L2_SIZE - 1)];
>
> @@ -445,24 +437,27 @@ static void phys_page_set_level(PhysPageEntry *lp, target_phys_addr_t *index,
>              *index += step;
>              *nb -= step;
>          } else {
> -            phys_page_set_level(lp, index, nb, leaf, level - 1);
> +            phys_page_set_level(map, lp, index, nb, leaf, level - 1);
>          }
>          ++lp;
>      }
>  }
>
> -static void phys_page_set(target_phys_addr_t index, target_phys_addr_t nb,
> -                          uint16_t leaf)
> +static void phys_page_set(PhysMap *map, target_phys_addr_t index,
> +                            target_phys_addr_t nb,
> +                            uint16_t leaf)
>  {
>      /* Wildly overreserve - it doesn't matter much. */
> -    phys_map_node_reserve(3 * P_L2_LEVELS);
> +    phys_map_node_reserve(map, 3 * P_L2_LEVELS);
>
> -    phys_page_set_level(&phys_map, &index, &nb, leaf, P_L2_LEVELS - 1);
> +    /* update in new tree*/
> +    phys_page_set_level(map, &map->root, &index, &nb, leaf, P_L2_LEVELS - 1);
>  }
>
> -MemoryRegionSection *phys_page_find(target_phys_addr_t index)
> +static MemoryRegionSection *phys_page_find_internal(PhysMap *map,
> +                           target_phys_addr_t index)
>  {
> -    PhysPageEntry lp = phys_map;
> +    PhysPageEntry lp = map->root;
>      PhysPageEntry *p;
>      int i;
>      uint16_t s_index = phys_section_unassigned;
> @@ -471,13 +466,79 @@ MemoryRegionSection *phys_page_find(target_phys_addr_t index)
>          if (lp.ptr == PHYS_MAP_NODE_NIL) {
>              goto not_found;
>          }
> -        p = phys_map_nodes[lp.ptr];
> +        p = map->phys_map_nodes[lp.ptr];
>          lp = p[(index >> (i * L2_BITS)) & (L2_SIZE - 1)];
>      }
>
>      s_index = lp.ptr;
>  not_found:
> -    return &phys_sections[s_index];
> +    return &map->phys_sections[s_index];
> +}
> +
> +MemoryRegionSection *phys_page_find(target_phys_addr_t index)
> +{
> +    return phys_page_find_internal(cur_map, index);
> +}
> +
> +void physmap_get(PhysMap *map)
> +{
> +    atomic_inc(&map->ref);
> +}
> +
> +/* Untill rcu read side finished, do this reclaim */

Until

> +static ChunkHead physmap_reclaimer_list = { .lh_first = NULL };

Please insert a blank line here.

> +void physmap_reclaimer_enqueue(void *opaque, ReleaseHandler *release)
> +{
> +    reclaimer_enqueue(&physmap_reclaimer_list, opaque, release);
> +}
> +
> +static void destroy_all_mappings(PhysMap *map);

Prototypes belong to the top of the file.

> +static void phys_map_release(PhysMap *map)
> +{
> +    /* emulate for rcu reclaimer for mr */
> +    reclaimer_worker(&physmap_reclaimer_list);
> +
> +    destroy_all_mappings(map);
> +    g_free(map->phys_map_nodes);
> +    g_free(map->phys_sections);
> +    g_free(map->views[0].ranges);
> +    g_free(map->views[1].ranges);
> +    g_free(map);
> +}
> +
> +void physmap_put(PhysMap *map)
> +{
> +    if (atomic_dec_and_test(&map->ref)) {
> +        phys_map_release(map);
> +    }
> +}
> +
> +void cur_map_update(PhysMap *next)
> +{
> +    qemu_mutex_lock(&cur_map_lock);
> +    physmap_put(cur_map);
> +    cur_map = next;
> +    smp_mb();
> +    qemu_mutex_unlock(&cur_map_lock);
> +}
> +
> +PhysMap *cur_map_get(void)
> +{
> +    PhysMap *ret;
> +
> +    qemu_mutex_lock(&cur_map_lock);
> +    ret = cur_map;
> +    physmap_get(ret);
> +    smp_mb();
> +    qemu_mutex_unlock(&cur_map_lock);
> +    return ret;
> +}
> +
> +PhysMap *alloc_next_map(void)
> +{
> +    PhysMap *next = g_malloc0(sizeof(PhysMap));
> +    atomic_set(&next->ref, 1);
> +    return next;
>  }
>
>  bool memory_region_is_unassigned(MemoryRegion *mr)
> @@ -632,6 +693,7 @@ void cpu_exec_init_all(void)
>      memory_map_init();
>      io_mem_init();
>      qemu_mutex_init(&mem_map_lock);
> +    qemu_mutex_init(&cur_map_lock);
>  #endif
>  }
>
> @@ -2161,17 +2223,18 @@ int page_unprotect(target_ulong address, uintptr_t pc, void *puc)
>
>  #define SUBPAGE_IDX(addr) ((addr) & ~TARGET_PAGE_MASK)
>  typedef struct subpage_t {
> +    PhysMap *map;
>      MemoryRegion iomem;
>      target_phys_addr_t base;
>      uint16_t sub_section[TARGET_PAGE_SIZE];
>  } subpage_t;
>
> -static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
> -                             uint16_t section);
> -static subpage_t *subpage_init(target_phys_addr_t base);
> -static void destroy_page_desc(uint16_t section_index)
> +static int subpage_register(PhysMap *map, subpage_t *mmio, uint32_t start,
> +                            uint32_t end, uint16_t section);
> +static subpage_t *subpage_init(PhysMap *map, target_phys_addr_t base);
> +static void destroy_page_desc(PhysMap *map, uint16_t section_index)
>  {
> -    MemoryRegionSection *section = &phys_sections[section_index];
> +    MemoryRegionSection *section = &map->phys_sections[section_index];
>      MemoryRegion *mr = section->mr;
>
>      if (mr->subpage) {
> @@ -2181,7 +2244,7 @@ static void destroy_page_desc(uint16_t section_index)
>      }
>  }
>
> -static void destroy_l2_mapping(PhysPageEntry *lp, unsigned level)
> +static void destroy_l2_mapping(PhysMap *map, PhysPageEntry *lp, unsigned level)
>  {
>      unsigned i;
>      PhysPageEntry *p;
> @@ -2190,38 +2253,34 @@ static void destroy_l2_mapping(PhysPageEntry *lp, unsigned level)
>          return;
>      }
>
> -    p = phys_map_nodes[lp->ptr];
> +    p = map->phys_map_nodes[lp->ptr];
>      for (i = 0; i < L2_SIZE; ++i) {
>          if (!p[i].is_leaf) {
> -            destroy_l2_mapping(&p[i], level - 1);
> +            destroy_l2_mapping(map, &p[i], level - 1);
>          } else {
> -            destroy_page_desc(p[i].ptr);
> +            destroy_page_desc(map, p[i].ptr);
>          }
>      }
>      lp->is_leaf = 0;
>      lp->ptr = PHYS_MAP_NODE_NIL;
>  }
>
> -static void destroy_all_mappings(void)
> +static void destroy_all_mappings(PhysMap *map)
>  {
> -    destroy_l2_mapping(&phys_map, P_L2_LEVELS - 1);
> -    phys_map_nodes_reset();
> -}
> +    PhysPageEntry *root = &map->root;
>
> -static uint16_t phys_section_add(MemoryRegionSection *section)
> -{
> -    if (phys_sections_nb == phys_sections_nb_alloc) {
> -        phys_sections_nb_alloc = MAX(phys_sections_nb_alloc * 2, 16);
> -        phys_sections = g_renew(MemoryRegionSection, phys_sections,
> -                                phys_sections_nb_alloc);
> -    }
> -    phys_sections[phys_sections_nb] = *section;
> -    return phys_sections_nb++;
> +    destroy_l2_mapping(map, root, P_L2_LEVELS - 1);
>  }
>
> -static void phys_sections_clear(void)
> +static uint16_t phys_section_add(PhysMap *map, MemoryRegionSection *section)
>  {
> -    phys_sections_nb = 0;
> +    if (map->phys_sections_nb == map->phys_sections_nb_alloc) {
> +        map->phys_sections_nb_alloc = MAX(map->phys_sections_nb_alloc * 2, 16);
> +        map->phys_sections = g_renew(MemoryRegionSection, map->phys_sections,
> +                                map->phys_sections_nb_alloc);
> +    }
> +    map->phys_sections[map->phys_sections_nb] = *section;
> +    return map->phys_sections_nb++;
>  }
>
>  /* register physical memory.
> @@ -2232,12 +2291,13 @@ static void phys_sections_clear(void)
>     start_addr and region_offset are rounded down to a page boundary
>     before calculating this offset.  This should not be a problem unless
>     the low bits of start_addr and region_offset differ.  */
> -static void register_subpage(MemoryRegionSection *section)
> +static void register_subpage(PhysMap *map, MemoryRegionSection *section)
>  {
>      subpage_t *subpage;
>      target_phys_addr_t base = section->offset_within_address_space
>          & TARGET_PAGE_MASK;
> -    MemoryRegionSection *existing = phys_page_find(base >> TARGET_PAGE_BITS);
> +    MemoryRegionSection *existing = phys_page_find_internal(map,
> +                                            base >> TARGET_PAGE_BITS);
>      MemoryRegionSection subsection = {
>          .offset_within_address_space = base,
>          .size = TARGET_PAGE_SIZE,
> @@ -2247,30 +2307,30 @@ static void register_subpage(MemoryRegionSection *section)
>      assert(existing->mr->subpage || existing->mr == &io_mem_unassigned);
>
>      if (!(existing->mr->subpage)) {
> -        subpage = subpage_init(base);
> +        subpage = subpage_init(map, base);
>          subsection.mr = &subpage->iomem;
> -        phys_page_set(base >> TARGET_PAGE_BITS, 1,
> -                      phys_section_add(&subsection));
> +        phys_page_set(map, base >> TARGET_PAGE_BITS, 1,
> +                      phys_section_add(map, &subsection));
>      } else {
>          subpage = container_of(existing->mr, subpage_t, iomem);
>      }
>      start = section->offset_within_address_space & ~TARGET_PAGE_MASK;
>      end = start + section->size;
> -    subpage_register(subpage, start, end, phys_section_add(section));
> +    subpage_register(map, subpage, start, end, phys_section_add(map, section));
>  }
>
>
> -static void register_multipage(MemoryRegionSection *section)
> +static void register_multipage(PhysMap *map, MemoryRegionSection *section)
>  {
>      target_phys_addr_t start_addr = section->offset_within_address_space;
>      ram_addr_t size = section->size;
>      target_phys_addr_t addr;
> -    uint16_t section_index = phys_section_add(section);
> +    uint16_t section_index = phys_section_add(map, section);
>
>      assert(size);
>
>      addr = start_addr;
> -    phys_page_set(addr >> TARGET_PAGE_BITS, size >> TARGET_PAGE_BITS,
> +    phys_page_set(map, addr >> TARGET_PAGE_BITS, size >> TARGET_PAGE_BITS,
>                    section_index);
>  }
>
> @@ -2278,13 +2338,14 @@ void cpu_register_physical_memory_log(MemoryRegionSection *section,
>                                        bool readonly)
>  {
>      MemoryRegionSection now = *section, remain = *section;
> +    PhysMap *map = next_map;
>
>      if ((now.offset_within_address_space & ~TARGET_PAGE_MASK)
>          || (now.size < TARGET_PAGE_SIZE)) {
>          now.size = MIN(TARGET_PAGE_ALIGN(now.offset_within_address_space)
>                         - now.offset_within_address_space,
>                         now.size);
> -        register_subpage(&now);
> +        register_subpage(map, &now);
>          remain.size -= now.size;
>          remain.offset_within_address_space += now.size;
>          remain.offset_within_region += now.size;
> @@ -2292,14 +2353,14 @@ void cpu_register_physical_memory_log(MemoryRegionSection *section,
>      now = remain;
>      now.size &= TARGET_PAGE_MASK;
>      if (now.size) {
> -        register_multipage(&now);
> +        register_multipage(map, &now);
>          remain.size -= now.size;
>          remain.offset_within_address_space += now.size;
>          remain.offset_within_region += now.size;
>      }
>      now = remain;
>      if (now.size) {
> -        register_subpage(&now);
> +        register_subpage(map, &now);
>      }
>  }
>
> @@ -3001,7 +3062,7 @@ static uint64_t subpage_read(void *opaque, target_phys_addr_t addr,
>             mmio, len, addr, idx);
>  #endif
>
> -    section = &phys_sections[mmio->sub_section[idx]];
> +    section = &mmio->map->phys_sections[mmio->sub_section[idx]];
>      addr += mmio->base;
>      addr -= section->offset_within_address_space;
>      addr += section->offset_within_region;
> @@ -3020,7 +3081,7 @@ static void subpage_write(void *opaque, target_phys_addr_t addr,
>             __func__, mmio, len, addr, idx, value);
>  #endif
>
> -    section = &phys_sections[mmio->sub_section[idx]];
> +    section = &mmio->map->phys_sections[mmio->sub_section[idx]];
>      addr += mmio->base;
>      addr -= section->offset_within_address_space;
>      addr += section->offset_within_region;
> @@ -3065,8 +3126,8 @@ static const MemoryRegionOps subpage_ram_ops = {
>      .endianness = DEVICE_NATIVE_ENDIAN,
>  };
>
> -static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
> -                             uint16_t section)
> +static int subpage_register(PhysMap *map, subpage_t *mmio, uint32_t start,
> +                              uint32_t end, uint16_t section)
>  {
>      int idx, eidx;
>
> @@ -3078,10 +3139,10 @@ static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
>      printf("%s: %p start %08x end %08x idx %08x eidx %08x mem %ld\n", __func__,
>             mmio, start, end, idx, eidx, memory);
>  #endif
> -    if (memory_region_is_ram(phys_sections[section].mr)) {
> -        MemoryRegionSection new_section = phys_sections[section];
> +    if (memory_region_is_ram(map->phys_sections[section].mr)) {
> +        MemoryRegionSection new_section = map->phys_sections[section];
>          new_section.mr = &io_mem_subpage_ram;
> -        section = phys_section_add(&new_section);
> +        section = phys_section_add(map, &new_section);
>      }
>      for (; idx <= eidx; idx++) {
>          mmio->sub_section[idx] = section;
> @@ -3090,12 +3151,13 @@ static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
>      return 0;
>  }
>
> -static subpage_t *subpage_init(target_phys_addr_t base)
> +static subpage_t *subpage_init(PhysMap *map, target_phys_addr_t base)
>  {
>      subpage_t *mmio;
>
>      mmio = g_malloc0(sizeof(subpage_t));
>
> +    mmio->map = map;
>      mmio->base = base;
>      memory_region_init_io(&mmio->iomem, &subpage_ops, mmio,
>                            "subpage", TARGET_PAGE_SIZE);
> @@ -3104,12 +3166,12 @@ static subpage_t *subpage_init(target_phys_addr_t base)
>      printf("%s: %p base " TARGET_FMT_plx " len %08x %d\n", __func__,
>             mmio, base, TARGET_PAGE_SIZE, subpage_memory);
>  #endif
> -    subpage_register(mmio, 0, TARGET_PAGE_SIZE-1, phys_section_unassigned);
> +    subpage_register(map, mmio, 0, TARGET_PAGE_SIZE-1, phys_section_unassigned);
>
>      return mmio;
>  }
>
> -static uint16_t dummy_section(MemoryRegion *mr)
> +static uint16_t dummy_section(PhysMap *map, MemoryRegion *mr)
>  {
>      MemoryRegionSection section = {
>          .mr = mr,
> @@ -3118,7 +3180,7 @@ static uint16_t dummy_section(MemoryRegion *mr)
>          .size = UINT64_MAX,
>      };
>
> -    return phys_section_add(&section);
> +    return phys_section_add(map, &section);
>  }
>
>  MemoryRegion *iotlb_to_region(target_phys_addr_t index)
> @@ -3140,15 +3202,32 @@ static void io_mem_init(void)
>                            "watch", UINT64_MAX);
>  }
>
> -static void core_begin(MemoryListener *listener)
> +#if 0
> +static void physmap_init(void)
> +{
> +    FlatView v = { .ranges = NULL,
> +                             .nr = 0,
> +                             .nr_allocated = 0,
> +    };
> +
> +    init_map.views[0] = v;
> +    init_map.views[1] = v;
> +    cur_map =  &init_map;
> +}
> +#endif

Please delete.

> +
> +static void core_begin(MemoryListener *listener, PhysMap *new_map)
>  {
> -    destroy_all_mappings();
> -    phys_sections_clear();
> -    phys_map.ptr = PHYS_MAP_NODE_NIL;
> -    phys_section_unassigned = dummy_section(&io_mem_unassigned);
> -    phys_section_notdirty = dummy_section(&io_mem_notdirty);
> -    phys_section_rom = dummy_section(&io_mem_rom);
> -    phys_section_watch = dummy_section(&io_mem_watch);
> +
> +    new_map->root.ptr = PHYS_MAP_NODE_NIL;
> +    new_map->root.is_leaf = 0;
> +
> +    /* In all the map, these sections have the same index */
> +    phys_section_unassigned = dummy_section(new_map, &io_mem_unassigned);
> +    phys_section_notdirty = dummy_section(new_map, &io_mem_notdirty);
> +    phys_section_rom = dummy_section(new_map, &io_mem_rom);
> +    phys_section_watch = dummy_section(new_map, &io_mem_watch);
> +    next_map = new_map;
>  }
>
>  static void core_commit(MemoryListener *listener)
> @@ -3161,6 +3240,16 @@ static void core_commit(MemoryListener *listener)
>      for(env = first_cpu; env != NULL; env = env->next_cpu) {
>          tlb_flush(env, 1);
>      }
> +
> +/* move into high layer
> +    qemu_mutex_lock(&cur_map_lock);
> +    if (cur_map != NULL) {
> +        physmap_put(cur_map);
> +    }
> +    cur_map = next_map;
> +    smp_mb();
> +    qemu_mutex_unlock(&cur_map_lock);
> +*/

Also, commented-out code should be deleted.

>  }
>
>  static void core_region_add(MemoryListener *listener,
> @@ -3217,7 +3306,7 @@ static void core_eventfd_del(MemoryListener *listener,
>  {
>  }
>
> -static void io_begin(MemoryListener *listener)
> +static void io_begin(MemoryListener *listener, PhysMap *next)
>  {
>  }
>
> @@ -3329,6 +3418,20 @@ static void memory_map_init(void)
>      memory_listener_register(&io_memory_listener, system_io);
>  }
>
> +void physmap_init(void)
> +{
> +    FlatView v = { .ranges = NULL, .nr = 0, .nr_allocated = 0,
> +                           };
> +    PhysMap *init_map = g_malloc0(sizeof(PhysMap));
> +
> +    atomic_set(&init_map->ref, 1);
> +    init_map->root.ptr = PHYS_MAP_NODE_NIL;
> +    init_map->root.is_leaf = 0;
> +    init_map->views[0] = v;
> +    init_map->views[1] = v;
> +    cur_map = init_map;
> +}
> +
>  MemoryRegion *get_system_memory(void)
>  {
>      return system_memory;
> @@ -3391,6 +3494,7 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>      uint32_t val;
>      target_phys_addr_t page;
>      MemoryRegionSection *section;
> +    PhysMap *cur = cur_map_get();
>
>      while (len > 0) {
>          page = addr & TARGET_PAGE_MASK;
> @@ -3472,6 +3576,7 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>          buf += l;
>          addr += l;
>      }
> +    physmap_put(cur);
>  }
>
>  /* used for ROM loading : can write in RAM and ROM */
> diff --git a/hw/vhost.c b/hw/vhost.c
> index 43664e7..df58345 100644
> --- a/hw/vhost.c
> +++ b/hw/vhost.c
> @@ -438,7 +438,7 @@ static bool vhost_section(MemoryRegionSection *section)
>          && memory_region_is_ram(section->mr);
>  }
>
> -static void vhost_begin(MemoryListener *listener)
> +static void vhost_begin(MemoryListener *listener, PhysMap *next)
>  {
>  }
>
> diff --git a/hw/xen_pt.c b/hw/xen_pt.c
> index 3b6d186..fba8586 100644
> --- a/hw/xen_pt.c
> +++ b/hw/xen_pt.c
> @@ -597,7 +597,7 @@ static void xen_pt_region_update(XenPCIPassthroughState *s,
>      }
>  }
>
> -static void xen_pt_begin(MemoryListener *l)
> +static void xen_pt_begin(MemoryListener *l, PhysMap *next)
>  {
>  }
>
> diff --git a/kvm-all.c b/kvm-all.c
> index f8e4328..bc42cab 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -693,7 +693,7 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, bool add)
>      }
>  }
>
> -static void kvm_begin(MemoryListener *listener)
> +static void kvm_begin(MemoryListener *listener, PhysMap *next)
>  {
>  }
>
> diff --git a/memory.c b/memory.c
> index c7f2cfd..54cdc7f 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -20,6 +20,7 @@
>  #include "kvm.h"
>  #include <assert.h>
>  #include "hw/qdev.h"
> +#include "qemu-thread.h"
>
>  #define WANT_EXEC_OBSOLETE
>  #include "exec-obsolete.h"
> @@ -192,7 +193,7 @@ typedef struct AddressSpaceOps AddressSpaceOps;
>  /* A system address space - I/O, memory, etc. */
>  struct AddressSpace {
>      MemoryRegion *root;
> -    FlatView current_map;
> +    int view_id;
>      int ioeventfd_nb;
>      MemoryRegionIoeventfd *ioeventfds;
>  };
> @@ -232,11 +233,6 @@ static void flatview_insert(FlatView *view, unsigned pos, FlatRange *range)
>      ++view->nr;
>  }
>
> -static void flatview_destroy(FlatView *view)
> -{
> -    g_free(view->ranges);
> -}
> -
>  static bool can_merge(FlatRange *r1, FlatRange *r2)
>  {
>      return int128_eq(addrrange_end(r1->addr), r2->addr.start)
> @@ -594,8 +590,10 @@ static void address_space_update_ioeventfds(AddressSpace *as)
>      MemoryRegionIoeventfd *ioeventfds = NULL;
>      AddrRange tmp;
>      unsigned i;
> +    PhysMap *map = cur_map_get();
> +    FlatView *view = &map->views[as->view_id];
>
> -    FOR_EACH_FLAT_RANGE(fr, &as->current_map) {
> +    FOR_EACH_FLAT_RANGE(fr, view) {
>          for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
>              tmp = addrrange_shift(fr->mr->ioeventfds[i].addr,
>                                    int128_sub(fr->addr.start,
> @@ -616,6 +614,7 @@ static void address_space_update_ioeventfds(AddressSpace *as)
>      g_free(as->ioeventfds);
>      as->ioeventfds = ioeventfds;
>      as->ioeventfd_nb = ioeventfd_nb;
> +    physmap_put(map);
>  }
>
>  static void address_space_update_topology_pass(AddressSpace *as,
> @@ -681,21 +680,23 @@ static void address_space_update_topology_pass(AddressSpace *as,
>  }
>
>
> -static void address_space_update_topology(AddressSpace *as)
> +static void address_space_update_topology(AddressSpace *as, PhysMap *prev,
> +                                            PhysMap *next)
>  {
> -    FlatView old_view = as->current_map;
> +    FlatView old_view = prev->views[as->view_id];
>      FlatView new_view = generate_memory_topology(as->root);
>
>      address_space_update_topology_pass(as, old_view, new_view, false);
>      address_space_update_topology_pass(as, old_view, new_view, true);
> +    next->views[as->view_id] = new_view;
>
> -    as->current_map = new_view;
> -    flatview_destroy(&old_view);
>      address_space_update_ioeventfds(as);
>  }
>
>  static void memory_region_update_topology(MemoryRegion *mr)
>  {
> +    PhysMap *prev, *next;
> +
>      if (memory_region_transaction_depth) {
>          memory_region_update_pending |= !mr || mr->enabled;
>          return;
> @@ -705,16 +706,20 @@ static void memory_region_update_topology(MemoryRegion *mr)
>          return;
>      }
>
> -    MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
> +     prev = cur_map_get();
> +    /* allocate PhysMap next here */
> +    next = alloc_next_map();
> +    MEMORY_LISTENER_CALL_GLOBAL(begin, Forward, next);
>
>      if (address_space_memory.root) {
> -        address_space_update_topology(&address_space_memory);
> +        address_space_update_topology(&address_space_memory, prev, next);
>      }
>      if (address_space_io.root) {
> -        address_space_update_topology(&address_space_io);
> +        address_space_update_topology(&address_space_io, prev, next);
>      }
>
>      MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
> +    cur_map_update(next);
>
>      memory_region_update_pending = false;
>  }
> @@ -1071,7 +1076,7 @@ void memory_region_put(MemoryRegion *mr)
>
>      if (atomic_dec_and_test(&mr->ref)) {
>          /* to fix, using call_rcu( ,release) */
> -        mr->life_ops->put(mr);
> +        physmap_reclaimer_enqueue(mr, (ReleaseHandler *)mr->life_ops->put);
>      }
>  }
>
> @@ -1147,13 +1152,18 @@ void memory_region_set_dirty(MemoryRegion *mr, target_phys_addr_t addr,
>  void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
>  {
>      FlatRange *fr;
> +    FlatView *fview;
> +    PhysMap *map;
>
> -    FOR_EACH_FLAT_RANGE(fr, &address_space_memory.current_map) {
> +    map = cur_map_get();
> +    fview = &map->views[address_space_memory.view_id];
> +    FOR_EACH_FLAT_RANGE(fr, fview) {
>          if (fr->mr == mr) {
>              MEMORY_LISTENER_UPDATE_REGION(fr, &address_space_memory,
>                                            Forward, log_sync);
>          }
>      }
> +    physmap_put(map);
>  }
>
>  void memory_region_set_readonly(MemoryRegion *mr, bool readonly)
> @@ -1201,8 +1211,12 @@ static void memory_region_update_coalesced_range(MemoryRegion *mr)
>      FlatRange *fr;
>      CoalescedMemoryRange *cmr;
>      AddrRange tmp;
> +    FlatView *fview;
> +    PhysMap *map;
>
> -    FOR_EACH_FLAT_RANGE(fr, &address_space_memory.current_map) {
> +    map = cur_map_get();
> +    fview = &map->views[address_space_memory.view_id];
> +    FOR_EACH_FLAT_RANGE(fr, fview) {
>          if (fr->mr == mr) {
>              qemu_unregister_coalesced_mmio(int128_get64(fr->addr.start),
>                                             int128_get64(fr->addr.size));
> @@ -1219,6 +1233,7 @@ static void memory_region_update_coalesced_range(MemoryRegion *mr)
>              }
>          }
>      }
> +    physmap_put(map);
>  }
>
>  void memory_region_set_coalescing(MemoryRegion *mr)
> @@ -1458,29 +1473,49 @@ static int cmp_flatrange_addr(const void *addr_, const void *fr_)
>      return 0;
>  }
>
> -static FlatRange *address_space_lookup(AddressSpace *as, AddrRange addr)
> +static FlatRange *address_space_lookup(FlatView *view, AddrRange addr)
>  {
> -    return bsearch(&addr, as->current_map.ranges, as->current_map.nr,
> +    return bsearch(&addr, view->ranges, view->nr,
>                     sizeof(FlatRange), cmp_flatrange_addr);
>  }
>
> +/* dec the ref, which inc by memory_region_find*/
> +void memory_region_section_put(MemoryRegionSection *mrs)
> +{
> +    if (mrs->mr != NULL) {
> +        memory_region_put(mrs->mr);
> +    }
> +}
> +
> +/* inc mr's ref. Caller need dec mr's ref */
>  MemoryRegionSection memory_region_find(MemoryRegion *address_space,
>                                         target_phys_addr_t addr, uint64_t size)
>  {
> +    PhysMap *map;
>      AddressSpace *as = memory_region_to_address_space(address_space);
>      AddrRange range = addrrange_make(int128_make64(addr),
>                                       int128_make64(size));
> -    FlatRange *fr = address_space_lookup(as, range);
> +    FlatView *fview;
> +
> +    map = cur_map_get();
> +
> +    fview = &map->views[as->view_id];
> +    FlatRange *fr = address_space_lookup(fview, range);
>      MemoryRegionSection ret = { .mr = NULL, .size = 0 };
>
>      if (!fr) {
> +        physmap_put(map);
>          return ret;
>      }
>
> -    while (fr > as->current_map.ranges
> +    while (fr > fview->ranges
>             && addrrange_intersects(fr[-1].addr, range)) {
>          --fr;
>      }
> +    /* To fix, the caller must in rcu, or we must inc fr->mr->ref here
> +     */
> +    memory_region_get(fr->mr);
> +    physmap_put(map);
>
>      ret.mr = fr->mr;
>      range = addrrange_intersection(range, fr->addr);
> @@ -1497,10 +1532,13 @@ void memory_global_sync_dirty_bitmap(MemoryRegion *address_space)
>  {
>      AddressSpace *as = memory_region_to_address_space(address_space);
>      FlatRange *fr;
> +    PhysMap *map = cur_map_get();
> +    FlatView *view = &map->views[as->view_id];
>
> -    FOR_EACH_FLAT_RANGE(fr, &as->current_map) {
> +    FOR_EACH_FLAT_RANGE(fr, view) {
>          MEMORY_LISTENER_UPDATE_REGION(fr, as, Forward, log_sync);
>      }
> +    physmap_put(map);
>  }
>
>  void memory_global_dirty_log_start(void)
> @@ -1519,6 +1557,8 @@ static void listener_add_address_space(MemoryListener *listener,
>                                         AddressSpace *as)
>  {
>      FlatRange *fr;
> +    PhysMap *map;
> +    FlatView *view;
>
>      if (listener->address_space_filter
>          && listener->address_space_filter != as->root) {
> @@ -1528,7 +1568,10 @@ static void listener_add_address_space(MemoryListener *listener,
>      if (global_dirty_log) {
>          listener->log_global_start(listener);
>      }
> -    FOR_EACH_FLAT_RANGE(fr, &as->current_map) {
> +
> +    map = cur_map_get();
> +    view = &map->views[as->view_id];
> +    FOR_EACH_FLAT_RANGE(fr, view) {
>          MemoryRegionSection section = {
>              .mr = fr->mr,
>              .address_space = as->root,
> @@ -1539,6 +1582,7 @@ static void listener_add_address_space(MemoryListener *listener,
>          };
>          listener->region_add(listener, &section);
>      }
> +    physmap_put(map);
>  }
>
>  void memory_listener_register(MemoryListener *listener, MemoryRegion *filter)
> @@ -1570,12 +1614,14 @@ void memory_listener_unregister(MemoryListener *listener)
>  void set_system_memory_map(MemoryRegion *mr)
>  {
>      address_space_memory.root = mr;
> +    address_space_memory.view_id = 0;
>      memory_region_update_topology(NULL);
>  }
>
>  void set_system_io_map(MemoryRegion *mr)
>  {
>      address_space_io.root = mr;
> +    address_space_io.view_id = 1;
>      memory_region_update_topology(NULL);
>  }
>
> diff --git a/memory.h b/memory.h
> index 357edd8..18442d4 100644
> --- a/memory.h
> +++ b/memory.h
> @@ -256,7 +256,7 @@ typedef struct MemoryListener MemoryListener;
>   * Use with memory_listener_register() and memory_listener_unregister().
>   */
>  struct MemoryListener {
> -    void (*begin)(MemoryListener *listener);
> +    void (*begin)(MemoryListener *listener, PhysMap *next);
>      void (*commit)(MemoryListener *listener);
>      void (*region_add)(MemoryListener *listener, MemoryRegionSection *section);
>      void (*region_del)(MemoryListener *listener, MemoryRegionSection *section);
> @@ -829,6 +829,13 @@ void mtree_info(fprintf_function mon_printf, void *f);
>
>  void memory_region_get(MemoryRegion *mr);
>  void memory_region_put(MemoryRegion *mr);
> +void physmap_reclaimer_enqueue(void *opaque, ReleaseHandler *release);
> +void physmap_get(PhysMap *map);
> +void physmap_put(PhysMap *map);
> +PhysMap *cur_map_get(void);
> +PhysMap *alloc_next_map(void);
> +void cur_map_update(PhysMap *next);
> +void physmap_init(void);
>  #endif
>
>  #endif
> diff --git a/vl.c b/vl.c
> index 1329c30..12af523 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -3346,6 +3346,7 @@ int main(int argc, char **argv, char **envp)
>      if (ram_size == 0) {
>          ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
>      }
> +    physmap_init();
>
>      configure_accelerator();
>
> diff --git a/xen-all.c b/xen-all.c
> index 59f2323..41d82fd 100644
> --- a/xen-all.c
> +++ b/xen-all.c
> @@ -452,7 +452,7 @@ static void xen_set_memory(struct MemoryListener *listener,
>      }
>  }
>
> -static void xen_begin(MemoryListener *listener)
> +static void xen_begin(MemoryListener *listener, PhysMap *next)
>  {
>  }
>
> --
> 1.7.4.4
>

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 06/15] memory: use refcnt to manage MemoryRegion
  2012-08-08  9:20     ` [Qemu-devel] " Avi Kivity
@ 2012-08-09  7:27       ` liu ping fan
  -1 siblings, 0 replies; 154+ messages in thread
From: liu ping fan @ 2012-08-09  7:27 UTC (permalink / raw)
  To: Avi Kivity
  Cc: qemu-devel, kvm, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber

On Wed, Aug 8, 2012 at 5:20 PM, Avi Kivity <avi@redhat.com> wrote:
> On 08/08/2012 09:25 AM, Liu Ping Fan wrote:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> Using refcnt for mr, so we can separate mr's life cycle management
>> from refered object.
>>   When mr->ref 0->1, inc the refered object.
>>   When mr->ref 1->0, dec the refered object.
>>
>> The refered object can be DeviceStae, another mr, or other opaque.
>
> Please explain the motivation more fully.
>
Actually, the aim is to manage the reference count of an object that is
used by the memory view.
A DeviceState can be referenced by several subsystems; when it enters a
subsystem's view, we hold the device's ref, and any further indirect
reference only does mr->ref++, not the device's.
This helps us avoid walking down the referral chain, like
alias ---> mr ---> DeviceState.

In the previous discussion, you suggested adding dev->ref++ in
core_region_add.  But if we move it to a higher layer --
memory_region_{add,del}_subregion -- we avoid duplicating it in every
xx_region_add.  The price is that we must handle aliases, which could
have been avoided in core_region_add(); the mr's own refcount is what
avoids the down-walk.  A rough sketch is below.
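
(A sketch of the intended transitions -- illustrative only, not the
exact patch; life_ops and the atomic helpers follow the naming used in
this series, but the signatures here are assumed:)

static inline void memory_region_get(MemoryRegion *mr)
{
    /* only the 0->1 transition pins the referred object, so an alias
     * can take mr->ref without walking down to the DeviceState */
    if (atomic_add_and_return(&mr->ref, 1) == 1) {
        mr->life_ops->get(mr);
    }
}

static inline void memory_region_put(MemoryRegion *mr)
{
    /* only the 1->0 transition releases the referred object */
    if (atomic_dec_and_test(&mr->ref)) {
        mr->life_ops->put(mr);
    }
}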

Regards,
pingfan
> Usually a MemoryRegion will be embedded within some DeviceState, or its
> lifecycle will be managed by the DeviceState.  So long as we keep the
> DeviceState alive all associated MemoryRegions should be alive as well.
>  Why not do this directly?
>
>
> --
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 11/15] lock: introduce global lock for device tree
  2012-08-08  9:42     ` [Qemu-devel] " Avi Kivity
@ 2012-08-09  7:27       ` liu ping fan
  -1 siblings, 0 replies; 154+ messages in thread
From: liu ping fan @ 2012-08-09  7:27 UTC (permalink / raw)
  To: Avi Kivity
  Cc: qemu-devel, kvm, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber

On Wed, Aug 8, 2012 at 5:42 PM, Avi Kivity <avi@redhat.com> wrote:
> On 08/08/2012 09:25 AM, Liu Ping Fan wrote:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>
> Please explain the motivation.  AFAICT, the big qemu lock is sufficient.
>
Oh, this is one of a series of locks intended for the removal of the
big qemu lock.  Breaking the big lock up will take several steps,
including introducing per-device private locks.  Once the device add
path (in the iothread) and the remove path (in I/O dispatch) both run
outside the big qemu lock, we need this extra lock; a sketch of the
intended usage follows.
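
(Hypothetical illustration of the two paths once the big lock is gone
-- hot_add/hot_remove are made-up names, and qdev_set_parent_bus stands
in for whatever the add path really calls:)

static void hot_add(DeviceState *dev, BusState *bus)
{
    qemu_lock_devtree();
    qdev_set_parent_bus(dev, bus);   /* attach to the device tree */
    qemu_unlock_devtree();
}

static void hot_remove(DeviceState *dev)
{
    qdev_unmap(dev);                 /* isolate from the memory view */
    qemu_lock_devtree();
    qdev_unset_parent(dev);          /* isolate from the device tree */
    qemu_unlock_devtree();
    object_unref(OBJECT(dev));
}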

This series is too big, so I am sending out the first phase for review.

Regards,
pingfan
>
> --
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 11/15] lock: introduce global lock for device tree
  2012-08-08  9:41     ` [Qemu-devel] " Paolo Bonzini
@ 2012-08-09  7:28       ` liu ping fan
  -1 siblings, 0 replies; 154+ messages in thread
From: liu ping fan @ 2012-08-09  7:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, kvm, Anthony Liguori, Avi Kivity, Jan Kiszka,
	Marcelo Tosatti, Stefan Hajnoczi, Blue Swirl,
	Andreas Färber

On Wed, Aug 8, 2012 at 5:41 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 08/08/2012 08:25, Liu Ping Fan ha scritto:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  cpus.c      |   12 ++++++++++++
>>  main-loop.h |    3 +++
>>  2 files changed, 15 insertions(+), 0 deletions(-)
>>
>> diff --git a/cpus.c b/cpus.c
>> index b182b3d..a734b36 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -611,6 +611,7 @@ static void qemu_tcg_init_cpu_signals(void)
>>  }
>>  #endif /* _WIN32 */
>>
>> +QemuMutex qemu_device_tree_mutex;
>>  QemuMutex qemu_global_mutex;
>>  static QemuCond qemu_io_proceeded_cond;
>>  static bool iothread_requesting_mutex;
>> @@ -634,6 +635,7 @@ void qemu_init_cpu_loop(void)
>>      qemu_cond_init(&qemu_work_cond);
>>      qemu_cond_init(&qemu_io_proceeded_cond);
>>      qemu_mutex_init(&qemu_global_mutex);
>> +    qemu_mutex_init(&qemu_device_tree_mutex);
>>
>>      qemu_thread_get_self(&io_thread);
>>  }
>> @@ -911,6 +913,16 @@ void qemu_mutex_unlock_iothread(void)
>>      qemu_mutex_unlock(&qemu_global_mutex);
>>  }
>>
>> +void qemu_lock_devtree(void)
>> +{
>> +    qemu_mutex_lock(&qemu_device_tree_mutex);
>> +}
>> +
>> +void qemu_unlock_devtree(void)
>> +{
>> +    qemu_mutex_unlock(&qemu_device_tree_mutex);
>> +}
>
> We don't need the wrappers.  They are there for the big lock just
> because TCG needs extra work for iothread_requesting_mutex.
>
Sorry, could you give more detail about TCG -- what is the extra work?

Thanks,
pingfan

> Paolo
>
>>  static int all_vcpus_paused(void)
>>  {
>>      CPUArchState *penv = first_cpu;
>> diff --git a/main-loop.h b/main-loop.h
>> index dce1cd9..17e959a 100644
>> --- a/main-loop.h
>> +++ b/main-loop.h
>> @@ -353,6 +353,9 @@ void qemu_mutex_lock_iothread(void);
>>   */
>>  void qemu_mutex_unlock_iothread(void);
>>
>> +void qemu_lock_devtree(void);
>> +void qemu_unlock_devtree(void);
>> +
>>  /* internal interfaces */
>>
>>  void qemu_fd_register(int fd);
>>
>
>

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 15/15] e1000: using new interface--unmap to unplug
  2012-08-08  9:56     ` [Qemu-devel] " Paolo Bonzini
@ 2012-08-09  7:28       ` liu ping fan
  -1 siblings, 0 replies; 154+ messages in thread
From: liu ping fan @ 2012-08-09  7:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, kvm, Anthony Liguori, Avi Kivity, Jan Kiszka,
	Marcelo Tosatti, Stefan Hajnoczi, Blue Swirl,
	Andreas Färber

On Wed, Aug 8, 2012 at 5:56 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 08/08/2012 08:25, Liu Ping Fan ha scritto:
>>
>> +static void
>> +pci_e1000_unmap(PCIDevice *p)
>> +{
>> +    /* DO NOT FREE anything!until refcnt=0 */
>> +    /* isolate from memory view */
>> +}
>
> At least you need to call the superclass method.
>
Referring to 0013-hotplug-introduce-qdev_unplug_complete-to-remove-dev.patch,
we have the following sequence:
qdev_unmap -> pci_unmap_device -> pci_e1000_unmap.  So pci_e1000_unmap
does not need to do anything itself; the chain is sketched below.
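
(A sketch of that call chain as I read patch 13 -- the bodies here are
illustrative, not verbatim:)

void qdev_unplug_complete(DeviceState *dev, Error **errp)
{
    /* isolate from the memory view first ... */
    qdev_unmap(dev);    /* -> pci_unmap_device() for PCI devices */
    /* ... device-tree detach and unref follow, as in patch 13 */
}

static void pci_unmap_device(PCIDevice *d)
{
    PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(d);

    /* the PCI core removes the BAR subregions from the memory view
     * here, then invokes the device hook, which has nothing left to
     * do for e1000 */
    if (pc->unmap) {
        pc->unmap(d);   /* pci_e1000_unmap() */
    }
}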

Regards,
pingfan

> Paolo
>
>>  static int
>>  pci_e1000_uninit(PCIDevice *dev)
>>  {
>> @@ -1275,6 +1282,7 @@ static void e1000_class_init(ObjectClass *klass, void *data)
>>      PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>>
>>      k->init = pci_e1000_init;
>> +    k->unmap = pci_e1000_unmap;
>>      k->exit = pci_e1000_uninit;
>>      k->romfile = "pxe-e1000.rom";
>>      k->vendor_id = PCI_VENDOR_ID_INTEL;
>
>

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 13/15] hotplug: introduce qdev_unplug_complete() to remove device from views
  2012-08-08  9:52     ` [Qemu-devel] " Paolo Bonzini
@ 2012-08-09  7:28       ` liu ping fan
  -1 siblings, 0 replies; 154+ messages in thread
From: liu ping fan @ 2012-08-09  7:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, kvm, Anthony Liguori, Avi Kivity, Jan Kiszka,
	Marcelo Tosatti, Stefan Hajnoczi, Blue Swirl,
	Andreas Färber

On Wed, Aug 8, 2012 at 5:52 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 08/08/2012 08:25, Liu Ping Fan ha scritto:
>> +void qdev_unplug_complete(DeviceState *dev, Error **errp)
>> +{
>> +    /* isolate from mem view */
>> +    qdev_unmap(dev);
>> +    qemu_lock_devtree();
>> +    /* isolate from device tree */
>> +    qdev_unset_parent(dev);
>> +    qemu_unlock_devtree();
>> +    object_unref(OBJECT(dev));
>
> Rather than deferring the free, you should defer the unref.  Otherwise
> the following can happen when you have "real" RCU access to the memory
> map on the read-side:
>
>     VCPU thread                    I/O thread
> =====================================================================
>     get MMIO request
>     rcu_read_lock()
>     walk memory map
>                                    qdev_unmap()
>                                    lock_devtree()
>                                    ...
>                                    unlock_devtree
>                                    unref dev -> refcnt=0, free enqueued
>     ref()

There is no ref() on the device here; in my patches the reference is
taken on the flatview+radix tree instead.  I use RCU to protect the
radix tree, the flat view and the mr they refer to.  As for the device,
its refcount is incremented when it is added to the memory view -- that
is, memory_region_add_subregion -> memory_region_get() {
if (atomic_add_and_return(...)) dev->ref++; }.
So the device's ref is held by the memory view until the view itself is
reclaimed.  In short, RCU protects the memory view, while the device is
protected by its refcount; the read side would look roughly like the
sketch below.
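
(Hypothetical read-side sketch under that scheme -- the names follow
this series, but io_mem_read and the exact flow are assumptions:)

static uint32_t mmio_read_sketch(target_phys_addr_t addr)
{
    PhysMap *map = cur_map_get();    /* pins the map: map->ref++ */
    MemoryRegionSection *section =
        phys_page_find_internal(map, addr >> TARGET_PAGE_BITS);
    uint32_t val;

    /* section->mr stays alive: the map holds the device's ref, taken
     * when the region was added to the view */
    val = io_mem_read(section->mr, addr, 4);
    physmap_put(map);                /* map->ref--; at zero the
                                      * reclaimer drops the dev's ref */
    return val;
}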

>     rcu_read_unlock()
>                                    free()
>     <dangling pointer!>
>
> If you defer the unref, you have instead
>
>     VCPU thread                    I/O thread
> =====================================================================
>     get MMIO request
>     rcu_read_lock()
>     walk memory map
>                                    qdev_unmap()
>                                    lock_devtree()
>                                    ...
>                                    unlock_devtree
>                                    unref is enqueued
>     ref() -> refcnt = 2
>     rcu_read_unlock()
>                                    unref() -> refcnt=1
>     unref() -> refcnt = 1
>
> So this also makes patch 14 unnecessary.
>
> Paolo
>
>> +}
>
>

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 04/15] memory: MemoryRegion topology must be stable when updating
  2012-08-08 19:17     ` [Qemu-devel] " Blue Swirl
@ 2012-08-09  7:28       ` liu ping fan
  -1 siblings, 0 replies; 154+ messages in thread
From: liu ping fan @ 2012-08-09  7:28 UTC (permalink / raw)
  To: Blue Swirl
  Cc: qemu-devel, kvm, Anthony Liguori, Avi Kivity, Jan Kiszka,
	Marcelo Tosatti, Stefan Hajnoczi, Paolo Bonzini,
	Andreas Färber

On Thu, Aug 9, 2012 at 3:17 AM, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Wed, Aug 8, 2012 at 6:25 AM, Liu Ping Fan <qemulist@gmail.com> wrote:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> Using mem_map_lock to protect among updaters. So we can get the intact
>> snapshot of mem topology -- FlatView & radix-tree.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  exec.c   |    3 +++
>>  memory.c |   22 ++++++++++++++++++++++
>>  memory.h |    2 ++
>>  3 files changed, 27 insertions(+), 0 deletions(-)
>>
>> diff --git a/exec.c b/exec.c
>> index 8244d54..0e29ef9 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -210,6 +210,8 @@ static unsigned phys_map_nodes_nb, phys_map_nodes_nb_alloc;
>>     The bottom level has pointers to MemoryRegionSections.  */
>>  static PhysPageEntry phys_map = { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
>>
>> +QemuMutex mem_map_lock;
>> +
>>  static void io_mem_init(void);
>>  static void memory_map_init(void);
>>
>> @@ -637,6 +639,7 @@ void cpu_exec_init_all(void)
>>  #if !defined(CONFIG_USER_ONLY)
>>      memory_map_init();
>>      io_mem_init();
>> +    qemu_mutex_init(&mem_map_lock);
>
> I'd move this and the mutex to memory.c since there are no other uses.
> The mutex could be static then.
>
But the init entry point, cpu_exec_init_all(), is in exec.c, not
memory.c.  (Keeping the mutex static in memory.c would need a small
init hook; see the sketch below.)
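
(Hypothetical sketch of Blue Swirl's suggestion -- the hook name
memory_init_locks() is made up for illustration:)

/* memory.c */
static QemuMutex mem_map_lock;    /* now private to memory.c */

void memory_init_locks(void)
{
    qemu_mutex_init(&mem_map_lock);
}

/* exec.c */
void cpu_exec_init_all(void)
{
#if !defined(CONFIG_USER_ONLY)
    memory_map_init();
    io_mem_init();
    memory_init_locks();    /* replaces the direct qemu_mutex_init() */
#endif
}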

Regards,
pingfan

>>  #endif
>>  }
>>
>> diff --git a/memory.c b/memory.c
>> index aab4a31..5986532 100644
>> --- a/memory.c
>> +++ b/memory.c
>> @@ -761,7 +761,9 @@ void memory_region_transaction_commit(void)
>>      assert(memory_region_transaction_depth);
>>      --memory_region_transaction_depth;
>>      if (!memory_region_transaction_depth && memory_region_update_pending) {
>> +        qemu_mutex_lock(&mem_map_lock);
>>          memory_region_update_topology(NULL);
>> +        qemu_mutex_unlock(&mem_map_lock);
>>      }
>>  }
>>
>> @@ -1069,8 +1071,10 @@ void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
>>  {
>>      uint8_t mask = 1 << client;
>>
>> +    qemu_mutex_lock(&mem_map_lock);
>>      mr->dirty_log_mask = (mr->dirty_log_mask & ~mask) | (log * mask);
>>      memory_region_update_topology(mr);
>> +    qemu_mutex_unlock(&mem_map_lock);
>>  }
>>
>>  bool memory_region_get_dirty(MemoryRegion *mr, target_phys_addr_t addr,
>> @@ -1103,8 +1107,10 @@ void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
>>  void memory_region_set_readonly(MemoryRegion *mr, bool readonly)
>>  {
>>      if (mr->readonly != readonly) {
>> +        qemu_mutex_lock(&mem_map_lock);
>>          mr->readonly = readonly;
>>          memory_region_update_topology(mr);
>> +        qemu_mutex_unlock(&mem_map_lock);
>>      }
>>  }
>>
>> @@ -1112,7 +1118,9 @@ void memory_region_rom_device_set_readable(MemoryRegion *mr, bool readable)
>>  {
>>      if (mr->readable != readable) {
>>          mr->readable = readable;
>> +        qemu_mutex_lock(&mem_map_lock);
>>          memory_region_update_topology(mr);
>> +        qemu_mutex_unlock(&mem_map_lock);
>>      }
>>  }
>>
>> @@ -1206,6 +1214,7 @@ void memory_region_add_eventfd(MemoryRegion *mr,
>>      };
>>      unsigned i;
>>
>> +    qemu_mutex_lock(&mem_map_lock);
>>      for (i = 0; i < mr->ioeventfd_nb; ++i) {
>>          if (memory_region_ioeventfd_before(mrfd, mr->ioeventfds[i])) {
>>              break;
>> @@ -1218,6 +1227,7 @@ void memory_region_add_eventfd(MemoryRegion *mr,
>>              sizeof(*mr->ioeventfds) * (mr->ioeventfd_nb-1 - i));
>>      mr->ioeventfds[i] = mrfd;
>>      memory_region_update_topology(mr);
>> +    qemu_mutex_unlock(&mem_map_lock);
>>  }
>>
>>  void memory_region_del_eventfd(MemoryRegion *mr,
>> @@ -1236,6 +1246,7 @@ void memory_region_del_eventfd(MemoryRegion *mr,
>>      };
>>      unsigned i;
>>
>> +    qemu_mutex_lock(&mem_map_lock);
>>      for (i = 0; i < mr->ioeventfd_nb; ++i) {
>>          if (memory_region_ioeventfd_equal(mrfd, mr->ioeventfds[i])) {
>>              break;
>> @@ -1248,6 +1259,7 @@ void memory_region_del_eventfd(MemoryRegion *mr,
>>      mr->ioeventfds = g_realloc(mr->ioeventfds,
>>                                    sizeof(*mr->ioeventfds)*mr->ioeventfd_nb + 1);
>>      memory_region_update_topology(mr);
>> +    qemu_mutex_unlock(&mem_map_lock);
>>  }
>>
>>  static void memory_region_add_subregion_common(MemoryRegion *mr,
>> @@ -1259,6 +1271,8 @@ static void memory_region_add_subregion_common(MemoryRegion *mr,
>>      assert(!subregion->parent);
>>      subregion->parent = mr;
>>      subregion->addr = offset;
>> +
>> +    qemu_mutex_lock(&mem_map_lock);
>>      QTAILQ_FOREACH(other, &mr->subregions, subregions_link) {
>>          if (subregion->may_overlap || other->may_overlap) {
>>              continue;
>> @@ -1289,6 +1303,7 @@ static void memory_region_add_subregion_common(MemoryRegion *mr,
>>      QTAILQ_INSERT_TAIL(&mr->subregions, subregion, subregions_link);
>>  done:
>>      memory_region_update_topology(mr);
>> +    qemu_mutex_unlock(&mem_map_lock);
>>  }
>>
>>
>> @@ -1316,8 +1331,11 @@ void memory_region_del_subregion(MemoryRegion *mr,
>>  {
>>      assert(subregion->parent == mr);
>>      subregion->parent = NULL;
>> +
>> +    qemu_mutex_lock(&mem_map_lock);
>>      QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link);
>>      memory_region_update_topology(mr);
>> +    qemu_mutex_unlock(&mem_map_lock);
>>  }
>>
>>  void memory_region_set_enabled(MemoryRegion *mr, bool enabled)
>> @@ -1325,8 +1343,10 @@ void memory_region_set_enabled(MemoryRegion *mr, bool enabled)
>>      if (enabled == mr->enabled) {
>>          return;
>>      }
>> +    qemu_mutex_lock(&mem_map_lock);
>>      mr->enabled = enabled;
>>      memory_region_update_topology(NULL);
>> +    qemu_mutex_unlock(&mem_map_lock);
>>  }
>>
>>  void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t addr)
>> @@ -1361,7 +1381,9 @@ void memory_region_set_alias_offset(MemoryRegion *mr, target_phys_addr_t offset)
>>          return;
>>      }
>>
>> +    qemu_mutex_lock(&mem_map_lock);
>>      memory_region_update_topology(mr);
>> +    qemu_mutex_unlock(&mem_map_lock);
>>  }
>>
>>  ram_addr_t memory_region_get_ram_addr(MemoryRegion *mr)
>> diff --git a/memory.h b/memory.h
>> index 740c48e..fe6aefa 100644
>> --- a/memory.h
>> +++ b/memory.h
>> @@ -25,6 +25,7 @@
>>  #include "iorange.h"
>>  #include "ioport.h"
>>  #include "int128.h"
>> +#include "qemu-thread.h"
>>
>>  typedef struct MemoryRegionOps MemoryRegionOps;
>>  typedef struct MemoryRegion MemoryRegion;
>> @@ -207,6 +208,7 @@ struct MemoryListener {
>>      QTAILQ_ENTRY(MemoryListener) link;
>>  };
>>
>> +extern QemuMutex mem_map_lock;
>>  /**
>>   * memory_region_init: Initialize a memory region
>>   *
>> --
>> 1.7.4.4
>>

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 04/15] memory: MemoryRegion topology must be stable when updating
  2012-08-08  9:13     ` [Qemu-devel] " Avi Kivity
@ 2012-08-09  7:28       ` liu ping fan
  -1 siblings, 0 replies; 154+ messages in thread
From: liu ping fan @ 2012-08-09  7:28 UTC (permalink / raw)
  To: Avi Kivity
  Cc: qemu-devel, kvm, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber

On Wed, Aug 8, 2012 at 5:13 PM, Avi Kivity <avi@redhat.com> wrote:
> On 08/08/2012 09:25 AM, Liu Ping Fan wrote:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> Using mem_map_lock to protect among updaters. So we can get the intact
>> snapshot of mem topology -- FlatView & radix-tree.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  exec.c   |    3 +++
>>  memory.c |   22 ++++++++++++++++++++++
>>  memory.h |    2 ++
>>  3 files changed, 27 insertions(+), 0 deletions(-)
>>
>> diff --git a/exec.c b/exec.c
>> index 8244d54..0e29ef9 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -210,6 +210,8 @@ static unsigned phys_map_nodes_nb, phys_map_nodes_nb_alloc;
>>     The bottom level has pointers to MemoryRegionSections.  */
>>  static PhysPageEntry phys_map = { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
>>
>> +QemuMutex mem_map_lock;
>> +
>>  static void io_mem_init(void);
>>  static void memory_map_init(void);
>>
>> @@ -637,6 +639,7 @@ void cpu_exec_init_all(void)
>>  #if !defined(CONFIG_USER_ONLY)
>>      memory_map_init();
>>      io_mem_init();
>> +    qemu_mutex_init(&mem_map_lock);
>>  #endif
>>  }
>>
>> diff --git a/memory.c b/memory.c
>> index aab4a31..5986532 100644
>> --- a/memory.c
>> +++ b/memory.c
>> @@ -761,7 +761,9 @@ void memory_region_transaction_commit(void)
>>      assert(memory_region_transaction_depth);
>>      --memory_region_transaction_depth;
>>      if (!memory_region_transaction_depth && memory_region_update_pending) {
>> +        qemu_mutex_lock(&mem_map_lock);
>>          memory_region_update_topology(NULL);
>> +        qemu_mutex_unlock(&mem_map_lock);
>>      }
>>  }
>
> Seems to me that nothing in memory.c can be susceptible to races.  It must
> already be called under the big qemu lock, and with the exception of
> mutators (memory_region_set_*), changes aren't directly visible.
>
Yes, what I want to do is "prepare unplug out of protection of global
lock".  Once io-dispatch and mmio-dispatch both run outside the big
lock, we will run into the following scenario:
    In vcpu context A, qdev_unplug_complete() -> delete subregion;
    In context B, write PCI BAR -> pci mapping update -> add subregion
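
To make the race concrete, a rough sketch of the two call chains (the
variables dev, pci_dev, addr, val, len and the region names in the
comments are placeholders, not code from the series):

    /* vcpu thread A: completes a hot-unplug without holding the big lock */
    qdev_unplug_complete(dev);
        /* ... -> memory_region_del_subregion(parent_mr, bar_mr) */

    /* vcpu thread B: the guest rewrites a PCI BAR at the same time */
    pci_default_write_config(pci_dev, addr, val, len);
        /* -> pci_update_mappings() -> memory_region_add_subregion() */

Both chains mutate mr->subregions and re-render the FlatView/radix
tree, so without mem_map_lock one side can observe the other's
half-finished update.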

> I think it's sufficient to take the mem_map_lock at the beginning of
> core_begin() and drop it at the end of core_commit().  That means all
> updates of volatile state, phys_map, are protected.
>
The mem_map_lock is meant to protect both address_space_io and
address_space_memory.  Without the protection of the big lock,
contention will arise between the updaters
(memory_region_{add,del}_subregion) and the readers
(generate_memory_topology()->render_memory_region()).

If the lock is taken only in core_begin/commit, we will have to
duplicate it for every xx_begin/commit, right?  And at the same time,
mr->subregions is exposed under SMP without the big lock.
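
A sketch of the window that stays open with begin/commit-only locking
(this shows the failure mode; it is not code from the series):

    /* thread A: mutator; the list update happens before any listener runs */
    QTAILQ_INSERT_TAIL(&mr->subregions, subregion, subregions_link);
    memory_region_update_topology(mr);  /* lock would be taken only in here */

    /* thread B: inside its own begin/commit section, lock held, but
     * generate_memory_topology() -> render_memory_region() walks
     * mr->subregions while A's unlocked QTAILQ_INSERT_TAIL is in flight */

Hence the series takes mem_map_lock in each mutator instead.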

Thanks and regards,
pingfan

>
>
> --
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 08/15] memory: introduce PhysMap to present snapshot of topology
  2012-08-08 19:18     ` [Qemu-devel] " Blue Swirl
@ 2012-08-09  7:29       ` liu ping fan
  -1 siblings, 0 replies; 154+ messages in thread
From: liu ping fan @ 2012-08-09  7:29 UTC (permalink / raw)
  To: Blue Swirl
  Cc: qemu-devel, kvm, Anthony Liguori, Avi Kivity, Jan Kiszka,
	Marcelo Tosatti, Stefan Hajnoczi, Paolo Bonzini,
	Andreas Färber

On Thu, Aug 9, 2012 at 3:18 AM, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Wed, Aug 8, 2012 at 6:25 AM, Liu Ping Fan <qemulist@gmail.com> wrote:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> PhysMap contains the flat view and the radix-tree view; they are a
>> snapshot of the system topology and must stay consistent with each
>> other. With PhysMap, we can swap a single pointer when updating and
>> make the change atomic.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  exec.c   |    8 --------
>>  memory.c |   33 ---------------------------------
>>  memory.h |   62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>>  3 files changed, 60 insertions(+), 43 deletions(-)
>>
>> diff --git a/exec.c b/exec.c
>> index 0e29ef9..01b91b0 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -156,8 +156,6 @@ typedef struct PageDesc {
>>  #endif
>>
>>  /* Size of the L2 (and L3, etc) page tables.  */
>
> Please copy this comment to the header file.
>
OK, thanks.
pingfan

>> -#define L2_BITS 10
>> -#define L2_SIZE (1 << L2_BITS)
>>
>>  #define P_L2_LEVELS \
>>      (((TARGET_PHYS_ADDR_SPACE_BITS - TARGET_PAGE_BITS - 1) / L2_BITS) + 1)
>> @@ -185,7 +183,6 @@ uintptr_t qemu_host_page_mask;
>>  static void *l1_map[V_L1_SIZE];
>>
>>  #if !defined(CONFIG_USER_ONLY)
>> -typedef struct PhysPageEntry PhysPageEntry;
>>
>>  static MemoryRegionSection *phys_sections;
>>  static unsigned phys_sections_nb, phys_sections_nb_alloc;
>> @@ -194,11 +191,6 @@ static uint16_t phys_section_notdirty;
>>  static uint16_t phys_section_rom;
>>  static uint16_t phys_section_watch;
>>
>> -struct PhysPageEntry {
>> -    uint16_t is_leaf : 1;
>> -     /* index into phys_sections (is_leaf) or phys_map_nodes (!is_leaf) */
>> -    uint16_t ptr : 15;
>> -};
>>
>>  /* Simple allocator for PhysPageEntry nodes */
>>  static PhysPageEntry (*phys_map_nodes)[L2_SIZE];
>> diff --git a/memory.c b/memory.c
>> index 2eaa2fc..c7f2cfd 100644
>> --- a/memory.c
>> +++ b/memory.c
>> @@ -31,17 +31,6 @@ static bool global_dirty_log = false;
>>  static QTAILQ_HEAD(memory_listeners, MemoryListener) memory_listeners
>>      = QTAILQ_HEAD_INITIALIZER(memory_listeners);
>>
>> -typedef struct AddrRange AddrRange;
>> -
>> -/*
>> - * Note using signed integers limits us to physical addresses at most
>> - * 63 bits wide.  They are needed for negative offsetting in aliases
>> - * (large MemoryRegion::alias_offset).
>> - */
>> -struct AddrRange {
>> -    Int128 start;
>> -    Int128 size;
>> -};
>>
>>  static AddrRange addrrange_make(Int128 start, Int128 size)
>>  {
>> @@ -197,28 +186,6 @@ static bool memory_region_ioeventfd_equal(MemoryRegionIoeventfd a,
>>          && !memory_region_ioeventfd_before(b, a);
>>  }
>>
>> -typedef struct FlatRange FlatRange;
>> -typedef struct FlatView FlatView;
>> -
>> -/* Range of memory in the global map.  Addresses are absolute. */
>> -struct FlatRange {
>> -    MemoryRegion *mr;
>> -    target_phys_addr_t offset_in_region;
>> -    AddrRange addr;
>> -    uint8_t dirty_log_mask;
>> -    bool readable;
>> -    bool readonly;
>> -};
>> -
>> -/* Flattened global view of current active memory hierarchy.  Kept in sorted
>> - * order.
>> - */
>> -struct FlatView {
>> -    FlatRange *ranges;
>> -    unsigned nr;
>> -    unsigned nr_allocated;
>> -};
>> -
>>  typedef struct AddressSpace AddressSpace;
>>  typedef struct AddressSpaceOps AddressSpaceOps;
>>
>> diff --git a/memory.h b/memory.h
>> index 740f018..357edd8 100644
>> --- a/memory.h
>> +++ b/memory.h
>> @@ -29,12 +29,72 @@
>>  #include "qemu-thread.h"
>>  #include "qemu/reclaimer.h"
>>
>> +typedef struct AddrRange AddrRange;
>> +typedef struct FlatRange FlatRange;
>> +typedef struct FlatView FlatView;
>> +typedef struct PhysPageEntry PhysPageEntry;
>> +typedef struct PhysMap PhysMap;
>> +typedef struct MemoryRegionSection MemoryRegionSection;
>>  typedef struct MemoryRegionOps MemoryRegionOps;
>>  typedef struct MemoryRegionLifeOps MemoryRegionLifeOps;
>>  typedef struct MemoryRegion MemoryRegion;
>>  typedef struct MemoryRegionPortio MemoryRegionPortio;
>>  typedef struct MemoryRegionMmio MemoryRegionMmio;
>>
>> +/*
>> + * Note using signed integers limits us to physical addresses at most
>> + * 63 bits wide.  They are needed for negative offsetting in aliases
>> + * (large MemoryRegion::alias_offset).
>> + */
>> +struct AddrRange {
>> +    Int128 start;
>> +    Int128 size;
>> +};
>> +
>> +/* Range of memory in the global map.  Addresses are absolute. */
>> +struct FlatRange {
>> +    MemoryRegion *mr;
>> +    target_phys_addr_t offset_in_region;
>> +    AddrRange addr;
>> +    uint8_t dirty_log_mask;
>> +    bool readable;
>> +    bool readonly;
>> +};
>> +
>> +/* Flattened global view of current active memory hierarchy.  Kept in sorted
>> + * order.
>> + */
>> +struct FlatView {
>> +    FlatRange *ranges;
>> +    unsigned nr;
>> +    unsigned nr_allocated;
>> +};
>> +
>> +struct PhysPageEntry {
>> +    uint16_t is_leaf:1;
>> +     /* index into phys_sections (is_leaf) or phys_map_nodes (!is_leaf) */
>> +    uint16_t ptr:15;
>> +};
>> +
>> +#define L2_BITS 10
>> +#define L2_SIZE (1 << L2_BITS)
>> +/* This is a multi-level map on the physical address space.
>> +   The bottom level has pointers to MemoryRegionSections.  */
>> +struct PhysMap {
>> +    Atomic ref;
>> +    PhysPageEntry root;
>> +    PhysPageEntry (*phys_map_nodes)[L2_SIZE];
>> +    unsigned phys_map_nodes_nb;
>> +    unsigned phys_map_nodes_nb_alloc;
>> +
>> +    MemoryRegionSection *phys_sections;
>> +    unsigned phys_sections_nb;
>> +    unsigned phys_sections_nb_alloc;
>> +
>> +    /* FlatView */
>> +    FlatView views[2];
>> +};
>> +
>>  /* Must match *_DIRTY_FLAGS in cpu-all.h.  To be replaced with dynamic
>>   * registration.
>>   */
>> @@ -167,8 +227,6 @@ struct MemoryRegionPortio {
>>
>>  #define PORTIO_END_OF_LIST() { }
>>
>> -typedef struct MemoryRegionSection MemoryRegionSection;
>> -
>>  /**
>>   * MemoryRegionSection: describes a fragment of a #MemoryRegion
>>   *
>> --
>> 1.7.4.4
>>

^ permalink raw reply	[flat|nested] 154+ messages in thread
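
Distilled from the hunks above, the update side of the scheme is a
reference-counted pointer swap.  cur_map_update() is quoted from the
patch; rebuild_next_map() is a hypothetical stand-in for the
begin/update/commit sequence:

    static PhysMap *cur_map;            /* the published snapshot */
    static QemuMutex cur_map_lock;

    void cur_map_update(PhysMap *next)
    {
        qemu_mutex_lock(&cur_map_lock);
        physmap_put(cur_map);           /* drop the old snapshot's ref */
        cur_map = next;                 /* readers now pick up the new map */
        smp_mb();
        qemu_mutex_unlock(&cur_map_lock);
    }

    /* updater: never modifies the published trees in place */
    PhysMap *next = alloc_next_map();   /* fresh map, ref == 1 */
    rebuild_next_map(next);             /* hypothetical: fill radix tree + views */
    cur_map_update(next);

The old PhysMap is freed only when its last user calls physmap_put(),
which is what lets readers run without taking mem_map_lock.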

* Re: [PATCH 09/15] memory: prepare flatview and radix-tree for rcu style access
  2012-08-08 19:23     ` [Qemu-devel] " Blue Swirl
@ 2012-08-09  7:29       ` liu ping fan
  -1 siblings, 0 replies; 154+ messages in thread
From: liu ping fan @ 2012-08-09  7:29 UTC (permalink / raw)
  To: Blue Swirl
  Cc: qemu-devel, kvm, Anthony Liguori, Avi Kivity, Jan Kiszka,
	Marcelo Tosatti, Stefan Hajnoczi, Paolo Bonzini,
	Andreas Färber

On Thu, Aug 9, 2012 at 3:23 AM, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Wed, Aug 8, 2012 at 6:25 AM, Liu Ping Fan <qemulist@gmail.com> wrote:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> The flat view and the radix view are both reached through a single
>> pointer, which makes changes to them appear atomic.
>>
>> A MemoryRegion referenced by a radix-tree leaf or by the flat view is
>> reclaimed only after the previous PhysMap is no longer in use.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  exec.c      |  303 +++++++++++++++++++++++++++++++++++++++-------------------
>>  hw/vhost.c  |    2 +-
>>  hw/xen_pt.c |    2 +-
>>  kvm-all.c   |    2 +-
>>  memory.c    |   92 ++++++++++++++-----
>>  memory.h    |    9 ++-
>>  vl.c        |    1 +
>>  xen-all.c   |    2 +-
>>  8 files changed, 286 insertions(+), 127 deletions(-)
>>
>> diff --git a/exec.c b/exec.c
>> index 01b91b0..97addb9 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -24,6 +24,7 @@
>>  #include <sys/mman.h>
>>  #endif
>>
>> +#include "qemu/atomic.h"
>>  #include "qemu-common.h"
>>  #include "cpu.h"
>>  #include "tcg.h"
>> @@ -35,6 +36,8 @@
>>  #include "qemu-timer.h"
>>  #include "memory.h"
>>  #include "exec-memory.h"
>> +#include "qemu-thread.h"
>> +#include "qemu/reclaimer.h"
>>  #if defined(CONFIG_USER_ONLY)
>>  #include <qemu.h>
>>  #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
>> @@ -184,25 +187,17 @@ static void *l1_map[V_L1_SIZE];
>>
>>  #if !defined(CONFIG_USER_ONLY)
>>
>> -static MemoryRegionSection *phys_sections;
>> -static unsigned phys_sections_nb, phys_sections_nb_alloc;
>>  static uint16_t phys_section_unassigned;
>>  static uint16_t phys_section_notdirty;
>>  static uint16_t phys_section_rom;
>>  static uint16_t phys_section_watch;
>>
>> -
>> -/* Simple allocator for PhysPageEntry nodes */
>> -static PhysPageEntry (*phys_map_nodes)[L2_SIZE];
>> -static unsigned phys_map_nodes_nb, phys_map_nodes_nb_alloc;
>> -
>>  #define PHYS_MAP_NODE_NIL (((uint16_t)~0) >> 1)
>>
>> -/* This is a multi-level map on the physical address space.
>> -   The bottom level has pointers to MemoryRegionSections.  */
>> -static PhysPageEntry phys_map = { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
>> -
>> +static QemuMutex cur_map_lock;
>> +static PhysMap *cur_map;
>>  QemuMutex mem_map_lock;
>> +static PhysMap *next_map;
>>
>>  static void io_mem_init(void);
>>  static void memory_map_init(void);
>> @@ -383,41 +378,38 @@ static inline PageDesc *page_find(tb_page_addr_t index)
>>
>>  #if !defined(CONFIG_USER_ONLY)
>>
>> -static void phys_map_node_reserve(unsigned nodes)
>> +static void phys_map_node_reserve(PhysMap *map, unsigned nodes)
>>  {
>> -    if (phys_map_nodes_nb + nodes > phys_map_nodes_nb_alloc) {
>> +    if (map->phys_map_nodes_nb + nodes > map->phys_map_nodes_nb_alloc) {
>>          typedef PhysPageEntry Node[L2_SIZE];
>> -        phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc * 2, 16);
>> -        phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc,
>> -                                      phys_map_nodes_nb + nodes);
>> -        phys_map_nodes = g_renew(Node, phys_map_nodes,
>> -                                 phys_map_nodes_nb_alloc);
>> +        map->phys_map_nodes_nb_alloc = MAX(map->phys_map_nodes_nb_alloc * 2,
>> +                                                                        16);
>> +        map->phys_map_nodes_nb_alloc = MAX(map->phys_map_nodes_nb_alloc,
>> +                                      map->phys_map_nodes_nb + nodes);
>> +        map->phys_map_nodes = g_renew(Node, map->phys_map_nodes,
>> +                                 map->phys_map_nodes_nb_alloc);
>>      }
>>  }
>>
>> -static uint16_t phys_map_node_alloc(void)
>> +static uint16_t phys_map_node_alloc(PhysMap *map)
>>  {
>>      unsigned i;
>>      uint16_t ret;
>>
>> -    ret = phys_map_nodes_nb++;
>> +    ret = map->phys_map_nodes_nb++;
>>      assert(ret != PHYS_MAP_NODE_NIL);
>> -    assert(ret != phys_map_nodes_nb_alloc);
>> +    assert(ret != map->phys_map_nodes_nb_alloc);
>>      for (i = 0; i < L2_SIZE; ++i) {
>> -        phys_map_nodes[ret][i].is_leaf = 0;
>> -        phys_map_nodes[ret][i].ptr = PHYS_MAP_NODE_NIL;
>> +        map->phys_map_nodes[ret][i].is_leaf = 0;
>> +        map->phys_map_nodes[ret][i].ptr = PHYS_MAP_NODE_NIL;
>>      }
>>      return ret;
>>  }
>>
>> -static void phys_map_nodes_reset(void)
>> -{
>> -    phys_map_nodes_nb = 0;
>> -}
>> -
>> -
>> -static void phys_page_set_level(PhysPageEntry *lp, target_phys_addr_t *index,
>> -                                target_phys_addr_t *nb, uint16_t leaf,
>> +static void phys_page_set_level(PhysMap *map, PhysPageEntry *lp,
>> +                                target_phys_addr_t *index,
>> +                                target_phys_addr_t *nb,
>> +                                uint16_t leaf,
>>                                  int level)
>>  {
>>      PhysPageEntry *p;
>> @@ -425,8 +417,8 @@ static void phys_page_set_level(PhysPageEntry *lp, target_phys_addr_t *index,
>>      target_phys_addr_t step = (target_phys_addr_t)1 << (level * L2_BITS);
>>
>>      if (!lp->is_leaf && lp->ptr == PHYS_MAP_NODE_NIL) {
>> -        lp->ptr = phys_map_node_alloc();
>> -        p = phys_map_nodes[lp->ptr];
>> +        lp->ptr = phys_map_node_alloc(map);
>> +        p = map->phys_map_nodes[lp->ptr];
>>          if (level == 0) {
>>              for (i = 0; i < L2_SIZE; i++) {
>>                  p[i].is_leaf = 1;
>> @@ -434,7 +426,7 @@ static void phys_page_set_level(PhysPageEntry *lp, target_phys_addr_t *index,
>>              }
>>          }
>>      } else {
>> -        p = phys_map_nodes[lp->ptr];
>> +        p = map->phys_map_nodes[lp->ptr];
>>      }
>>      lp = &p[(*index >> (level * L2_BITS)) & (L2_SIZE - 1)];
>>
>> @@ -445,24 +437,27 @@ static void phys_page_set_level(PhysPageEntry *lp, target_phys_addr_t *index,
>>              *index += step;
>>              *nb -= step;
>>          } else {
>> -            phys_page_set_level(lp, index, nb, leaf, level - 1);
>> +            phys_page_set_level(map, lp, index, nb, leaf, level - 1);
>>          }
>>          ++lp;
>>      }
>>  }
>>
>> -static void phys_page_set(target_phys_addr_t index, target_phys_addr_t nb,
>> -                          uint16_t leaf)
>> +static void phys_page_set(PhysMap *map, target_phys_addr_t index,
>> +                            target_phys_addr_t nb,
>> +                            uint16_t leaf)
>>  {
>>      /* Wildly overreserve - it doesn't matter much. */
>> -    phys_map_node_reserve(3 * P_L2_LEVELS);
>> +    phys_map_node_reserve(map, 3 * P_L2_LEVELS);
>>
>> -    phys_page_set_level(&phys_map, &index, &nb, leaf, P_L2_LEVELS - 1);
>> +    /* update in new tree*/
>> +    phys_page_set_level(map, &map->root, &index, &nb, leaf, P_L2_LEVELS - 1);
>>  }
>>
>> -MemoryRegionSection *phys_page_find(target_phys_addr_t index)
>> +static MemoryRegionSection *phys_page_find_internal(PhysMap *map,
>> +                           target_phys_addr_t index)
>>  {
>> -    PhysPageEntry lp = phys_map;
>> +    PhysPageEntry lp = map->root;
>>      PhysPageEntry *p;
>>      int i;
>>      uint16_t s_index = phys_section_unassigned;
>> @@ -471,13 +466,79 @@ MemoryRegionSection *phys_page_find(target_phys_addr_t index)
>>          if (lp.ptr == PHYS_MAP_NODE_NIL) {
>>              goto not_found;
>>          }
>> -        p = phys_map_nodes[lp.ptr];
>> +        p = map->phys_map_nodes[lp.ptr];
>>          lp = p[(index >> (i * L2_BITS)) & (L2_SIZE - 1)];
>>      }
>>
>>      s_index = lp.ptr;
>>  not_found:
>> -    return &phys_sections[s_index];
>> +    return &map->phys_sections[s_index];
>> +}
>> +
>> +MemoryRegionSection *phys_page_find(target_phys_addr_t index)
>> +{
>> +    return phys_page_find_internal(cur_map, index);
>> +}
>> +
>> +void physmap_get(PhysMap *map)
>> +{
>> +    atomic_inc(&map->ref);
>> +}
>> +
>> +/* Untill rcu read side finished, do this reclaim */
>
> Until
>
adopted

>> +static ChunkHead physmap_reclaimer_list = { .lh_first = NULL };
>
> Please insert a blank line here.
>
adopted

>> +void physmap_reclaimer_enqueue(void *opaque, ReleaseHandler *release)
>> +{
>> +    reclaimer_enqueue(&physmap_reclaimer_list, opaque, release);
>> +}
>> +
>> +static void destroy_all_mappings(PhysMap *map);
>
> Prototypes belong to the top of the file.
>
adopted

>> +static void phys_map_release(PhysMap *map)
>> +{
>> +    /* emulate for rcu reclaimer for mr */
>> +    reclaimer_worker(&physmap_reclaimer_list);
>> +
>> +    destroy_all_mappings(map);
>> +    g_free(map->phys_map_nodes);
>> +    g_free(map->phys_sections);
>> +    g_free(map->views[0].ranges);
>> +    g_free(map->views[1].ranges);
>> +    g_free(map);
>> +}
>> +
>> +void physmap_put(PhysMap *map)
>> +{
>> +    if (atomic_dec_and_test(&map->ref)) {
>> +        phys_map_release(map);
>> +    }
>> +}
>> +
>> +void cur_map_update(PhysMap *next)
>> +{
>> +    qemu_mutex_lock(&cur_map_lock);
>> +    physmap_put(cur_map);
>> +    cur_map = next;
>> +    smp_mb();
>> +    qemu_mutex_unlock(&cur_map_lock);
>> +}
>> +
>> +PhysMap *cur_map_get(void)
>> +{
>> +    PhysMap *ret;
>> +
>> +    qemu_mutex_lock(&cur_map_lock);
>> +    ret = cur_map;
>> +    physmap_get(ret);
>> +    smp_mb();
>> +    qemu_mutex_unlock(&cur_map_lock);
>> +    return ret;
>> +}
>> +
>> +PhysMap *alloc_next_map(void)
>> +{
>> +    PhysMap *next = g_malloc0(sizeof(PhysMap));
>> +    atomic_set(&next->ref, 1);
>> +    return next;
>>  }
>>
>>  bool memory_region_is_unassigned(MemoryRegion *mr)
>> @@ -632,6 +693,7 @@ void cpu_exec_init_all(void)
>>      memory_map_init();
>>      io_mem_init();
>>      qemu_mutex_init(&mem_map_lock);
>> +    qemu_mutex_init(&cur_map_lock);
>>  #endif
>>  }
>>
>> @@ -2161,17 +2223,18 @@ int page_unprotect(target_ulong address, uintptr_t pc, void *puc)
>>
>>  #define SUBPAGE_IDX(addr) ((addr) & ~TARGET_PAGE_MASK)
>>  typedef struct subpage_t {
>> +    PhysMap *map;
>>      MemoryRegion iomem;
>>      target_phys_addr_t base;
>>      uint16_t sub_section[TARGET_PAGE_SIZE];
>>  } subpage_t;
>>
>> -static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
>> -                             uint16_t section);
>> -static subpage_t *subpage_init(target_phys_addr_t base);
>> -static void destroy_page_desc(uint16_t section_index)
>> +static int subpage_register(PhysMap *map, subpage_t *mmio, uint32_t start,
>> +                            uint32_t end, uint16_t section);
>> +static subpage_t *subpage_init(PhysMap *map, target_phys_addr_t base);
>> +static void destroy_page_desc(PhysMap *map, uint16_t section_index)
>>  {
>> -    MemoryRegionSection *section = &phys_sections[section_index];
>> +    MemoryRegionSection *section = &map->phys_sections[section_index];
>>      MemoryRegion *mr = section->mr;
>>
>>      if (mr->subpage) {
>> @@ -2181,7 +2244,7 @@ static void destroy_page_desc(uint16_t section_index)
>>      }
>>  }
>>
>> -static void destroy_l2_mapping(PhysPageEntry *lp, unsigned level)
>> +static void destroy_l2_mapping(PhysMap *map, PhysPageEntry *lp, unsigned level)
>>  {
>>      unsigned i;
>>      PhysPageEntry *p;
>> @@ -2190,38 +2253,34 @@ static void destroy_l2_mapping(PhysPageEntry *lp, unsigned level)
>>          return;
>>      }
>>
>> -    p = phys_map_nodes[lp->ptr];
>> +    p = map->phys_map_nodes[lp->ptr];
>>      for (i = 0; i < L2_SIZE; ++i) {
>>          if (!p[i].is_leaf) {
>> -            destroy_l2_mapping(&p[i], level - 1);
>> +            destroy_l2_mapping(map, &p[i], level - 1);
>>          } else {
>> -            destroy_page_desc(p[i].ptr);
>> +            destroy_page_desc(map, p[i].ptr);
>>          }
>>      }
>>      lp->is_leaf = 0;
>>      lp->ptr = PHYS_MAP_NODE_NIL;
>>  }
>>
>> -static void destroy_all_mappings(void)
>> +static void destroy_all_mappings(PhysMap *map)
>>  {
>> -    destroy_l2_mapping(&phys_map, P_L2_LEVELS - 1);
>> -    phys_map_nodes_reset();
>> -}
>> +    PhysPageEntry *root = &map->root;
>>
>> -static uint16_t phys_section_add(MemoryRegionSection *section)
>> -{
>> -    if (phys_sections_nb == phys_sections_nb_alloc) {
>> -        phys_sections_nb_alloc = MAX(phys_sections_nb_alloc * 2, 16);
>> -        phys_sections = g_renew(MemoryRegionSection, phys_sections,
>> -                                phys_sections_nb_alloc);
>> -    }
>> -    phys_sections[phys_sections_nb] = *section;
>> -    return phys_sections_nb++;
>> +    destroy_l2_mapping(map, root, P_L2_LEVELS - 1);
>>  }
>>
>> -static void phys_sections_clear(void)
>> +static uint16_t phys_section_add(PhysMap *map, MemoryRegionSection *section)
>>  {
>> -    phys_sections_nb = 0;
>> +    if (map->phys_sections_nb == map->phys_sections_nb_alloc) {
>> +        map->phys_sections_nb_alloc = MAX(map->phys_sections_nb_alloc * 2, 16);
>> +        map->phys_sections = g_renew(MemoryRegionSection, map->phys_sections,
>> +                                map->phys_sections_nb_alloc);
>> +    }
>> +    map->phys_sections[map->phys_sections_nb] = *section;
>> +    return map->phys_sections_nb++;
>>  }
>>
>>  /* register physical memory.
>> @@ -2232,12 +2291,13 @@ static void phys_sections_clear(void)
>>     start_addr and region_offset are rounded down to a page boundary
>>     before calculating this offset.  This should not be a problem unless
>>     the low bits of start_addr and region_offset differ.  */
>> -static void register_subpage(MemoryRegionSection *section)
>> +static void register_subpage(PhysMap *map, MemoryRegionSection *section)
>>  {
>>      subpage_t *subpage;
>>      target_phys_addr_t base = section->offset_within_address_space
>>          & TARGET_PAGE_MASK;
>> -    MemoryRegionSection *existing = phys_page_find(base >> TARGET_PAGE_BITS);
>> +    MemoryRegionSection *existing = phys_page_find_internal(map,
>> +                                            base >> TARGET_PAGE_BITS);
>>      MemoryRegionSection subsection = {
>>          .offset_within_address_space = base,
>>          .size = TARGET_PAGE_SIZE,
>> @@ -2247,30 +2307,30 @@ static void register_subpage(MemoryRegionSection *section)
>>      assert(existing->mr->subpage || existing->mr == &io_mem_unassigned);
>>
>>      if (!(existing->mr->subpage)) {
>> -        subpage = subpage_init(base);
>> +        subpage = subpage_init(map, base);
>>          subsection.mr = &subpage->iomem;
>> -        phys_page_set(base >> TARGET_PAGE_BITS, 1,
>> -                      phys_section_add(&subsection));
>> +        phys_page_set(map, base >> TARGET_PAGE_BITS, 1,
>> +                      phys_section_add(map, &subsection));
>>      } else {
>>          subpage = container_of(existing->mr, subpage_t, iomem);
>>      }
>>      start = section->offset_within_address_space & ~TARGET_PAGE_MASK;
>>      end = start + section->size;
>> -    subpage_register(subpage, start, end, phys_section_add(section));
>> +    subpage_register(map, subpage, start, end, phys_section_add(map, section));
>>  }
>>
>>
>> -static void register_multipage(MemoryRegionSection *section)
>> +static void register_multipage(PhysMap *map, MemoryRegionSection *section)
>>  {
>>      target_phys_addr_t start_addr = section->offset_within_address_space;
>>      ram_addr_t size = section->size;
>>      target_phys_addr_t addr;
>> -    uint16_t section_index = phys_section_add(section);
>> +    uint16_t section_index = phys_section_add(map, section);
>>
>>      assert(size);
>>
>>      addr = start_addr;
>> -    phys_page_set(addr >> TARGET_PAGE_BITS, size >> TARGET_PAGE_BITS,
>> +    phys_page_set(map, addr >> TARGET_PAGE_BITS, size >> TARGET_PAGE_BITS,
>>                    section_index);
>>  }
>>
>> @@ -2278,13 +2338,14 @@ void cpu_register_physical_memory_log(MemoryRegionSection *section,
>>                                        bool readonly)
>>  {
>>      MemoryRegionSection now = *section, remain = *section;
>> +    PhysMap *map = next_map;
>>
>>      if ((now.offset_within_address_space & ~TARGET_PAGE_MASK)
>>          || (now.size < TARGET_PAGE_SIZE)) {
>>          now.size = MIN(TARGET_PAGE_ALIGN(now.offset_within_address_space)
>>                         - now.offset_within_address_space,
>>                         now.size);
>> -        register_subpage(&now);
>> +        register_subpage(map, &now);
>>          remain.size -= now.size;
>>          remain.offset_within_address_space += now.size;
>>          remain.offset_within_region += now.size;
>> @@ -2292,14 +2353,14 @@ void cpu_register_physical_memory_log(MemoryRegionSection *section,
>>      now = remain;
>>      now.size &= TARGET_PAGE_MASK;
>>      if (now.size) {
>> -        register_multipage(&now);
>> +        register_multipage(map, &now);
>>          remain.size -= now.size;
>>          remain.offset_within_address_space += now.size;
>>          remain.offset_within_region += now.size;
>>      }
>>      now = remain;
>>      if (now.size) {
>> -        register_subpage(&now);
>> +        register_subpage(map, &now);
>>      }
>>  }
>>
>> @@ -3001,7 +3062,7 @@ static uint64_t subpage_read(void *opaque, target_phys_addr_t addr,
>>             mmio, len, addr, idx);
>>  #endif
>>
>> -    section = &phys_sections[mmio->sub_section[idx]];
>> +    section = &mmio->map->phys_sections[mmio->sub_section[idx]];
>>      addr += mmio->base;
>>      addr -= section->offset_within_address_space;
>>      addr += section->offset_within_region;
>> @@ -3020,7 +3081,7 @@ static void subpage_write(void *opaque, target_phys_addr_t addr,
>>             __func__, mmio, len, addr, idx, value);
>>  #endif
>>
>> -    section = &phys_sections[mmio->sub_section[idx]];
>> +    section = &mmio->map->phys_sections[mmio->sub_section[idx]];
>>      addr += mmio->base;
>>      addr -= section->offset_within_address_space;
>>      addr += section->offset_within_region;
>> @@ -3065,8 +3126,8 @@ static const MemoryRegionOps subpage_ram_ops = {
>>      .endianness = DEVICE_NATIVE_ENDIAN,
>>  };
>>
>> -static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
>> -                             uint16_t section)
>> +static int subpage_register(PhysMap *map, subpage_t *mmio, uint32_t start,
>> +                              uint32_t end, uint16_t section)
>>  {
>>      int idx, eidx;
>>
>> @@ -3078,10 +3139,10 @@ static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
>>      printf("%s: %p start %08x end %08x idx %08x eidx %08x mem %ld\n", __func__,
>>             mmio, start, end, idx, eidx, memory);
>>  #endif
>> -    if (memory_region_is_ram(phys_sections[section].mr)) {
>> -        MemoryRegionSection new_section = phys_sections[section];
>> +    if (memory_region_is_ram(map->phys_sections[section].mr)) {
>> +        MemoryRegionSection new_section = map->phys_sections[section];
>>          new_section.mr = &io_mem_subpage_ram;
>> -        section = phys_section_add(&new_section);
>> +        section = phys_section_add(map, &new_section);
>>      }
>>      for (; idx <= eidx; idx++) {
>>          mmio->sub_section[idx] = section;
>> @@ -3090,12 +3151,13 @@ static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
>>      return 0;
>>  }
>>
>> -static subpage_t *subpage_init(target_phys_addr_t base)
>> +static subpage_t *subpage_init(PhysMap *map, target_phys_addr_t base)
>>  {
>>      subpage_t *mmio;
>>
>>      mmio = g_malloc0(sizeof(subpage_t));
>>
>> +    mmio->map = map;
>>      mmio->base = base;
>>      memory_region_init_io(&mmio->iomem, &subpage_ops, mmio,
>>                            "subpage", TARGET_PAGE_SIZE);
>> @@ -3104,12 +3166,12 @@ static subpage_t *subpage_init(target_phys_addr_t base)
>>      printf("%s: %p base " TARGET_FMT_plx " len %08x %d\n", __func__,
>>             mmio, base, TARGET_PAGE_SIZE, subpage_memory);
>>  #endif
>> -    subpage_register(mmio, 0, TARGET_PAGE_SIZE-1, phys_section_unassigned);
>> +    subpage_register(map, mmio, 0, TARGET_PAGE_SIZE-1, phys_section_unassigned);
>>
>>      return mmio;
>>  }
>>
>> -static uint16_t dummy_section(MemoryRegion *mr)
>> +static uint16_t dummy_section(PhysMap *map, MemoryRegion *mr)
>>  {
>>      MemoryRegionSection section = {
>>          .mr = mr,
>> @@ -3118,7 +3180,7 @@ static uint16_t dummy_section(MemoryRegion *mr)
>>          .size = UINT64_MAX,
>>      };
>>
>> -    return phys_section_add(&section);
>> +    return phys_section_add(map, &section);
>>  }
>>
>>  MemoryRegion *iotlb_to_region(target_phys_addr_t index)
>> @@ -3140,15 +3202,32 @@ static void io_mem_init(void)
>>                            "watch", UINT64_MAX);
>>  }
>>
>> -static void core_begin(MemoryListener *listener)
>> +#if 0
>> +static void physmap_init(void)
>> +{
>> +    FlatView v = { .ranges = NULL,
>> +                             .nr = 0,
>> +                             .nr_allocated = 0,
>> +    };
>> +
>> +    init_map.views[0] = v;
>> +    init_map.views[1] = v;
>> +    cur_map =  &init_map;
>> +}
>> +#endif
>
> Please delete.
>
adopted

Thanks and regards,
pingfan
>> +
>> +static void core_begin(MemoryListener *listener, PhysMap *new_map)
>>  {
>> -    destroy_all_mappings();
>> -    phys_sections_clear();
>> -    phys_map.ptr = PHYS_MAP_NODE_NIL;
>> -    phys_section_unassigned = dummy_section(&io_mem_unassigned);
>> -    phys_section_notdirty = dummy_section(&io_mem_notdirty);
>> -    phys_section_rom = dummy_section(&io_mem_rom);
>> -    phys_section_watch = dummy_section(&io_mem_watch);
>> +
>> +    new_map->root.ptr = PHYS_MAP_NODE_NIL;
>> +    new_map->root.is_leaf = 0;
>> +
>> +    /* In all the map, these sections have the same index */
>> +    phys_section_unassigned = dummy_section(new_map, &io_mem_unassigned);
>> +    phys_section_notdirty = dummy_section(new_map, &io_mem_notdirty);
>> +    phys_section_rom = dummy_section(new_map, &io_mem_rom);
>> +    phys_section_watch = dummy_section(new_map, &io_mem_watch);
>> +    next_map = new_map;
>>  }
>>
>>  static void core_commit(MemoryListener *listener)
>> @@ -3161,6 +3240,16 @@ static void core_commit(MemoryListener *listener)
>>      for(env = first_cpu; env != NULL; env = env->next_cpu) {
>>          tlb_flush(env, 1);
>>      }
>> +
>> +/* move into high layer
>> +    qemu_mutex_lock(&cur_map_lock);
>> +    if (cur_map != NULL) {
>> +        physmap_put(cur_map);
>> +    }
>> +    cur_map = next_map;
>> +    smp_mb();
>> +    qemu_mutex_unlock(&cur_map_lock);
>> +*/
>
> Also, commented-out code should be deleted.
>
>>  }
>>
>>  static void core_region_add(MemoryListener *listener,
>> @@ -3217,7 +3306,7 @@ static void core_eventfd_del(MemoryListener *listener,
>>  {
>>  }
>>
>> -static void io_begin(MemoryListener *listener)
>> +static void io_begin(MemoryListener *listener, PhysMap *next)
>>  {
>>  }
>>
>> @@ -3329,6 +3418,20 @@ static void memory_map_init(void)
>>      memory_listener_register(&io_memory_listener, system_io);
>>  }
>>
>> +void physmap_init(void)
>> +{
>> +    FlatView v = { .ranges = NULL, .nr = 0, .nr_allocated = 0,
>> +                           };
>> +    PhysMap *init_map = g_malloc0(sizeof(PhysMap));
>> +
>> +    atomic_set(&init_map->ref, 1);
>> +    init_map->root.ptr = PHYS_MAP_NODE_NIL;
>> +    init_map->root.is_leaf = 0;
>> +    init_map->views[0] = v;
>> +    init_map->views[1] = v;
>> +    cur_map = init_map;
>> +}
>> +
>>  MemoryRegion *get_system_memory(void)
>>  {
>>      return system_memory;
>> @@ -3391,6 +3494,7 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>      uint32_t val;
>>      target_phys_addr_t page;
>>      MemoryRegionSection *section;
>> +    PhysMap *cur = cur_map_get();
>>
>>      while (len > 0) {
>>          page = addr & TARGET_PAGE_MASK;
>> @@ -3472,6 +3576,7 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>          buf += l;
>>          addr += l;
>>      }
>> +    physmap_put(cur);
>>  }
>>
>>  /* used for ROM loading : can write in RAM and ROM */
>> diff --git a/hw/vhost.c b/hw/vhost.c
>> index 43664e7..df58345 100644
>> --- a/hw/vhost.c
>> +++ b/hw/vhost.c
>> @@ -438,7 +438,7 @@ static bool vhost_section(MemoryRegionSection *section)
>>          && memory_region_is_ram(section->mr);
>>  }
>>
>> -static void vhost_begin(MemoryListener *listener)
>> +static void vhost_begin(MemoryListener *listener, PhysMap *next)
>>  {
>>  }
>>
>> diff --git a/hw/xen_pt.c b/hw/xen_pt.c
>> index 3b6d186..fba8586 100644
>> --- a/hw/xen_pt.c
>> +++ b/hw/xen_pt.c
>> @@ -597,7 +597,7 @@ static void xen_pt_region_update(XenPCIPassthroughState *s,
>>      }
>>  }
>>
>> -static void xen_pt_begin(MemoryListener *l)
>> +static void xen_pt_begin(MemoryListener *l, PhysMap *next)
>>  {
>>  }
>>
>> diff --git a/kvm-all.c b/kvm-all.c
>> index f8e4328..bc42cab 100644
>> --- a/kvm-all.c
>> +++ b/kvm-all.c
>> @@ -693,7 +693,7 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, bool add)
>>      }
>>  }
>>
>> -static void kvm_begin(MemoryListener *listener)
>> +static void kvm_begin(MemoryListener *listener, PhysMap *next)
>>  {
>>  }
>>
>> diff --git a/memory.c b/memory.c
>> index c7f2cfd..54cdc7f 100644
>> --- a/memory.c
>> +++ b/memory.c
>> @@ -20,6 +20,7 @@
>>  #include "kvm.h"
>>  #include <assert.h>
>>  #include "hw/qdev.h"
>> +#include "qemu-thread.h"
>>
>>  #define WANT_EXEC_OBSOLETE
>>  #include "exec-obsolete.h"
>> @@ -192,7 +193,7 @@ typedef struct AddressSpaceOps AddressSpaceOps;
>>  /* A system address space - I/O, memory, etc. */
>>  struct AddressSpace {
>>      MemoryRegion *root;
>> -    FlatView current_map;
>> +    int view_id;
>>      int ioeventfd_nb;
>>      MemoryRegionIoeventfd *ioeventfds;
>>  };
>> @@ -232,11 +233,6 @@ static void flatview_insert(FlatView *view, unsigned pos, FlatRange *range)
>>      ++view->nr;
>>  }
>>
>> -static void flatview_destroy(FlatView *view)
>> -{
>> -    g_free(view->ranges);
>> -}
>> -
>>  static bool can_merge(FlatRange *r1, FlatRange *r2)
>>  {
>>      return int128_eq(addrrange_end(r1->addr), r2->addr.start)
>> @@ -594,8 +590,10 @@ static void address_space_update_ioeventfds(AddressSpace *as)
>>      MemoryRegionIoeventfd *ioeventfds = NULL;
>>      AddrRange tmp;
>>      unsigned i;
>> +    PhysMap *map = cur_map_get();
>> +    FlatView *view = &map->views[as->view_id];
>>
>> -    FOR_EACH_FLAT_RANGE(fr, &as->current_map) {
>> +    FOR_EACH_FLAT_RANGE(fr, view) {
>>          for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
>>              tmp = addrrange_shift(fr->mr->ioeventfds[i].addr,
>>                                    int128_sub(fr->addr.start,
>> @@ -616,6 +614,7 @@ static void address_space_update_ioeventfds(AddressSpace *as)
>>      g_free(as->ioeventfds);
>>      as->ioeventfds = ioeventfds;
>>      as->ioeventfd_nb = ioeventfd_nb;
>> +    physmap_put(map);
>>  }
>>
>>  static void address_space_update_topology_pass(AddressSpace *as,
>> @@ -681,21 +680,23 @@ static void address_space_update_topology_pass(AddressSpace *as,
>>  }
>>
>>
>> -static void address_space_update_topology(AddressSpace *as)
>> +static void address_space_update_topology(AddressSpace *as, PhysMap *prev,
>> +                                            PhysMap *next)
>>  {
>> -    FlatView old_view = as->current_map;
>> +    FlatView old_view = prev->views[as->view_id];
>>      FlatView new_view = generate_memory_topology(as->root);
>>
>>      address_space_update_topology_pass(as, old_view, new_view, false);
>>      address_space_update_topology_pass(as, old_view, new_view, true);
>> +    next->views[as->view_id] = new_view;
>>
>> -    as->current_map = new_view;
>> -    flatview_destroy(&old_view);
>>      address_space_update_ioeventfds(as);
>>  }
>>
>>  static void memory_region_update_topology(MemoryRegion *mr)
>>  {
>> +    PhysMap *prev, *next;
>> +
>>      if (memory_region_transaction_depth) {
>>          memory_region_update_pending |= !mr || mr->enabled;
>>          return;
>> @@ -705,16 +706,20 @@ static void memory_region_update_topology(MemoryRegion *mr)
>>          return;
>>      }
>>
>> -    MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
>> +     prev = cur_map_get();
>> +    /* allocate PhysMap next here */
>> +    next = alloc_next_map();
>> +    MEMORY_LISTENER_CALL_GLOBAL(begin, Forward, next);
>>
>>      if (address_space_memory.root) {
>> -        address_space_update_topology(&address_space_memory);
>> +        address_space_update_topology(&address_space_memory, prev, next);
>>      }
>>      if (address_space_io.root) {
>> -        address_space_update_topology(&address_space_io);
>> +        address_space_update_topology(&address_space_io, prev, next);
>>      }
>>
>>      MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
>> +    cur_map_update(next);
>>
>>      memory_region_update_pending = false;
>>  }
>> @@ -1071,7 +1076,7 @@ void memory_region_put(MemoryRegion *mr)
>>
>>      if (atomic_dec_and_test(&mr->ref)) {
>>          /* to fix, using call_rcu( ,release) */
>> -        mr->life_ops->put(mr);
>> +        physmap_reclaimer_enqueue(mr, (ReleaseHandler *)mr->life_ops->put);
>>      }
>>  }
>>
>> @@ -1147,13 +1152,18 @@ void memory_region_set_dirty(MemoryRegion *mr, target_phys_addr_t addr,
>>  void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
>>  {
>>      FlatRange *fr;
>> +    FlatView *fview;
>> +    PhysMap *map;
>>
>> -    FOR_EACH_FLAT_RANGE(fr, &address_space_memory.current_map) {
>> +    map = cur_map_get();
>> +    fview = &map->views[address_space_memory.view_id];
>> +    FOR_EACH_FLAT_RANGE(fr, fview) {
>>          if (fr->mr == mr) {
>>              MEMORY_LISTENER_UPDATE_REGION(fr, &address_space_memory,
>>                                            Forward, log_sync);
>>          }
>>      }
>> +    physmap_put(map);
>>  }
>>
>>  void memory_region_set_readonly(MemoryRegion *mr, bool readonly)
>> @@ -1201,8 +1211,12 @@ static void memory_region_update_coalesced_range(MemoryRegion *mr)
>>      FlatRange *fr;
>>      CoalescedMemoryRange *cmr;
>>      AddrRange tmp;
>> +    FlatView *fview;
>> +    PhysMap *map;
>>
>> -    FOR_EACH_FLAT_RANGE(fr, &address_space_memory.current_map) {
>> +    map = cur_map_get();
>> +    fview = &map->views[address_space_memory.view_id];
>> +    FOR_EACH_FLAT_RANGE(fr, fview) {
>>          if (fr->mr == mr) {
>>              qemu_unregister_coalesced_mmio(int128_get64(fr->addr.start),
>>                                             int128_get64(fr->addr.size));
>> @@ -1219,6 +1233,7 @@ static void memory_region_update_coalesced_range(MemoryRegion *mr)
>>              }
>>          }
>>      }
>> +    physmap_put(map);
>>  }
>>
>>  void memory_region_set_coalescing(MemoryRegion *mr)
>> @@ -1458,29 +1473,49 @@ static int cmp_flatrange_addr(const void *addr_, const void *fr_)
>>      return 0;
>>  }
>>
>> -static FlatRange *address_space_lookup(AddressSpace *as, AddrRange addr)
>> +static FlatRange *address_space_lookup(FlatView *view, AddrRange addr)
>>  {
>> -    return bsearch(&addr, as->current_map.ranges, as->current_map.nr,
>> +    return bsearch(&addr, view->ranges, view->nr,
>>                     sizeof(FlatRange), cmp_flatrange_addr);
>>  }
>>
>> +/* dec the ref, which inc by memory_region_find*/
>> +void memory_region_section_put(MemoryRegionSection *mrs)
>> +{
>> +    if (mrs->mr != NULL) {
>> +        memory_region_put(mrs->mr);
>> +    }
>> +}
>> +
>> +/* inc mr's ref. Caller need dec mr's ref */
>>  MemoryRegionSection memory_region_find(MemoryRegion *address_space,
>>                                         target_phys_addr_t addr, uint64_t size)
>>  {
>> +    PhysMap *map;
>>      AddressSpace *as = memory_region_to_address_space(address_space);
>>      AddrRange range = addrrange_make(int128_make64(addr),
>>                                       int128_make64(size));
>> -    FlatRange *fr = address_space_lookup(as, range);
>> +    FlatView *fview;
>> +
>> +    map = cur_map_get();
>> +
>> +    fview = &map->views[as->view_id];
>> +    FlatRange *fr = address_space_lookup(fview, range);
>>      MemoryRegionSection ret = { .mr = NULL, .size = 0 };
>>
>>      if (!fr) {
>> +        physmap_put(map);
>>          return ret;
>>      }
>>
>> -    while (fr > as->current_map.ranges
>> +    while (fr > fview->ranges
>>             && addrrange_intersects(fr[-1].addr, range)) {
>>          --fr;
>>      }
>> +    /* To fix, the caller must in rcu, or we must inc fr->mr->ref here
>> +     */
>> +    memory_region_get(fr->mr);
>> +    physmap_put(map);
>>
>>      ret.mr = fr->mr;
>>      range = addrrange_intersection(range, fr->addr);
>> @@ -1497,10 +1532,13 @@ void memory_global_sync_dirty_bitmap(MemoryRegion *address_space)
>>  {
>>      AddressSpace *as = memory_region_to_address_space(address_space);
>>      FlatRange *fr;
>> +    PhysMap *map = cur_map_get();
>> +    FlatView *view = &map->views[as->view_id];
>>
>> -    FOR_EACH_FLAT_RANGE(fr, &as->current_map) {
>> +    FOR_EACH_FLAT_RANGE(fr, view) {
>>          MEMORY_LISTENER_UPDATE_REGION(fr, as, Forward, log_sync);
>>      }
>> +    physmap_put(map);
>>  }
>>
>>  void memory_global_dirty_log_start(void)
>> @@ -1519,6 +1557,8 @@ static void listener_add_address_space(MemoryListener *listener,
>>                                         AddressSpace *as)
>>  {
>>      FlatRange *fr;
>> +    PhysMap *map;
>> +    FlatView *view;
>>
>>      if (listener->address_space_filter
>>          && listener->address_space_filter != as->root) {
>> @@ -1528,7 +1568,10 @@ static void listener_add_address_space(MemoryListener *listener,
>>      if (global_dirty_log) {
>>          listener->log_global_start(listener);
>>      }
>> -    FOR_EACH_FLAT_RANGE(fr, &as->current_map) {
>> +
>> +    map = cur_map_get();
>> +    view = &map->views[as->view_id];
>> +    FOR_EACH_FLAT_RANGE(fr, view) {
>>          MemoryRegionSection section = {
>>              .mr = fr->mr,
>>              .address_space = as->root,
>> @@ -1539,6 +1582,7 @@ static void listener_add_address_space(MemoryListener *listener,
>>          };
>>          listener->region_add(listener, &section);
>>      }
>> +    physmap_put(map);
>>  }
>>
>>  void memory_listener_register(MemoryListener *listener, MemoryRegion *filter)
>> @@ -1570,12 +1614,14 @@ void memory_listener_unregister(MemoryListener *listener)
>>  void set_system_memory_map(MemoryRegion *mr)
>>  {
>>      address_space_memory.root = mr;
>> +    address_space_memory.view_id = 0;
>>      memory_region_update_topology(NULL);
>>  }
>>
>>  void set_system_io_map(MemoryRegion *mr)
>>  {
>>      address_space_io.root = mr;
>> +    address_space_io.view_id = 1;
>>      memory_region_update_topology(NULL);
>>  }
>>
>> diff --git a/memory.h b/memory.h
>> index 357edd8..18442d4 100644
>> --- a/memory.h
>> +++ b/memory.h
>> @@ -256,7 +256,7 @@ typedef struct MemoryListener MemoryListener;
>>   * Use with memory_listener_register() and memory_listener_unregister().
>>   */
>>  struct MemoryListener {
>> -    void (*begin)(MemoryListener *listener);
>> +    void (*begin)(MemoryListener *listener, PhysMap *next);
>>      void (*commit)(MemoryListener *listener);
>>      void (*region_add)(MemoryListener *listener, MemoryRegionSection *section);
>>      void (*region_del)(MemoryListener *listener, MemoryRegionSection *section);
>> @@ -829,6 +829,13 @@ void mtree_info(fprintf_function mon_printf, void *f);
>>
>>  void memory_region_get(MemoryRegion *mr);
>>  void memory_region_put(MemoryRegion *mr);
>> +void physmap_reclaimer_enqueue(void *opaque, ReleaseHandler *release);
>> +void physmap_get(PhysMap *map);
>> +void physmap_put(PhysMap *map);
>> +PhysMap *cur_map_get(void);
>> +PhysMap *alloc_next_map(void);
>> +void cur_map_update(PhysMap *next);
>> +void physmap_init(void);
>>  #endif
>>
>>  #endif
>> diff --git a/vl.c b/vl.c
>> index 1329c30..12af523 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -3346,6 +3346,7 @@ int main(int argc, char **argv, char **envp)
>>      if (ram_size == 0) {
>>          ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
>>      }
>> +    physmap_init();
>>
>>      configure_accelerator();
>>
>> diff --git a/xen-all.c b/xen-all.c
>> index 59f2323..41d82fd 100644
>> --- a/xen-all.c
>> +++ b/xen-all.c
>> @@ -452,7 +452,7 @@ static void xen_set_memory(struct MemoryListener *listener,
>>      }
>>  }
>>
>> -static void xen_begin(MemoryListener *listener)
>> +static void xen_begin(MemoryListener *listener, PhysMap *next)
>>  {
>>  }
>>
>> --
>> 1.7.4.4
>>

^ permalink raw reply	[flat|nested] 154+ messages in thread
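
The reader side that this patch threads through cpu_physical_memory_rw(),
memory_region_find() and friends reduces to the pattern below; the
refcount pair stands in for rcu_read_lock()/rcu_read_unlock() until real
RCU is available (addr is a placeholder):

    PhysMap *map = cur_map_get();   /* pin a consistent snapshot, ref++ */

    MemoryRegionSection *section =
        phys_page_find_internal(map, addr >> TARGET_PAGE_BITS);
    /* ... dispatch on section->mr; the pinned map keeps the
     * MemoryRegion from being reclaimed underneath us ... */

    physmap_put(map);               /* unpin; the last ref triggers
                                       phys_map_release() via the reclaimer */

memory_region_find() additionally takes a reference on the MemoryRegion
itself, which the caller must drop with memory_region_section_put().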

* Re: [PATCH 03/15] qom: introduce reclaimer to release obj
  2012-08-08  9:15         ` [Qemu-devel] " Avi Kivity
@ 2012-08-09  7:33           ` liu ping fan
  -1 siblings, 0 replies; 154+ messages in thread
From: liu ping fan @ 2012-08-09  7:33 UTC (permalink / raw)
  To: Avi Kivity
  Cc: kvm, Jan Kiszka, Marcelo Tosatti, qemu-devel, Blue Swirl,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini,
	Andreas Färber

On Wed, Aug 8, 2012 at 5:15 PM, Avi Kivity <avi@redhat.com> wrote:
> On 08/08/2012 12:07 PM, Paolo Bonzini wrote:
>> On 08/08/2012 11:05, Avi Kivity wrote:
>>>> > From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>> >
>>>> > Collect unused objects and release them at the caller's demand.
>>>> >
>>> Please explain the motivation for this patch.
>>
>> It's poor man's RCU, I think?
>
> I thought that it was to defer destructors (finalizers) to a more
> suitable context.  But why is the unref context unsuitable?
>
Yes, it is to defer destructors.
See 0009-memory-prepare-flatview-and-radix-tree-for-rcu-style.patch.
When a MemoryRegion is removed by _del_subregion from mem in the
updater, it may still be in use by a reader -- the radix tree or
flatview -- so its destructor is deferred to the reclaimer,
phys_map_release(PhysMap *map).
If we had RCU, this could be done elegantly.
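
For concreteness, the deferral path condensed from the hunks quoted
earlier in this thread (trimmed, but taken from patch 09/15 itself):

void memory_region_put(MemoryRegion *mr)
{
    if (atomic_dec_and_test(&mr->ref)) {
        /* a reader may still reach mr through the old radix
         * tree/flatview, so queue the destructor instead of
         * running it here */
        physmap_reclaimer_enqueue(mr, (ReleaseHandler *)mr->life_ops->put);
    }
}

static void phys_map_release(PhysMap *map)
{
    /* runs once the previous PhysMap is unreferenced, so the queued
     * destructors are finally safe to invoke */
    reclaimer_worker(&physmap_reclaimer_list);

    destroy_all_mappings(map);
    g_free(map->phys_map_nodes);
    g_free(map->phys_sections);
    g_free(map->views[0].ranges);
    g_free(map->views[1].ranges);
    g_free(map);
}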

I think I should write this in the commit message here too, rather
than only in the followup patch.

Regards, pingfan
> I don't see how it relates to RCU, where is the C and the U?
>
> Anyway the list eagerly awaits the explanation.
>
> --
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 03/15] qom: introduce reclaimer to release obj
  2012-08-08  9:35     ` [Qemu-devel] " Paolo Bonzini
@ 2012-08-09  7:38       ` liu ping fan
  -1 siblings, 0 replies; 154+ messages in thread
From: liu ping fan @ 2012-08-09  7:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, kvm, Anthony Liguori, Avi Kivity, Jan Kiszka,
	Marcelo Tosatti, Stefan Hajnoczi, Blue Swirl,
	Andreas Färber

On Wed, Aug 8, 2012 at 5:35 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 08/08/2012 08:25, Liu Ping Fan wrote:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> Collect unused objects and release them at the caller's demand.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  include/qemu/reclaimer.h |   28 ++++++++++++++++++++++
>>  main-loop.c              |    5 ++++
>>  qemu-tool.c              |    5 ++++
>>  qom/Makefile.objs        |    2 +-
>>  qom/reclaimer.c          |   58 ++++++++++++++++++++++++++++++++++++++++++++++
>>  5 files changed, 97 insertions(+), 1 deletions(-)
>>  create mode 100644 include/qemu/reclaimer.h
>>  create mode 100644 qom/reclaimer.c
>>
>> diff --git a/include/qemu/reclaimer.h b/include/qemu/reclaimer.h
>> new file mode 100644
>> index 0000000..9307e93
>> --- /dev/null
>> +++ b/include/qemu/reclaimer.h
>> @@ -0,0 +1,28 @@
>> +/*
>> + * QEMU reclaimer
>> + *
>> + * Copyright IBM, Corp. 2012
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#ifndef QEMU_RECLAIMER
>> +#define QEMU_RECLAIMER
>> +
>> +typedef void ReleaseHandler(void *opaque);
>> +typedef struct Chunk {
>> +    QLIST_ENTRY(Chunk) list;
>> +    void *opaque;
>> +    ReleaseHandler *release;
>> +} Chunk;
>> +
>> +typedef struct ChunkHead {
>> +        struct Chunk *lh_first;
>> +} ChunkHead;
>> +
>> +void reclaimer_enqueue(ChunkHead *head, void *opaque, ReleaseHandler *release);
>> +void reclaimer_worker(ChunkHead *head);
>> +void qemu_reclaimer_enqueue(void *opaque, ReleaseHandler *release);
>> +void qemu_reclaimer(void);
>
> So "enqueue" is call_rcu and qemu_reclaimer marks a quiescent state +
> empties the pending call_rcu.
>
> But what's the difference between the two pairs of APIs?
>
I added the new one in v2 to reclaim the resources for the memory
view. And yes, as you point out, I will delete the 2nd API, since it
can easily be substituted by the 1st one.
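
Roughly like this (a sketch of the consolidation; the list name is
made up, everything else is the API from this patch):

/* the global main-loop queue becomes just another ChunkHead */
static ChunkHead main_loop_reclaimer_list;

void qemu_reclaimer_enqueue(void *opaque, ReleaseHandler *release)
{
    reclaimer_enqueue(&main_loop_reclaimer_list, opaque, release);
}

void qemu_reclaimer(void)
{
    reclaimer_worker(&main_loop_reclaimer_list);
}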

>> +#endif
>> diff --git a/main-loop.c b/main-loop.c
>> index eb3b6e6..be9d095 100644
>> --- a/main-loop.c
>> +++ b/main-loop.c
>> @@ -26,6 +26,7 @@
>>  #include "qemu-timer.h"
>>  #include "slirp/slirp.h"
>>  #include "main-loop.h"
>> +#include "qemu/reclaimer.h"
>>
>>  #ifndef _WIN32
>>
>> @@ -505,5 +506,9 @@ int main_loop_wait(int nonblocking)
>>         them.  */
>>      qemu_bh_poll();
>>
>> +    /* ref to device from iohandler/bh/timer do not obey the rules, so delay
>> +     * reclaiming until now.
>> +     */
>> +    qemu_reclaimer();
>>      return ret;
>>  }
>> diff --git a/qemu-tool.c b/qemu-tool.c
>> index 318c5fc..f5fe319 100644
>> --- a/qemu-tool.c
>> +++ b/qemu-tool.c
>> @@ -21,6 +21,7 @@
>>  #include "main-loop.h"
>>  #include "qemu_socket.h"
>>  #include "slirp/libslirp.h"
>> +#include "qemu/reclaimer.h"
>>
>>  #include <sys/time.h>
>>
>> @@ -75,6 +76,10 @@ void qemu_mutex_unlock_iothread(void)
>>  {
>>  }
>>
>> +void qemu_reclaimer(void)
>> +{
>> +}
>> +
>>  int use_icount;
>>
>>  void qemu_clock_warp(QEMUClock *clock)
>> diff --git a/qom/Makefile.objs b/qom/Makefile.objs
>> index 5ef060a..a579261 100644
>> --- a/qom/Makefile.objs
>> +++ b/qom/Makefile.objs
>> @@ -1,4 +1,4 @@
>> -qom-obj-y = object.o container.o qom-qobject.o
>> +qom-obj-y = object.o container.o qom-qobject.o reclaimer.o
>>  qom-obj-twice-y = cpu.o
>>  common-obj-y = $(qom-obj-twice-y)
>>  user-obj-y = $(qom-obj-twice-y)
>> diff --git a/qom/reclaimer.c b/qom/reclaimer.c
>> new file mode 100644
>> index 0000000..6cb53e3
>> --- /dev/null
>> +++ b/qom/reclaimer.c
>> @@ -0,0 +1,58 @@
>> +/*
>> + * QEMU reclaimer
>> + *
>> + * Copyright IBM, Corp. 2012
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#include "qemu-common.h"
>> +#include "qemu-thread.h"
>> +#include "main-loop.h"
>> +#include "qemu-queue.h"
>> +#include "qemu/reclaimer.h"
>> +
>> +static struct QemuMutex reclaimer_lock;
>> +static QLIST_HEAD(rcl, Chunk) reclaimer_list;
>> +
>> +void reclaimer_enqueue(ChunkHead *head, void *opaque, ReleaseHandler *release)
>> +{
>> +    Chunk *r = g_malloc0(sizeof(Chunk));
>> +    r->opaque = opaque;
>> +    r->release = release;
>> +    QLIST_INSERT_HEAD_RCU(head, r, list);
>> +}
>
> No lock?
Yes, a lock is needed!  I will think about it more closely.
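
A minimal sketch of the fix, guarding every ChunkHead with the
existing reclaimer_lock (its qemu_mutex_init() call omitted here):

void reclaimer_enqueue(ChunkHead *head, void *opaque, ReleaseHandler *release)
{
    Chunk *r = g_malloc0(sizeof(Chunk));

    r->opaque = opaque;
    r->release = release;
    qemu_mutex_lock(&reclaimer_lock);
    /* with the mutex held, the plain insert is enough and the
     * lockless _RCU variant is no longer needed */
    QLIST_INSERT_HEAD(head, r, list);
    qemu_mutex_unlock(&reclaimer_lock);
}

reclaimer_worker() would then take the same lock, or splice the whole
list onto a local head under the lock before walking it.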

Thanks and regards,
pingfan
>
>> +void reclaimer_worker(ChunkHead *head)
>> +{
>> +    Chunk *cur, *next;
>> +
>> +    QLIST_FOREACH_SAFE(cur, head, list, next) {
>> +        QLIST_REMOVE(cur, list);
>> +        cur->release(cur->opaque);
>> +        g_free(cur);
>> +    }
>
> QLIST_REMOVE needs a lock too, so using the lockless
> QLIST_INSERT_HEAD_RCU is not necessary.
>
>> +}
>> +
>> +void qemu_reclaimer_enqueue(void *opaque, ReleaseHandler *release)
>> +{
>> +    Chunk *r = g_malloc0(sizeof(Chunk));
>> +    r->opaque = opaque;
>> +    r->release = release;
>> +    qemu_mutex_lock(&reclaimer_lock);
>> +    QLIST_INSERT_HEAD_RCU(&reclaimer_list, r, list);
>> +    qemu_mutex_unlock(&reclaimer_lock);
>> +}
>> +
>> +
>> +void qemu_reclaimer(void)
>> +{
>> +    Chunk *cur, *next;
>> +
>> +    QLIST_FOREACH_SAFE(cur, &reclaimer_list, list, next) {
>> +        QLIST_REMOVE(cur, list);
>> +        cur->release(cur->opaque);
>> +        g_free(cur);
>> +    }
>
> Same here.
>
>> +}
>>
>
>

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 15/15] e1000: using new interface--unmap to unplug
  2012-08-09  7:28       ` [Qemu-devel] " liu ping fan
@ 2012-08-09  7:40         ` Paolo Bonzini
  -1 siblings, 0 replies; 154+ messages in thread
From: Paolo Bonzini @ 2012-08-09  7:40 UTC (permalink / raw)
  To: liu ping fan
  Cc: qemu-devel, kvm, Anthony Liguori, Avi Kivity, Jan Kiszka,
	Marcelo Tosatti, Stefan Hajnoczi, Blue Swirl,
	Andreas Färber

On 09/08/2012 09:28, liu ping fan wrote:
>>> >> +static void
>>> >> +pci_e1000_unmap(PCIDevice *p)
>>> >> +{
>>> >> +    /* DO NOT FREE anything until refcnt=0 */
>>> >> +    /* isolate from memory view */
>>> >> +}
>> >
>> > At least you need to call the superclass method.
>> >
Referring to 0013-hotplug-introduce-qdev_unplug_complete-to-remove-dev.patch,
we have the following sequence:
qdev_unmap->pci_unmap_device->pci_e1000_unmap.  So pci_e1000_unmap
does not need to do anything.

But then this patch is unnecessary, isn't it?

Paolo


^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 11/15] lock: introduce global lock for device tree
  2012-08-09  7:28       ` [Qemu-devel] " liu ping fan
@ 2012-08-09  7:41         ` Paolo Bonzini
  -1 siblings, 0 replies; 154+ messages in thread
From: Paolo Bonzini @ 2012-08-09  7:41 UTC (permalink / raw)
  To: liu ping fan
  Cc: qemu-devel, kvm, Anthony Liguori, Avi Kivity, Jan Kiszka,
	Marcelo Tosatti, Stefan Hajnoczi, Blue Swirl,
	Andreas Färber

On 09/08/2012 09:28, liu ping fan wrote:
> On Wed, Aug 8, 2012 at 5:41 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Il 08/08/2012 08:25, Liu Ping Fan ha scritto:
>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>
>>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>> ---
>>>  cpus.c      |   12 ++++++++++++
>>>  main-loop.h |    3 +++
>>>  2 files changed, 15 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/cpus.c b/cpus.c
>>> index b182b3d..a734b36 100644
>>> --- a/cpus.c
>>> +++ b/cpus.c
>>> @@ -611,6 +611,7 @@ static void qemu_tcg_init_cpu_signals(void)
>>>  }
>>>  #endif /* _WIN32 */
>>>
>>> +QemuMutex qemu_device_tree_mutex;
>>>  QemuMutex qemu_global_mutex;
>>>  static QemuCond qemu_io_proceeded_cond;
>>>  static bool iothread_requesting_mutex;
>>> @@ -634,6 +635,7 @@ void qemu_init_cpu_loop(void)
>>>      qemu_cond_init(&qemu_work_cond);
>>>      qemu_cond_init(&qemu_io_proceeded_cond);
>>>      qemu_mutex_init(&qemu_global_mutex);
>>> +    qemu_mutex_init(&qemu_device_tree_mutex);
>>>
>>>      qemu_thread_get_self(&io_thread);
>>>  }
>>> @@ -911,6 +913,16 @@ void qemu_mutex_unlock_iothread(void)
>>>      qemu_mutex_unlock(&qemu_global_mutex);
>>>  }
>>>
>>> +void qemu_lock_devtree(void)
>>> +{
>>> +    qemu_mutex_lock(&qemu_device_tree_mutex);
>>> +}
>>> +
>>> +void qemu_unlock_devtree(void)
>>> +{
>>> +    qemu_mutex_unlock(&qemu_device_tree_mutex);
>>> +}
>>
>> We don't need the wrappers.  They are there for the big lock just
>> because TCG needs extra work for iothread_requesting_mutex.
>>
> Sorry, could you give more detail about TCG? What is the extra work?

void qemu_mutex_lock_iothread(void)
{
    if (!tcg_enabled()) {
        qemu_mutex_lock(&qemu_global_mutex);
    } else {
        iothread_requesting_mutex = true;
        if (qemu_mutex_trylock(&qemu_global_mutex)) {
            qemu_cpu_kick_thread(first_cpu);
            qemu_mutex_lock(&qemu_global_mutex);
        }
        iothread_requesting_mutex = false;
        qemu_cond_broadcast(&qemu_io_proceeded_cond);
    }
}

You do not need any of the code in the "else" branch for the device tree
mutex, so you do not need wrappers.
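
Callers can then just take the mutex directly, e.g.:

    qemu_mutex_lock(&qemu_device_tree_mutex);
    /* ... attach or detach devices ... */
    qemu_mutex_unlock(&qemu_device_tree_mutex);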



^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 03/15] qom: introduce reclaimer to release obj
  2012-08-09  7:33           ` [Qemu-devel] " liu ping fan
@ 2012-08-09  7:49             ` Paolo Bonzini
  -1 siblings, 0 replies; 154+ messages in thread
From: Paolo Bonzini @ 2012-08-09  7:49 UTC (permalink / raw)
  To: liu ping fan
  Cc: Avi Kivity, kvm, Jan Kiszka, Marcelo Tosatti, qemu-devel,
	Blue Swirl, Anthony Liguori, Stefan Hajnoczi,
	Andreas Färber

Il 09/08/2012 09:33, liu ping fan ha scritto:
> Yes, it is to defer destructors.
> See 0009-memory-prepare-flatview-and-radix-tree-for-rcu-style.patch
> When a MemoryRegion is removed via _del_subregion from mem in the
> updater, it may still be in use by a reader -- the radix tree or
> flatview -- so its destructor is deferred to the reclaimer,
> phys_map_release(PhysMap *map).

How are you sure that the reader is already out of its critical section
by the time the reclaimer runs?

> If we have rcu, it could be elegant to do this.

Yeah, I think inventing primitives is dangerous and difficult to review;
and it may be difficult to replace it with proper call_rcu.

You should probably make a proof-of-concept using liburcu.  Then we can
decide how to implement RCU in a way that is portable enough for QEMU's
needs.
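
As a rough sketch of what such a proof of concept could look like,
assuming liburcu's default flavor (PhysMap and cur_map are names from
this series; the layout and helpers around them are illustrative):

#include <urcu.h>
#include <glib.h>

typedef struct PhysMap {
    /* FlatView + radix tree live here in the real series */
    struct rcu_head rcu;
} PhysMap;

static PhysMap *cur_map;

static void phys_map_free(struct rcu_head *head)
{
    PhysMap *map = caa_container_of(head, PhysMap, rcu);
    /* tear down the flatview and radix tree, then free */
    g_free(map);
}

/* reader: each reading thread has called rcu_register_thread() */
static void lookup(void)
{
    PhysMap *map;

    rcu_read_lock();
    map = rcu_dereference(cur_map);
    /* ... walk map; it cannot be freed under us ... */
    rcu_read_unlock();
}

/* updater: publish new_map, defer freeing of the old map */
static void publish(PhysMap *new_map)
{
    PhysMap *old = rcu_xchg_pointer(&cur_map, new_map);

    call_rcu(&old->rcu, phys_map_free);
}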

Paolo

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 13/15] hotplug: introduce qdev_unplug_complete() to remove device from views
  2012-08-09  7:28       ` [Qemu-devel] " liu ping fan
@ 2012-08-09  8:00         ` Paolo Bonzini
  -1 siblings, 0 replies; 154+ messages in thread
From: Paolo Bonzini @ 2012-08-09  8:00 UTC (permalink / raw)
  To: liu ping fan
  Cc: kvm, Jan Kiszka, Marcelo Tosatti, qemu-devel, Blue Swirl,
	Avi Kivity, Anthony Liguori, Stefan Hajnoczi,
	Andreas Färber

Il 09/08/2012 09:28, liu ping fan ha scritto:
>> >     VCPU thread                    I/O thread
>> > =====================================================================
>> >     get MMIO request
>> >     rcu_read_lock()
>> >     walk memory map
>> >                                    qdev_unmap()
>> >                                    lock_devtree()
>> >                                    ...
>> >                                    unlock_devtree
>> >                                    unref dev -> refcnt=0, free enqueued
>> >     ref()
> There is no ref() for the device here, while we do hold a ref to the
> flatview+radix tree in my patches. I use RCU to protect the radix
> tree, the flatview and the mr they refer to. As for the device, its
> ref is incremented when it is added into the mem view -- that is,
> memory_region_add_subregion -> memory_region_get() {
> if (atomic_add_and_return()) dev->ref++; }.
> So the mem view holds the device's ref until the view's reclaimer
> runs. In short, RCU protects the mem view, while the device is
> protected by its refcount.

But the RCU critical section should not include the whole processing of
MMIO, only the walk of the memory map.

And in general I think this is a bit too tricky... I understand not
adding refcounting to all of bottom halves, timers, etc., but if you are
using a device you should have explicit ref/unref pairs.
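
Something along these lines in the dispatch path, i.e. pin the device
before leaving the RCU critical section (a fragment only; dev_ref(),
dev_unref(), section_to_device() and dispatch_mmio() are hypothetical
helpers, sketched just to show the pairing):

rcu_read_lock();
section = phys_page_find(addr >> TARGET_PAGE_BITS);
dev = section_to_device(section);   /* hypothetical lookup */
dev_ref(dev);                       /* pin before leaving RCU */
rcu_read_unlock();

dispatch_mmio(dev, addr, val, size); /* outside the critical section */
dev_unref(dev);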

Paolo

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 03/15] qom: introduce reclaimer to release obj
  2012-08-09  7:49             ` [Qemu-devel] " Paolo Bonzini
@ 2012-08-09  8:18               ` Avi Kivity
  -1 siblings, 0 replies; 154+ messages in thread
From: Avi Kivity @ 2012-08-09  8:18 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, Jan Kiszka, Marcelo Tosatti, liu ping fan, qemu-devel,
	Blue Swirl, Anthony Liguori, Stefan Hajnoczi,
	Andreas Färber

On 08/09/2012 10:49 AM, Paolo Bonzini wrote:
> Il 09/08/2012 09:33, liu ping fan ha scritto:
>> Yes, it is to defer destructors.
>> See 0009-memory-prepare-flatview-and-radix-tree-for-rcu-style.patch
>> When a MemoryRegion is removed via _del_subregion from mem in the
>> updater, it may still be in use by a reader -- the radix tree or
>> flatview -- so its destructor is deferred to the reclaimer,
>> phys_map_release(PhysMap *map).
> 
> How are you sure that the reader is already out of its critical section
> by the time the reclaimer runs?
> 
>> If we have rcu, it could be elegant to do this.
> 
> Yeah, I think inventing primitives is dangerous and difficult to review;
> and it may be difficult to replace it with proper call_rcu.
> 
> You should probably make a proof-of-concept using liburcu.  Then we can
> decide how to implement RCU in a way that is portable enough for QEMU's
> needs.

IMO we should start with a simple mutex (which will cover only the
lookup and map rebuild).  This should reduce the contention to basically
nothing (still leaving a cache line bounce).  If a profile shows the
cache line bounce hurting us, or perhaps contention in ultralarge
guests, then we should switch to rcu.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 04/15] memory: MemoryRegion topology must be stable when updating
  2012-08-09  7:28       ` [Qemu-devel] " liu ping fan
@ 2012-08-09  8:24         ` Avi Kivity
  -1 siblings, 0 replies; 154+ messages in thread
From: Avi Kivity @ 2012-08-09  8:24 UTC (permalink / raw)
  To: liu ping fan
  Cc: qemu-devel, kvm, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber

On 08/09/2012 10:28 AM, liu ping fan wrote:
>>
>> Seems to me that nothing in memory.c can be susceptible to races.  It must
>> already be called under the big qemu lock, and with the exception of
>> mutators (memory_region_set_*), changes aren't directly visible.
>>
>> Yes, what I want to do is "prepare unplug out of protection of global
>> lock".  When io-dispatch and mmio-dispatch are both out of the big
>> lock, we will run into the following scenario:
>     In vcpu context A, qdev_unplug_complete()-> delete subregion;
>     In context B, write pci bar --> pci mapping update    -> add subregion

Why do you want unlocked unplug?  Unplug is rare and complicated; there
are no performance considerations on one hand, and difficulty of testing
for lock correctness on the other.  I think it is better if it remains
protected by the global lock.

> 
>> I think it's sufficient to take the mem_map_lock at the beginning of
>> core_begin() and drop it at the end of core_commit().  That means all
>> updates of volatile state, phys_map, are protected.
>>
> The mem_map_lock is to protect both address_space_io and
>> address_space_memory. Without the protection of the big lock, races
>> will arise between the updaters (memory_region_{add,del}_subregion)
>> and the readers (generate_memory_topology()->render_memory_region()).

These should all run under the big qemu lock, for the same reasons.
They are rare and not performance sensitive.  Only phys_map reads are
performance sensitive.

> 
> If just in core_begin/commit, we will duplicate it for
> xx_begin/commit, right?  

No.  Other listeners will be protected by the global lock.

> And at the same time, mr->subregions is
> exposed under SMP without big lock.
> 

Who accesses it?

IMO locking should look like:

  phys_map: mem_map_lock
  dispatch callbacks: device specific lock (or big qemu lock for
unconverted devices)
  everything else: big qemu lock



-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 11/15] lock: introduce global lock for device tree
  2012-08-09  7:27       ` [Qemu-devel] " liu ping fan
@ 2012-08-09  8:31         ` Avi Kivity
  -1 siblings, 0 replies; 154+ messages in thread
From: Avi Kivity @ 2012-08-09  8:31 UTC (permalink / raw)
  To: liu ping fan
  Cc: kvm, Jan Kiszka, Marcelo Tosatti, qemu-devel, Blue Swirl,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini,
	Andreas Färber

On 08/09/2012 10:27 AM, liu ping fan wrote:
> On Wed, Aug 8, 2012 at 5:42 PM, Avi Kivity <avi@redhat.com> wrote:
>> On 08/08/2012 09:25 AM, Liu Ping Fan wrote:
>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>
>>
>> Please explain the motivation.  AFAICT, the big qemu lock is sufficient.
>>
> Oh, this is one of a series of locks for the removal of the big qemu lock.

Why do you want to remove the big qemu lock?

Even now it is not heavily contended.  We should focus on fixing the
cases where it is contended, instead of removing it completely, which is
sure to make further development harder and is likely to introduce
locking bugs.

> Breaking down the big lock will take several steps, including the
> introduction of per-device private locks. By then, when the device
> add path in the iothread and the remove path in io-dispatch are out
> of the big qemu lock, we will need this extra lock.
> 
> This series is too big, so I sent out the 1st phase for review.

Even the first phase is too big.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 06/15] memory: use refcnt to manage MemoryRegion
  2012-08-09  7:27       ` [Qemu-devel] " liu ping fan
@ 2012-08-09  8:38         ` Avi Kivity
  -1 siblings, 0 replies; 154+ messages in thread
From: Avi Kivity @ 2012-08-09  8:38 UTC (permalink / raw)
  To: liu ping fan
  Cc: qemu-devel, kvm, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber

On 08/09/2012 10:27 AM, liu ping fan wrote:
> On Wed, Aug 8, 2012 at 5:20 PM, Avi Kivity <avi@redhat.com> wrote:
>> On 08/08/2012 09:25 AM, Liu Ping Fan wrote:
>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>
>>> Using refcnt for mr, so we can separate mr's life cycle management
>>> from refered object.
>>>   When mr->ref 0->1, inc the refered object.
>>>   When mr->ref 1->0, dec the refered object.
>>>
>>> The refered object can be DeviceStae, another mr, or other opaque.
>>
>> Please explain the motivation more fully.
>>
> Actually, the aim is to manage the reference of an object used by the
> mem view. A DeviceState can be referred to by different subsystems;
> when it enters a subsystem's view, we hold the device's ref, and any
> indirect reference will just do mr->ref++, not the device's.
> This helps us avoid the down-walk through the reference chain, like
> alias ----> mr ---> DeviceState.

That is a lot of complexity, for no gain.  Manipulating memory regions
is a slow path, and can be done under the big qemu lock without any
complications.

> 
> In the previous discussion, you suggested adding dev->ref++ in
> core_region_add.  But I think that if we move it to a higher layer --
> memory_region_{add,del}_subregion -- we can avoid duplicating this
> in every other xx_region_add.

Why would other memory listeners be impacted?  They all operate under
the big qemu lock.  If they start using devices outside the lock, then
they need to take a reference.

> As the price for this, we need to handle aliases, which could be
> avoided at core_region_add().  And mr's ref can help to avoid
> the down-walk.

The payment is two systems of reference counts.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 04/15] memory: MemoryRegion topology must be stable when updating
  2012-08-09  7:28       ` [Qemu-devel] " liu ping fan
@ 2012-08-09 17:09         ` Blue Swirl
  -1 siblings, 0 replies; 154+ messages in thread
From: Blue Swirl @ 2012-08-09 17:09 UTC (permalink / raw)
  To: liu ping fan
  Cc: qemu-devel, kvm, Anthony Liguori, Avi Kivity, Jan Kiszka,
	Marcelo Tosatti, Stefan Hajnoczi, Paolo Bonzini,
	Andreas Färber

On Thu, Aug 9, 2012 at 7:28 AM, liu ping fan <qemulist@gmail.com> wrote:
> On Thu, Aug 9, 2012 at 3:17 AM, Blue Swirl <blauwirbel@gmail.com> wrote:
>> On Wed, Aug 8, 2012 at 6:25 AM, Liu Ping Fan <qemulist@gmail.com> wrote:
>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>
>>> Using mem_map_lock to protect among updaters. So we can get the intact
>>> snapshot of mem topology -- FlatView & radix-tree.
>>>
>>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>> ---
>>>  exec.c   |    3 +++
>>>  memory.c |   22 ++++++++++++++++++++++
>>>  memory.h |    2 ++
>>>  3 files changed, 27 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/exec.c b/exec.c
>>> index 8244d54..0e29ef9 100644
>>> --- a/exec.c
>>> +++ b/exec.c
>>> @@ -210,6 +210,8 @@ static unsigned phys_map_nodes_nb, phys_map_nodes_nb_alloc;
>>>     The bottom level has pointers to MemoryRegionSections.  */
>>>  static PhysPageEntry phys_map = { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
>>>
>>> +QemuMutex mem_map_lock;
>>> +
>>>  static void io_mem_init(void);
>>>  static void memory_map_init(void);
>>>
>>> @@ -637,6 +639,7 @@ void cpu_exec_init_all(void)
>>>  #if !defined(CONFIG_USER_ONLY)
>>>      memory_map_init();
>>>      io_mem_init();
>>> +    qemu_mutex_init(&mem_map_lock);
>>
>> I'd move this and the mutex to memory.c since there are no other uses.
>> The mutex could be static then.
>>
> But the init entry is in exec.c, not memory.c.

The memory subsystem does not have an init function of its own; this
can be the start of one.
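
A sketch of what that could look like (memory_init() is a new name
proposed here; the mutex then becomes private to memory.c):

/* memory.c */
static QemuMutex mem_map_lock;

void memory_init(void)
{
    qemu_mutex_init(&mem_map_lock);
}

cpu_exec_init_all() in exec.c would then call memory_init() instead of
touching the mutex directly.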

>
> Regards,
> pingfan
>
>>>  #endif
>>>  }
>>>
>>> diff --git a/memory.c b/memory.c
>>> index aab4a31..5986532 100644
>>> --- a/memory.c
>>> +++ b/memory.c
>>> @@ -761,7 +761,9 @@ void memory_region_transaction_commit(void)
>>>      assert(memory_region_transaction_depth);
>>>      --memory_region_transaction_depth;
>>>      if (!memory_region_transaction_depth && memory_region_update_pending) {
>>> +        qemu_mutex_lock(&mem_map_lock);
>>>          memory_region_update_topology(NULL);
>>> +        qemu_mutex_unlock(&mem_map_lock);
>>>      }
>>>  }
>>>
>>> @@ -1069,8 +1071,10 @@ void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
>>>  {
>>>      uint8_t mask = 1 << client;
>>>
>>> +    qemu_mutex_lock(&mem_map_lock);
>>>      mr->dirty_log_mask = (mr->dirty_log_mask & ~mask) | (log * mask);
>>>      memory_region_update_topology(mr);
>>> +    qemu_mutex_unlock(&mem_map_lock);
>>>  }
>>>
>>>  bool memory_region_get_dirty(MemoryRegion *mr, target_phys_addr_t addr,
>>> @@ -1103,8 +1107,10 @@ void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
>>>  void memory_region_set_readonly(MemoryRegion *mr, bool readonly)
>>>  {
>>>      if (mr->readonly != readonly) {
>>> +        qemu_mutex_lock(&mem_map_lock);
>>>          mr->readonly = readonly;
>>>          memory_region_update_topology(mr);
>>> +        qemu_mutex_unlock(&mem_map_lock);
>>>      }
>>>  }
>>>
>>> @@ -1112,7 +1118,9 @@ void memory_region_rom_device_set_readable(MemoryRegion *mr, bool readable)
>>>  {
>>>      if (mr->readable != readable) {
>>>          mr->readable = readable;
>>> +        qemu_mutex_lock(&mem_map_lock);
>>>          memory_region_update_topology(mr);
>>> +        qemu_mutex_unlock(&mem_map_lock);
>>>      }
>>>  }
>>>
>>> @@ -1206,6 +1214,7 @@ void memory_region_add_eventfd(MemoryRegion *mr,
>>>      };
>>>      unsigned i;
>>>
>>> +    qemu_mutex_lock(&mem_map_lock);
>>>      for (i = 0; i < mr->ioeventfd_nb; ++i) {
>>>          if (memory_region_ioeventfd_before(mrfd, mr->ioeventfds[i])) {
>>>              break;
>>> @@ -1218,6 +1227,7 @@ void memory_region_add_eventfd(MemoryRegion *mr,
>>>              sizeof(*mr->ioeventfds) * (mr->ioeventfd_nb-1 - i));
>>>      mr->ioeventfds[i] = mrfd;
>>>      memory_region_update_topology(mr);
>>> +    qemu_mutex_unlock(&mem_map_lock);
>>>  }
>>>
>>>  void memory_region_del_eventfd(MemoryRegion *mr,
>>> @@ -1236,6 +1246,7 @@ void memory_region_del_eventfd(MemoryRegion *mr,
>>>      };
>>>      unsigned i;
>>>
>>> +    qemu_mutex_lock(&mem_map_lock);
>>>      for (i = 0; i < mr->ioeventfd_nb; ++i) {
>>>          if (memory_region_ioeventfd_equal(mrfd, mr->ioeventfds[i])) {
>>>              break;
>>> @@ -1248,6 +1259,7 @@ void memory_region_del_eventfd(MemoryRegion *mr,
>>>      mr->ioeventfds = g_realloc(mr->ioeventfds,
>>>                                    sizeof(*mr->ioeventfds)*mr->ioeventfd_nb + 1);
>>>      memory_region_update_topology(mr);
>>> +    qemu_mutex_unlock(&mem_map_lock);
>>>  }
>>>
>>>  static void memory_region_add_subregion_common(MemoryRegion *mr,
>>> @@ -1259,6 +1271,8 @@ static void memory_region_add_subregion_common(MemoryRegion *mr,
>>>      assert(!subregion->parent);
>>>      subregion->parent = mr;
>>>      subregion->addr = offset;
>>> +
>>> +    qemu_mutex_lock(&mem_map_lock);
>>>      QTAILQ_FOREACH(other, &mr->subregions, subregions_link) {
>>>          if (subregion->may_overlap || other->may_overlap) {
>>>              continue;
>>> @@ -1289,6 +1303,7 @@ static void memory_region_add_subregion_common(MemoryRegion *mr,
>>>      QTAILQ_INSERT_TAIL(&mr->subregions, subregion, subregions_link);
>>>  done:
>>>      memory_region_update_topology(mr);
>>> +    qemu_mutex_unlock(&mem_map_lock);
>>>  }
>>>
>>>
>>> @@ -1316,8 +1331,11 @@ void memory_region_del_subregion(MemoryRegion *mr,
>>>  {
>>>      assert(subregion->parent == mr);
>>>      subregion->parent = NULL;
>>> +
>>> +    qemu_mutex_lock(&mem_map_lock);
>>>      QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link);
>>>      memory_region_update_topology(mr);
>>> +    qemu_mutex_unlock(&mem_map_lock);
>>>  }
>>>
>>>  void memory_region_set_enabled(MemoryRegion *mr, bool enabled)
>>> @@ -1325,8 +1343,10 @@ void memory_region_set_enabled(MemoryRegion *mr, bool enabled)
>>>      if (enabled == mr->enabled) {
>>>          return;
>>>      }
>>> +    qemu_mutex_lock(&mem_map_lock);
>>>      mr->enabled = enabled;
>>>      memory_region_update_topology(NULL);
>>> +    qemu_mutex_unlock(&mem_map_lock);
>>>  }
>>>
>>>  void memory_region_set_address(MemoryRegion *mr, target_phys_addr_t addr)
>>> @@ -1361,7 +1381,9 @@ void memory_region_set_alias_offset(MemoryRegion *mr, target_phys_addr_t offset)
>>>          return;
>>>      }
>>>
>>> +    qemu_mutex_lock(&mem_map_lock);
>>>      memory_region_update_topology(mr);
>>> +    qemu_mutex_unlock(&mem_map_lock);
>>>  }
>>>
>>>  ram_addr_t memory_region_get_ram_addr(MemoryRegion *mr)
>>> diff --git a/memory.h b/memory.h
>>> index 740c48e..fe6aefa 100644
>>> --- a/memory.h
>>> +++ b/memory.h
>>> @@ -25,6 +25,7 @@
>>>  #include "iorange.h"
>>>  #include "ioport.h"
>>>  #include "int128.h"
>>> +#include "qemu-thread.h"
>>>
>>>  typedef struct MemoryRegionOps MemoryRegionOps;
>>>  typedef struct MemoryRegion MemoryRegion;
>>> @@ -207,6 +208,7 @@ struct MemoryListener {
>>>      QTAILQ_ENTRY(MemoryListener) link;
>>>  };
>>>
>>> +extern QemuMutex mem_map_lock;
>>>  /**
>>>   * memory_region_init: Initialize a memory region
>>>   *
>>> --
>>> 1.7.4.4
>>>

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 13/15] hotplug: introduce qdev_unplug_complete() to remove device from views
  2012-08-09  8:00         ` [Qemu-devel] " Paolo Bonzini
@ 2012-08-10  6:42           ` liu ping fan
  -1 siblings, 0 replies; 154+ messages in thread
From: liu ping fan @ 2012-08-10  6:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, kvm, Anthony Liguori, Avi Kivity, Jan Kiszka,
	Marcelo Tosatti, Stefan Hajnoczi, Blue Swirl,
	Andreas Färber

On Thu, Aug 9, 2012 at 4:00 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 09/08/2012 09:28, liu ping fan ha scritto:
>>> >     VCPU thread                    I/O thread
>>> > =====================================================================
>>> >     get MMIO request
>>> >     rcu_read_lock()
>>> >     walk memory map
>>> >                                    qdev_unmap()
>>> >                                    lock_devtree()
>>> >                                    ...
>>> >                                    unlock_devtree
>>> >                                    unref dev -> refcnt=0, free enqueued
>>> >     ref()
>> There is no ref() for the device here, while we do hold a ref to the
>> flatview+radix tree in my patches. I use RCU to protect the radix
>> tree, the flatview and the mr they refer to. As for the device, its
>> ref is incremented when it is added into the mem view -- that is,
>> memory_region_add_subregion -> memory_region_get() {
>> if (atomic_add_and_return()) dev->ref++; }.
>> So the mem view holds the device's ref until the view's reclaimer
>> runs. In short, RCU protects the mem view, while the device is
>> protected by its refcount.
>
> But the RCU critical section should not include the whole processing of
> MMIO, only the walk of the memory map.
>
Yes, you are right.  And I think cur_map_get() can easily be broken
into the style "lock, ref++, phys_page_find(); unlock".

> And in general I think this is a bit too tricky... I understand not
> adding refcounting to all of bottom halves, timers, etc., but if you are
> using a device you should have explicit ref/unref pairs.
>
Actually, there are pairs -- when the device enters the mem view, its
ref is incremented; when it leaves, the ref is decremented.
But as Avi has pointed out, mr->refcnt introduces complexity for no
gain, so I will discard this design.

Thanks and regards,
pingfan

> Paolo

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 15/15] e1000: using new interface--unmap to unplug
  2012-08-09  7:40         ` [Qemu-devel] " Paolo Bonzini
@ 2012-08-10  6:43           ` liu ping fan
  -1 siblings, 0 replies; 154+ messages in thread
From: liu ping fan @ 2012-08-10  6:43 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, kvm, Anthony Liguori, Avi Kivity, Jan Kiszka,
	Marcelo Tosatti, Stefan Hajnoczi, Blue Swirl,
	Andreas Färber

On Thu, Aug 9, 2012 at 3:40 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 09/08/2012 09:28, liu ping fan ha scritto:
>>>> >> +static void
>>>> >> +pci_e1000_unmap(PCIDevice *p)
>>>> >> +{
>>>> >> +    /* DO NOT FREE anything!until refcnt=0 */
>>>> >> +    /* isolate from memory view */
>>>> >> +}
>>> >
>>> > At least you need to call the superclass method.
>>> >
>> Refer to  0013-hotplug-introduce-qdev_unplug_complete-to-remove-dev.patch,
>> we have the following sequence
>> qdev_unmap->pci_unmap_device->pci_e1000_unmap.  So pci_e1000_unmap
>> need not do anything.
>
> But then this patch is unnecessary, isn't it?
>
Yes, it should be removed in the next version.

Regards,
pingfan


> Paolo
>

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 03/15] qom: introduce reclaimer to release obj
  2012-08-09  8:18               ` [Qemu-devel] " Avi Kivity
@ 2012-08-10  6:43                 ` liu ping fan
  -1 siblings, 0 replies; 154+ messages in thread
From: liu ping fan @ 2012-08-10  6:43 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Paolo Bonzini, kvm, Jan Kiszka, Marcelo Tosatti, qemu-devel,
	Blue Swirl, Anthony Liguori, Stefan Hajnoczi,
	Andreas Färber

On Thu, Aug 9, 2012 at 4:18 PM, Avi Kivity <avi@redhat.com> wrote:
> On 08/09/2012 10:49 AM, Paolo Bonzini wrote:
>> Il 09/08/2012 09:33, liu ping fan ha scritto:
>>> Yes, it is to defer destructors.
>>> See 0009-memory-prepare-flatview-and-radix-tree-for-rcu-style.patch
>>> When a MemoryRegion is removed via _del_subregion from mem in the
>>> updater, it may still be in use by a reader -- the radix tree or
>>> flatview -- so its destructor is deferred to the reclaimer,
>>> phys_map_release(PhysMap *map).
>>
>> How are you sure that the reader is already out of its critical section
>> by the time the reclaimer runs?
>>
>>> If we have rcu, it could be elegant to do this.
>>
>> Yeah, I think inventing primitives is dangerous and difficult to review;
>> and it may be difficult to replace it with proper call_rcu.
>>
>> You should probably make a proof-of-concept using liburcu.  Then we can
>> decide how to implement RCU in a way that is portable enough for QEMU's
>> needs.
>
> IMO we should start with a simple mutex (which will cover only the
> lookup and map rebuild).  This should reduce the contention to basically
> nothing (still leaving a cache line bounce).  If a profile shows the
> cache line bounce hurting us, or perhaps contention in ultralarge
> guests, then we should switch to rcu.
>
Agreed, I think this will keep us focused on the major issue -- MMIO performance.

Regards,
pingfan
>
> --
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 06/15] memory: use refcnt to manage MemoryRegion
  2012-08-09  8:38         ` [Qemu-devel] " Avi Kivity
@ 2012-08-10  6:44           ` liu ping fan
  -1 siblings, 0 replies; 154+ messages in thread
From: liu ping fan @ 2012-08-10  6:44 UTC (permalink / raw)
  To: Avi Kivity
  Cc: qemu-devel, kvm, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber

On Thu, Aug 9, 2012 at 4:38 PM, Avi Kivity <avi@redhat.com> wrote:
> On 08/09/2012 10:27 AM, liu ping fan wrote:
>> On Wed, Aug 8, 2012 at 5:20 PM, Avi Kivity <avi@redhat.com> wrote:
>>> On 08/08/2012 09:25 AM, Liu Ping Fan wrote:
>>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>>
>>>> Using refcnt for mr, so we can separate mr's life cycle management
>>>> from refered object.
>>>>   When mr->ref 0->1, inc the refered object.
>>>>   When mr->ref 1->0, dec the refered object.
>>>>
>>>> The refered object can be DeviceStae, another mr, or other opaque.
>>>
>>> Please explain the motivation more fully.
>>>
>> Actually, the aim is to manage the reference of an object used by the
>> mem view. A DeviceState can be referred to by different subsystems;
>> when it enters a subsystem's view, we hold the device's ref, and any
>> indirect reference will just do mr->ref++, not the device's.
>> This helps us avoid the down-walk through the reference chain, like
>> alias ----> mr ---> DeviceState.
>
> That is a lot of complexity, for no gain.  Manipulating memory regions
> is a slow path, and can be done under the big qemu lock without any
> complications.
>
OK. I will discard this design.
>>
>> In the previous discussion, you suggested adding dev->ref++ in
>> core_region_add.  But I think that if we move it to a higher layer --
>> memory_region_{add,del}_subregion -- we can avoid duplicating this
>> in every other xx_region_add.
>
> Why would other memory listeners be impacted?  They all operate under
> the big qemu lock.  If they start using devices outside the lock, then
> they need to take a reference.
>
Yes, if the unplug path is under the protection of the big lock.
And just one extra question: for the ram-unplug scenario, how do we
protect against the following?
  updater:  ram-unplug --> qemu free() --> brk() invalidates this vaddr interval
  reader:   vhost thread copies data from the interval
I would guess some lock/ref is used by them, but I cannot find such a
mechanism in vhost_set_memory() to protect this scenario against
vhost_worker().

Thanks and regards,
pingfan

>> As the price for this, we need to handle aliases, which could be
>> avoided at core_region_add().  And mr's ref can help to avoid
>> the down-walk.
>
> The payment is two systems of reference counts.
>
> --
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 04/15] memory: MemoryRegion topology must be stable when updating
  2012-08-09  8:24         ` [Qemu-devel] " Avi Kivity
@ 2012-08-10  6:44           ` liu ping fan
  -1 siblings, 0 replies; 154+ messages in thread
From: liu ping fan @ 2012-08-10  6:44 UTC (permalink / raw)
  To: Avi Kivity
  Cc: qemu-devel, kvm, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber

On Thu, Aug 9, 2012 at 4:24 PM, Avi Kivity <avi@redhat.com> wrote:
> On 08/09/2012 10:28 AM, liu ping fan wrote:
>>>
>>> Seems to me that nothing in memory.c can be susceptible to races.  It must
>>> already be called under the big qemu lock, and with the exception of
>>> mutators (memory_region_set_*), changes aren't directly visible.
>>>
>> Yes, what I want to do is "prepare unplug out of protection of global
>> lock".  When io-dispatch and mmio-dispatch are both out of the big
>> lock, we will run into the following scenario:
>>     In vcpu context A, qdev_unplug_complete()-> delete subregion;
>>     In context B, write pci bar --> pci mapping update    -> add subregion
>
> Why do you want unlocked unplug?  Unplug is rare and complicated; there
> are no performance considerations on one hand, and difficulty of testing
> for lock correctness on the other.  I think it is better if it remains
> protected by the global lock.
>
Oh, yes! I deviated quite far from the original aim and introduced
some unnecessary complication.

>>
>>> I think it's sufficient to take the mem_map_lock at the beginning of
>>> core_begin() and drop it at the end of core_commit().  That means all
>>> updates of volatile state, phys_map, are protected.
>>>
>> The mem_map_lock is to protect both address_space_io and
>> address_space_memory. Without the protection of the big lock, races
>> will arise between the updaters (memory_region_{add,del}_subregion)
>> and the readers (generate_memory_topology()->render_memory_region()).
>
> These should all run under the big qemu lock, for the same reasons.
> They are rare and not performance sensitive.  Only phys_map reads are
> performance sensitive.
>
OK, I see. Leave the big lock as it is; except for MMIO, we will not
worry about it.
>>
>> If just in core_begin/commit, we will duplicate it for
>> xx_begin/commit, right?
>
> No.  Other listeners will be protected by the global lock.
>
Yes, if we leave the big lock as it is.
>> And at the same time, mr->subregions is
>> exposed under SMP without big lock.
>>
>
> Who accesses it?
>
Again, I was assuming the updaters run outside the protection of the big lock.

> IMO locking should look like:
>
>   phys_map: mem_map_lock
>   dispatch callbacks: device specific lock (or big qemu lock for
> unconverted devices)
>   everything else: big qemu lock
>
I see. Thank you for the review. I will eliminate the unnecessary
complication and effort in the next version.

Regards,
pingfan
>
>
> --
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread

* Re: [PATCH 09/15] memory: prepare flatview and radix-tree for rcu style access
  2012-08-08  9:41     ` [Qemu-devel] " Avi Kivity
@ 2012-08-11  1:58       ` liu ping fan
  -1 siblings, 0 replies; 154+ messages in thread
From: liu ping fan @ 2012-08-11  1:58 UTC (permalink / raw)
  To: Avi Kivity
  Cc: kvm, Jan Kiszka, Marcelo Tosatti, qemu-devel, Blue Swirl,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini,
	Andreas Färber

On Wed, Aug 8, 2012 at 5:41 PM, Avi Kivity <avi@redhat.com> wrote:
> On 08/08/2012 09:25 AM, Liu Ping Fan wrote:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> Flatview and radix view are both under the protection of a pointer,
>> and this makes changes to them appear atomic!
>>
>> The mr accessed through a radix-tree leaf or flatview will be reclaimed
>> only after the prev PhysMap is no longer in use
>>
>
> IMO this cleverness should come much later.  Let's first take care of
> dropping the big qemu lock, then make switching memory maps more efficient.
>
> The initial paths could look like:
>
>   lookup:
>      take mem_map_lock
>      lookup
>      take ref
>      drop mem_map_lock
>
>   update:
>      take mem_map_lock (in core_begin)
>      do updates
>      drop mem_map_lock
>
> Later we can replace mem_map_lock with either a rwlock or (real) rcu.
>
>
>>
>>  #if !defined(CONFIG_USER_ONLY)
>>
>> -static void phys_map_node_reserve(unsigned nodes)
>> +static void phys_map_node_reserve(PhysMap *map, unsigned nodes)
>>  {
>> -    if (phys_map_nodes_nb + nodes > phys_map_nodes_nb_alloc) {
>> +    if (map->phys_map_nodes_nb + nodes > map->phys_map_nodes_nb_alloc) {
>>          typedef PhysPageEntry Node[L2_SIZE];
>> -        phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc * 2, 16);
>> -        phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc,
>> -                                      phys_map_nodes_nb + nodes);
>> -        phys_map_nodes = g_renew(Node, phys_map_nodes,
>> -                                 phys_map_nodes_nb_alloc);
>> +        map->phys_map_nodes_nb_alloc = MAX(map->phys_map_nodes_nb_alloc * 2,
>> +                                                                        16);
>> +        map->phys_map_nodes_nb_alloc = MAX(map->phys_map_nodes_nb_alloc,
>> +                                      map->phys_map_nodes_nb + nodes);
>> +        map->phys_map_nodes = g_renew(Node, map->phys_map_nodes,
>> +                                 map->phys_map_nodes_nb_alloc);
>>      }
>>  }
>
> Please have a patch that just adds the map parameter to all these
> functions.  This makes the later patch, that adds the copy, easier to read.
>
>> +
>> +void cur_map_update(PhysMap *next)
>> +{
>> +    qemu_mutex_lock(&cur_map_lock);
>> +    physmap_put(cur_map);
>> +    cur_map = next;
>> +    smp_mb();
>> +    qemu_mutex_unlock(&cur_map_lock);
>> +}
>
> IMO this can be mem_map_lock.
>
> If we take my previous suggestion:
>
>   lookup:
>      take mem_map_lock
>      lookup
>      take ref
>      drop mem_map_lock
>
>   update:
>      take mem_map_lock (in core_begin)
>      do updates
>      drop mem_map_lock
>
> And update it to
>
>
>   update:
>      prepare next_map (in core_begin)
>      do updates
>      take mem_map_lock (in core_commit)
>      switch maps
>      drop mem_map_lock
>      free old map
>
>
> Note the lookup path copies the MemoryRegionSection instead of
> referencing it.  Thus we can destroy the old map without worrying; the
> only pointers will point to MemoryRegions, which will be protected by
> the refcounts on their Objects.
>
Just found there may be a leak here: if mrs points to a subpage, then
the subpage_t could be destroyed out from under it.
To avoid such a situation, we can walk down the chain to pin ourselves
on the Object-based mr, but then we must expose the address conversion
in subpage_read() right here. Right?

Regards,
pingfan

> This can be easily switched to rcu:
>
>   update:
>      prepare next_map (in core_begin)
>      do updates
>      switch maps - rcu_assign_pointer
>      call_rcu(free old map) (or synchronize_rcu; free old maps)
>
> Again, this should be done after the simplistic patch that enables
> parallel lookup but keeps just one map.
>
>
>
> --
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread
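
Spelled out, the commit-side switch in the pseudocode above could look
like the sketch below.  cur_map, mem_map_lock and the begin/commit
listener hooks come from the discussion; physmap_alloc()/physmap_free()
are assumed counterparts of the physmap_put() in the quoted patch.

    static PhysMap *next_map;

    static void core_begin(MemoryListener *listener)
    {
        /* build the new map off to the side; readers keep using cur_map */
        next_map = physmap_alloc();
    }

    /* ... region_add/region_del callbacks populate next_map ... */

    static void core_commit(MemoryListener *listener)
    {
        PhysMap *old;

        qemu_mutex_lock(&mem_map_lock);
        old = cur_map;
        cur_map = next_map;     /* lookups see old or new, never a mix */
        qemu_mutex_unlock(&mem_map_lock);

        /* safe: lookups copy MemoryRegionSections out of the map */
        physmap_free(old);
    }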

* Re: [PATCH 09/15] memory: prepare flatview and radix-tree for rcu style access
  2012-08-11  1:58       ` [Qemu-devel] " liu ping fan
@ 2012-08-11 10:06         ` liu ping fan
  -1 siblings, 0 replies; 154+ messages in thread
From: liu ping fan @ 2012-08-11 10:06 UTC (permalink / raw)
  To: Avi Kivity
  Cc: qemu-devel, kvm, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	Stefan Hajnoczi, Paolo Bonzini, Blue Swirl, Andreas Färber

On Sat, Aug 11, 2012 at 9:58 AM, liu ping fan <qemulist@gmail.com> wrote:
> On Wed, Aug 8, 2012 at 5:41 PM, Avi Kivity <avi@redhat.com> wrote:
>> On 08/08/2012 09:25 AM, Liu Ping Fan wrote:
>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>
>>> Flatview and radix view are both under the protection of a pointer,
>>> and this makes changes to them appear atomic!
>>>
>>> The mr accessed through a radix-tree leaf or flatview will be reclaimed
>>> only after the prev PhysMap is no longer in use
>>>
>>
>> IMO this cleverness should come much later.  Let's first take care of
>> dropping the big qemu lock, then make switching memory maps more efficient.
>>
>> The initial paths could look like:
>>
>>   lookup:
>>      take mem_map_lock
>>      lookup
>>      take ref
>>      drop mem_map_lock
>>
>>   update:
>>      take mem_map_lock (in core_begin)
>>      do updates
>>      drop mem_map_lock
>>
>> Later we can replace mem_map_lock with either a rwlock or (real) rcu.
>>
>>
>>>
>>>  #if !defined(CONFIG_USER_ONLY)
>>>
>>> -static void phys_map_node_reserve(unsigned nodes)
>>> +static void phys_map_node_reserve(PhysMap *map, unsigned nodes)
>>>  {
>>> -    if (phys_map_nodes_nb + nodes > phys_map_nodes_nb_alloc) {
>>> +    if (map->phys_map_nodes_nb + nodes > map->phys_map_nodes_nb_alloc) {
>>>          typedef PhysPageEntry Node[L2_SIZE];
>>> -        phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc * 2, 16);
>>> -        phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc,
>>> -                                      phys_map_nodes_nb + nodes);
>>> -        phys_map_nodes = g_renew(Node, phys_map_nodes,
>>> -                                 phys_map_nodes_nb_alloc);
>>> +        map->phys_map_nodes_nb_alloc = MAX(map->phys_map_nodes_nb_alloc * 2,
>>> +                                                                        16);
>>> +        map->phys_map_nodes_nb_alloc = MAX(map->phys_map_nodes_nb_alloc,
>>> +                                      map->phys_map_nodes_nb + nodes);
>>> +        map->phys_map_nodes = g_renew(Node, map->phys_map_nodes,
>>> +                                 map->phys_map_nodes_nb_alloc);
>>>      }
>>>  }
>>
>> Please have a patch that just adds the map parameter to all these
>> functions.  This makes the later patch, that adds the copy, easier to read.
>>
>>> +
>>> +void cur_map_update(PhysMap *next)
>>> +{
>>> +    qemu_mutex_lock(&cur_map_lock);
>>> +    physmap_put(cur_map);
>>> +    cur_map = next;
>>> +    smp_mb();
>>> +    qemu_mutex_unlock(&cur_map_lock);
>>> +}
>>
>> IMO this can be mem_map_lock.
>>
>> If we take my previous suggestion:
>>
>>   lookup:
>>      take mem_map_lock
>>      lookup
>>      take ref
>>      drop mem_map_lock
>>
>>   update:
>>      take mem_map_lock (in core_begin)
>>      do updates
>>      drop mem_map_lock
>>
>> And update it to
>>
>>
>>   update:
>>      prepare next_map (in core_begin)
>>      do updates
>>      take mem_map_lock (in core_commit)
>>      switch maps
>>      drop mem_map_lock
>>      free old map
>>
>>
>> Note the lookup path copies the MemoryRegionSection instead of
>> referencing it.  Thus we can destroy the old map without worrying; the
>> only pointers will point to MemoryRegions, which will be protected by
>> the refcounts on their Objects.
>>
> Just found there may be a leak here: if mrs points to a subpage, then
> the subpage_t could be destroyed out from under it.
> To avoid such a situation, we can walk down the chain to pin ourselves
> on the Object-based mr, but then we must expose the address conversion
> in subpage_read() right here. Right?
>
Oh, I just read through the code logic, and I think walking down the
chain is enough. And subpage_read/write() is bypassed, so there is no
need to fold in the addr translation.

Regards,
pingfan

> Regards,
> pingfan
>
>> This can be easily switched to rcu:
>>
>>   update:
>>      prepare next_map (in core_begin)
>>      do updates
>>      switch maps - rcu_assign_pointer
>>      call_rcu(free old map) (or synchronize_rcu; free old maps)
>>
>> Again, this should be done after the simplistic patch that enables
>> parallel lookup but keeps just one map.
>>
>>
>>
>> --
>> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread
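
For reference, the rcu variant of the same commit path maps naturally
onto liburcu-style primitives; this is purely illustrative, since QEMU
had no rcu implementation at this point, and phys_page_find() is again
simplified to take the map and a page index.

    static void core_commit(MemoryListener *listener)
    {
        PhysMap *old = cur_map;

        rcu_assign_pointer(cur_map, next_map);  /* publish the new map */
        synchronize_rcu();                  /* wait out current readers */
        physmap_free(old);
    }

    static MemoryRegionSection lookup(target_phys_addr_t addr)
    {
        MemoryRegionSection mrs;

        rcu_read_lock();
        mrs = *phys_page_find(rcu_dereference(cur_map),
                              addr >> TARGET_PAGE_BITS);
        rcu_read_unlock();
        /* mrs is a copy; mrs.mr stays alive via its Object's refcount */
        return mrs;
    }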

* Re: [PATCH 06/15] memory: use refcnt to manage MemoryRegion
  2012-08-10  6:44           ` [Qemu-devel] " liu ping fan
@ 2012-08-12  8:43             ` Avi Kivity
  -1 siblings, 0 replies; 154+ messages in thread
From: Avi Kivity @ 2012-08-12  8:43 UTC (permalink / raw)
  To: liu ping fan
  Cc: kvm, Jan Kiszka, Marcelo Tosatti, qemu-devel, Blue Swirl,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini,
	Andreas Färber

On 08/10/2012 09:44 AM, liu ping fan wrote:
>>> In the previous discussion, you suggested adding dev->ref++ in
>>> core_region_add.  But I think if we move it to a higher layer --
>>> memory_region_{add,del}_subregion -- we can avoid duplicating
>>> this in the other xx_region_add callbacks.
>>
>> Why would other memory listeners be impacted?  They all operate under
>> the big qemu lock.  If they start using devices outside the lock, then
>> they need to take a reference.
>>
> Yes, if the unplug path is under the protection of the big lock.
> And just one extra question: for the ram-unplug scenario, how do we
> protect against the following?
>   updater:  ram-unplug --> qemu free() --> brk() invalidates this vaddr interval
>   reader:  vhost thread copies data from the interval
> I guess they use something like a lock/ref, but I cannot find such a
> mechanism in vhost_set_memory() to protect this scenario against
> vhost_worker()

VHOST_SET_MEM_TABLE uses synchronize_srcu() to ensure no readers are
active before returning.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 154+ messages in thread
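
The kernel-side pattern being referred to is roughly the following;
struct and field names here are placeholders for illustration, not the
actual vhost code.

    /* updater: the VHOST_SET_MEM_TABLE handler, simplified */
    static long set_mem_table(struct vdev *d, struct mem_table *newmem)
    {
        struct mem_table *old = d->memory;

        rcu_assign_pointer(d->memory, newmem);  /* publish new table */
        synchronize_srcu(&d->srcu);         /* wait for active readers */
        kfree(old);
        return 0;
    }

    /* reader: worker thread copying guest data */
    static void worker_copy(struct vdev *d)
    {
        int idx = srcu_read_lock(&d->srcu);
        struct mem_table *mem = srcu_dereference(d->memory, &d->srcu);
        /* ... translate and copy through mem ... */
        srcu_read_unlock(&d->srcu, idx);
    }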

* Re: [PATCH 04/15] memory: MemoryRegion topology must be stable when updating
  2012-08-09  7:28       ` [Qemu-devel] " liu ping fan
@ 2012-08-13 18:28         ` Marcelo Tosatti
  -1 siblings, 0 replies; 154+ messages in thread
From: Marcelo Tosatti @ 2012-08-13 18:28 UTC (permalink / raw)
  To: liu ping fan
  Cc: kvm, Jan Kiszka, qemu-devel, Blue Swirl, Avi Kivity,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini,
	Andreas Färber

On Thu, Aug 09, 2012 at 03:28:44PM +0800, liu ping fan wrote:
> On Wed, Aug 8, 2012 at 5:13 PM, Avi Kivity <avi@redhat.com> wrote:
> > On 08/08/2012 09:25 AM, Liu Ping Fan wrote:
> >> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> >>
> >> Use mem_map_lock to protect among updaters, so that we can get an intact
> >> snapshot of the mem topology -- FlatView & radix-tree.
> >>
> >> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> >> ---
> >>  exec.c   |    3 +++
> >>  memory.c |   22 ++++++++++++++++++++++
> >>  memory.h |    2 ++
> >>  3 files changed, 27 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/exec.c b/exec.c
> >> index 8244d54..0e29ef9 100644
> >> --- a/exec.c
> >> +++ b/exec.c
> >> @@ -210,6 +210,8 @@ static unsigned phys_map_nodes_nb, phys_map_nodes_nb_alloc;
> >>     The bottom level has pointers to MemoryRegionSections.  */
> >>  static PhysPageEntry phys_map = { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
> >>
> >> +QemuMutex mem_map_lock;
> >> +
> >>  static void io_mem_init(void);
> >>  static void memory_map_init(void);
> >>
> >> @@ -637,6 +639,7 @@ void cpu_exec_init_all(void)
> >>  #if !defined(CONFIG_USER_ONLY)
> >>      memory_map_init();
> >>      io_mem_init();
> >> +    qemu_mutex_init(&mem_map_lock);
> >>  #endif
> >>  }
> >>
> >> diff --git a/memory.c b/memory.c
> >> index aab4a31..5986532 100644
> >> --- a/memory.c
> >> +++ b/memory.c
> >> @@ -761,7 +761,9 @@ void memory_region_transaction_commit(void)
> >>      assert(memory_region_transaction_depth);
> >>      --memory_region_transaction_depth;
> >>      if (!memory_region_transaction_depth && memory_region_update_pending) {
> >> +        qemu_mutex_lock(&mem_map_lock);
> >>          memory_region_update_topology(NULL);
> >> +        qemu_mutex_unlock(&mem_map_lock);
> >>      }
> >>  }
> >
> > Seems to me that nothing in memory.c can be susceptible to races.  It must
> > already be called under the big qemu lock, and with the exception of
> > mutators (memory_region_set_*), changes aren't directly visible.
> >
> Yes, what I want to do is "prepare unplug out of protection of global
> lock".  When io-dispatch and mmio-dispatch are both out of the big lock,
> we will run into the following scenario:
>     In vcpu context A, qdev_unplug_complete() -> delete subregion;
>     In context B, write pci bar --> pci mapping update -> add subregion

A per-device lock should protect that.

^ permalink raw reply	[flat|nested] 154+ messages in thread
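
That is, both mutation paths in the scenario above would take the same
per-device lock before touching the device's regions.  A minimal sketch,
with dev->lock as a hypothetical per-device lock; pci_update_mappings()
is shown schematically, not as a call that compiles from here.

    /* vcpu context A: unplug path */
    static void unplug_path(DeviceState *dev)
    {
        qemu_mutex_lock(&dev->lock);            /* hypothetical */
        qdev_unplug_complete(dev);              /* deletes subregions */
        qemu_mutex_unlock(&dev->lock);
    }

    /* context B: PCI BAR write on the same device */
    static void bar_update_path(PCIDevice *pci_dev)
    {
        qemu_mutex_lock(&pci_dev->qdev.lock);   /* same lock, PCI view */
        pci_update_mappings(pci_dev);           /* adds/moves subregions */
        qemu_mutex_unlock(&pci_dev->qdev.lock);
    }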

* Re: [PATCH 13/15] hotplug: introduce qdev_unplug_complete() to remove device from views
  2012-08-09  8:00         ` [Qemu-devel] " Paolo Bonzini
@ 2012-08-13 18:51           ` Marcelo Tosatti
  -1 siblings, 0 replies; 154+ messages in thread
From: Marcelo Tosatti @ 2012-08-13 18:51 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, Jan Kiszka, liu ping fan, qemu-devel, Blue Swirl,
	Avi Kivity, Anthony Liguori, Stefan Hajnoczi,
	Andreas Färber

On Thu, Aug 09, 2012 at 10:00:16AM +0200, Paolo Bonzini wrote:
> On 09/08/2012 09:28, liu ping fan wrote:
> >> >     VCPU thread                    I/O thread
> >> > =====================================================================
> >> >     get MMIO request
> >> >     rcu_read_lock()
> >> >     walk memory map
> >> >                                    qdev_unmap()
> >> >                                    lock_devtree()
> >> >                                    ...
> >> >                                    unlock_devtree
> >> >                                    unref dev -> refcnt=0, free enqueued
> >> >     ref()
> >> No ref() for dev here, while we have a ref to flatview+radix in my patches.
> >> I use rcu to protect the radix+flatview+mr being referred to. As for dev,
> >> its ref is incremented when it is added into the mem view -- that is,
> >> memory_region_add_subregion -> memory_region_get() {
> >> if(atomic_add_and_return()) dev->ref++  }.
> >> So until the mem view is reclaimed, the dev's ref is held by the mem view.
> >> In short, rcu protects the mem view, while the device is protected by refcnt.

The idea, as written in that plan, was:

- RCU protects memory maps.
- Object reference protects device in the window between 4. and 5.

The unplug/remove path should:

1) Lock memmap_lock for write (if not using RCU).
2) Remove any memmap entries (which is possible due to write lock on
memmap_lock. Alternatively wait for an RCU grace period). Device should
not be visible after that.
3) Lock dev->lock.
4) Wait until references are removed (no new references can be made
since device is not visible).
5) Remove device. 

So it's a combination of both dev->lock and the reference counter.

Note: a first step can be parallel execution of MMIO lookups only
(actually that is a very good first target). dev->lock above would be
qemu_big_lock in that first stage; then _only devices which are
performance sensitive need to be converted_.

> But the RCU critical section should not include the whole processing of
> MMIO, only the walk of the memory map.

Yes.

> And in general I think this is a bit too tricky... I understand not
> adding refcounting to all of bottom halves, timers, etc., but if you are
> using a device you should have explicit ref/unref pairs.
> 
> Paolo

^ permalink raw reply	[flat|nested] 154+ messages in thread
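
Steps 1-5, shaped as code for concreteness; memmap_lock as a rwlock,
dev->lock, dev->refcnt, dev->unref_cond and memmap_remove_entries() are
all hypothetical names for the sketch.

    static void unplug_remove(DeviceState *dev)
    {
        /* 1-2: make the device invisible to new lookups */
        write_lock(&memmap_lock);
        memmap_remove_entries(dev);
        write_unlock(&memmap_lock);

        /* 3-4: wait for in-flight references; no new ones can appear
         * since the device is no longer visible */
        qemu_mutex_lock(&dev->lock);
        while (atomic_read(&dev->refcnt) > 0) {
            qemu_cond_wait(&dev->unref_cond, &dev->lock);
        }
        qemu_mutex_unlock(&dev->lock);

        /* 5: remove the device */
        qdev_free(dev);
    }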

* Re: [PATCH 13/15] hotplug: introduce qdev_unplug_complete() to remove device from views
  2012-08-10  6:42           ` [Qemu-devel] " liu ping fan
@ 2012-08-13 18:53             ` Marcelo Tosatti
  -1 siblings, 0 replies; 154+ messages in thread
From: Marcelo Tosatti @ 2012-08-13 18:53 UTC (permalink / raw)
  To: liu ping fan
  Cc: kvm, Jan Kiszka, qemu-devel, Blue Swirl, Avi Kivity,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini,
	Andreas Färber

On Fri, Aug 10, 2012 at 02:42:58PM +0800, liu ping fan wrote:
> On Thu, Aug 9, 2012 at 4:00 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > On 09/08/2012 09:28, liu ping fan wrote:
> >>> >     VCPU thread                    I/O thread
> >>> > =====================================================================
> >>> >     get MMIO request
> >>> >     rcu_read_lock()
> >>> >     walk memory map
> >>> >                                    qdev_unmap()
> >>> >                                    lock_devtree()
> >>> >                                    ...
> >>> >                                    unlock_devtree
> >>> >                                    unref dev -> refcnt=0, free enqueued
> >>> >     ref()
> >> No ref() for dev here, while we have a ref to flatview+radix in my patches.
> >> I use rcu to protect the radix+flatview+mr being referred to. As for dev,
> >> its ref is incremented when it is added into the mem view -- that is,
> >> memory_region_add_subregion -> memory_region_get() {
> >> if(atomic_add_and_return()) dev->ref++  }.
> >> So until the mem view is reclaimed, the dev's ref is held by the mem view.
> >> In short, rcu protects the mem view, while the device is protected by refcnt.
> >
> > But the RCU critical section should not include the whole processing of
> > MMIO, only the walk of the memory map.
> >
> Yes, you are right.  And I think cur_map_get() can easily be broken into
> the style "lock, ref++, phys_page_find(), unlock".
> 
> > And in general I think this is a bit too tricky... I understand not
> > adding refcounting to all of bottom halves, timers, etc., but if you are
> > using a device you should have explicit ref/unref pairs.
> >
> Actually, there are pairs -- when the dev enters the mem view, its ref is
> incremented; and when it leaves, its ref is decremented.
> But as Avi has pointed out, the mr->refcnt introduces complication and no
> gain. So I will discard this design

The plan that you refer to has been relatively well thought out, IIRC.
Instead of designing something new, I would try to understand/improve on
that.

> Thanks and regards,
> pingfan
> 
> > Paolo

^ permalink raw reply	[flat|nested] 154+ messages in thread
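
The "lock, ref++, phys_page_find(), unlock" shape mentioned above would
look something like this; physmap_get() is the assumed counterpart of
the physmap_put() in the quoted patches.

    static MemoryRegionSection *phys_section_get(target_phys_addr_t addr,
                                                 PhysMap **mapp)
    {
        MemoryRegionSection *mrs;

        qemu_mutex_lock(&cur_map_lock);
        *mapp = cur_map;
        physmap_get(*mapp);             /* ref++ on the map snapshot */
        mrs = phys_page_find(*mapp, addr >> TARGET_PAGE_BITS);
        qemu_mutex_unlock(&cur_map_lock);

        return mrs;     /* caller does physmap_put(*mapp) when finished */
    }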

* Re: [PATCH 14/15] qom: object_unref call reclaimer
  2012-08-08  6:25   ` [Qemu-devel] " Liu Ping Fan
@ 2012-08-13 18:56     ` Marcelo Tosatti
  -1 siblings, 0 replies; 154+ messages in thread
From: Marcelo Tosatti @ 2012-08-13 18:56 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: kvm, Jan Kiszka, qemu-devel, Blue Swirl, Avi Kivity,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini,
	Andreas Färber

On Wed, Aug 08, 2012 at 02:25:55PM +0800, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> An iohandler/bh/timer may use a DeviceState when its refcnt=0;
> postpone reclamation until they are done with it.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  qom/object.c |    9 ++++++++-
>  1 files changed, 8 insertions(+), 1 deletions(-)
> 
> diff --git a/qom/object.c b/qom/object.c
> index 822bdb7..1452b1b 100644
> --- a/qom/object.c
> +++ b/qom/object.c
> @@ -23,6 +23,8 @@
>  #include "qbool.h"
>  #include "qint.h"
>  #include "qstring.h"
> +#include "hw/qdev.h"
> +#include "qemu/reclaimer.h"
>  
>  #define MAX_INTERFACES 32
>  
> @@ -646,7 +648,12 @@ void object_unref(Object *obj)
>  {
>      g_assert(atomic_read(&obj->ref) > 0);
>      if (atomic_dec_and_test(&obj->ref)) {
> -        object_finalize(obj);
> +        /* fixme, maybe introduce obj->finalize to make this more elegant */
> +        if (object_dynamic_cast(obj, TYPE_DEVICE) != NULL) {
> +            qemu_reclaimer_enqueue(obj, object_finalize);
> +        } else {
> +            object_finalize(obj);
> +        }
>      }

As mentioned, removal under the original plan is a combination of a
per-device lock and an object reference. An initial step is parallel mmio
only, with everything else under the big lock.

Again, _only_ parallel mmio lookup as a first target is a good goal,
IMO.

^ permalink raw reply	[flat|nested] 154+ messages in thread
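
For context, the reclaimer in the quoted patch defers finalization to a
point where no iohandler/bh/timer can still be using the device; the
drain would presumably sit in the main loop, along these lines (the
drain function's name and placement are assumptions, not taken from the
series).

    /* sketched main loop */
    for (;;) {
        main_loop_wait(false);  /* iohandlers, bottom halves, timers */
        qemu_reclaimer();       /* assumed drain: finalize queued devices */
    }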
