* [Qemu-devel] [PATCH v6 0/8]  push mmio dispatch out of big lock
@ 2012-11-05  5:38 Liu Ping Fan
  2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 1/8] atomic: introduce atomic operations Liu Ping Fan
                   ` (8 more replies)
  0 siblings, 9 replies; 28+ messages in thread
From: Liu Ping Fan @ 2012-11-05  5:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Jan Kiszka, Marcelo Tosatti, Avi Kivity,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

v1:
https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg03312.html

v2:
http://lists.gnu.org/archive/html/qemu-devel/2012-08/msg01275.html

v3:
http://lists.nongnu.org/archive/html/qemu-devel/2012-09/msg01474.html

v4:
http://lists.gnu.org/archive/html/qemu-devel/2012-10/msg03857.html

v5:
https://lists.gnu.org/archive/html/qemu-devel/2012-10/msg04867.html

changes v5->v6:
 Apply a fine-grained lock to all address spaces.
 Introduce separate interfaces so the mmio dispatcher can be called with or without the big lock.

Liu Ping Fan (8):
  atomic: introduce atomic operations
  qom: apply atomic on object's refcount
  hotplug: introduce qdev_unplug_complete() to remove device from views
  pci: remove pci device from mem view when unplug
  memory: introduce local lock for address space
  memory: make mmio dispatch able to be out of biglock
  memory: introduce tls context to trace nested mmio request issue
  vcpu: push mmio dispatcher out of big lock

 cpu-common.h          |    3 +
 docs/memory.txt       |    4 +
 exec.c                |  219 +++++++++++++++++++++++++++++++++++++++++++++----
 hw/acpi_piix4.c       |    2 +-
 hw/pci.c              |   13 +++-
 hw/pci.h              |    1 +
 hw/qdev.c             |   26 ++++++
 hw/qdev.h             |    3 +-
 include/qemu/atomic.h |   63 ++++++++++++++
 include/qemu/object.h |    3 +-
 kvm-all.c             |    6 +-
 memory-internal.h     |    1 +
 memory.c              |    1 +
 memory.h              |    5 +
 qemu-thread.h         |    7 ++
 qom/object.c          |   11 +--
 16 files changed, 340 insertions(+), 28 deletions(-)
 create mode 100644 include/qemu/atomic.h

-- 
1.7.4.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH v6 1/8] atomic: introduce atomic operations
  2012-11-05  5:38 [Qemu-devel] [PATCH v6 0/8] push mmio dispatch out of big lock Liu Ping Fan
@ 2012-11-05  5:38 ` Liu Ping Fan
  2012-11-12  9:54   ` Paolo Bonzini
  2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 2/8] qom: apply atomic on object's refcount Liu Ping Fan
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Liu Ping Fan @ 2012-11-05  5:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Jan Kiszka, Marcelo Tosatti, Avi Kivity,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Once we run outside the global lock, we are exposed to SMP races at the
lowest level, so we need atomic operations.

This file is a thin wrapper around the GCC atomic builtins.
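
As a rough usage sketch (the RefCounted type and the ref_* helpers below
are made up for illustration and are not part of this patch), the wrapper
is meant to be used like this:

    #include <stdbool.h>
    #include "qemu/atomic.h"

    typedef struct RefCounted {
        Atomic ref;
    } RefCounted;

    static void ref_init(RefCounted *r)
    {
        atomic_set(&r->ref, 1);            /* start with one reference */
    }

    static void ref_get(RefCounted *r)
    {
        atomic_inc(&r->ref);
    }

    /* returns true when the caller dropped the last reference */
    static bool ref_put(RefCounted *r)
    {
        /* atomic_return_and_sub() returns the value *before* the
         * subtraction, so 1 means this was the final reference */
        return atomic_return_and_sub(1, &r->ref) == 1;
    }

Patch 2/8 applies exactly this pattern to the QOM object refcount.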

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 include/qemu/atomic.h |   63 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 63 insertions(+), 0 deletions(-)
 create mode 100644 include/qemu/atomic.h

diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
new file mode 100644
index 0000000..a9e6d35
--- /dev/null
+++ b/include/qemu/atomic.h
@@ -0,0 +1,63 @@
+/*
+ * Simple wrapper of gcc Atomic-Builtins
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+#ifndef __QEMU_ATOMIC_H
+#define __QEMU_ATOMIC_H
+
+typedef struct Atomic {
+    volatile int counter;
+} Atomic;
+
+static inline void atomic_set(Atomic *v, int i)
+{
+    v->counter = i;
+}
+
+static inline int atomic_read(Atomic *v)
+{
+    return v->counter;
+}
+
+static inline int atomic_return_and_add(int i, Atomic *v)
+{
+    int ret;
+
+    ret = __sync_fetch_and_add(&v->counter, i);
+    return ret;
+}
+
+static inline int atomic_return_and_sub(int i, Atomic *v)
+{
+    int ret;
+
+    ret = __sync_fetch_and_sub(&v->counter, i);
+    return ret;
+}
+
+/**
+ * atomic_inc - increment atomic variable
+ * @v: pointer of type Atomic
+ *
+ * Atomically increments @v by 1.
+ */
+static inline void atomic_inc(Atomic *v)
+{
+    __sync_fetch_and_add(&v->counter, 1);
+}
+
+/**
+ * atomic_dec - decrement atomic variable
+ * @v: pointer of type Atomic
+ *
+ * Atomically decrements @v by 1.
+ */
+static inline void atomic_dec(Atomic *v)
+{
+    __sync_fetch_and_sub(&v->counter, 1);
+}
+
+#endif
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH v6 2/8] qom: apply atomic on object's refcount
  2012-11-05  5:38 [Qemu-devel] [PATCH v6 0/8] push mmio dispatch out of big lock Liu Ping Fan
  2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 1/8] atomic: introduce atomic operations Liu Ping Fan
@ 2012-11-05  5:38 ` Liu Ping Fan
  2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 3/8] hotplug: introduce qdev_unplug_complete() to remove device from views Liu Ping Fan
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 28+ messages in thread
From: Liu Ping Fan @ 2012-11-05  5:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Jan Kiszka, Marcelo Tosatti, Avi Kivity,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 include/qemu/object.h |    3 ++-
 qom/object.c          |   11 +++++------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/qemu/object.h b/include/qemu/object.h
index cc75fee..0c02614 100644
--- a/include/qemu/object.h
+++ b/include/qemu/object.h
@@ -18,6 +18,7 @@
 #include <stdint.h>
 #include <stdbool.h>
 #include "qemu-queue.h"
+#include "qemu/atomic.h"
 
 struct Visitor;
 struct Error;
@@ -262,7 +263,7 @@ struct Object
     /*< private >*/
     ObjectClass *class;
     QTAILQ_HEAD(, ObjectProperty) properties;
-    uint32_t ref;
+    Atomic ref;
     Object *parent;
 };
 
diff --git a/qom/object.c b/qom/object.c
index e3e9242..34ec2a1 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -383,7 +383,7 @@ void object_finalize(void *data)
     object_deinit(obj, ti);
     object_property_del_all(obj);
 
-    g_assert(obj->ref == 0);
+    g_assert(atomic_read(&obj->ref) == 0);
 }
 
 Object *object_new_with_type(Type type)
@@ -410,7 +410,7 @@ Object *object_new(const char *typename)
 void object_delete(Object *obj)
 {
     object_unparent(obj);
-    g_assert(obj->ref == 1);
+    g_assert(atomic_read(&obj->ref) == 1);
     object_unref(obj);
     g_free(obj);
 }
@@ -600,16 +600,15 @@ GSList *object_class_get_list(const char *implements_type,
 
 void object_ref(Object *obj)
 {
-    obj->ref++;
+    atomic_inc(&obj->ref);
 }
 
 void object_unref(Object *obj)
 {
-    g_assert(obj->ref > 0);
-    obj->ref--;
+    g_assert(atomic_read(&obj->ref) > 0);
 
     /* parent always holds a reference to its children */
-    if (obj->ref == 0) {
+    if (atomic_return_and_sub(1, &obj->ref) == 1) {
         object_finalize(obj);
     }
 }
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH v6 3/8] hotplug: introduce qdev_unplug_complete() to remove device from views
  2012-11-05  5:38 [Qemu-devel] [PATCH v6 0/8] push mmio dispatch out of big lock Liu Ping Fan
  2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 1/8] atomic: introduce atomic operations Liu Ping Fan
  2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 2/8] qom: apply atomic on object's refcount Liu Ping Fan
@ 2012-11-05  5:38 ` Liu Ping Fan
  2012-11-12  9:27   ` Paolo Bonzini
  2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 4/8] pci: remove pci device from mem view when unplug Liu Ping Fan
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Liu Ping Fan @ 2012-11-05  5:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Jan Kiszka, Marcelo Tosatti, Avi Kivity,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

When a device unplug has been acked by the guest, we first remove the
device from the memory view to prevent further accesses from the
dispatcher.  Only then do we isolate it from the device composition tree.
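
On the caller side this looks like the following (sketch, mirroring the
ACPI eject path touched in patch 4/8):

    /* guest has acked the eject request */
    qdev_unplug_complete(qdev, NULL);  /* unmap from the memory view,
                                          then unparent and unref */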

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 hw/qdev.c |   26 ++++++++++++++++++++++++++
 hw/qdev.h |    3 ++-
 2 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/hw/qdev.c b/hw/qdev.c
index 9b9aba3..681e133 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -98,6 +98,14 @@ void qdev_set_parent_bus(DeviceState *dev, BusState *bus)
     bus_add_child(bus, dev);
 }
 
+static void qdev_unset_parent(DeviceState *dev)
+{
+    BusState *b = dev->parent_bus;
+
+    object_unparent(OBJECT(dev));
+    bus_remove_child(b, dev);
+}
+
 /* Create a new device.  This only initializes the device state structure
    and allows properties to be set.  qdev_init should be called to
    initialize the actual device emulation.  */
@@ -187,6 +195,24 @@ void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
     dev->alias_required_for_version = required_for_version;
 }
 
+static int qdev_unmap(DeviceState *dev)
+{
+    DeviceClass *dc =  DEVICE_GET_CLASS(dev);
+    if (dc->unmap) {
+        dc->unmap(dev);
+    }
+    return 0;
+}
+
+void qdev_unplug_complete(DeviceState *dev, Error **errp)
+{
+    /* isolate from mem view */
+    qdev_unmap(dev);
+    /* isolate from device tree */
+    qdev_unset_parent(dev);
+    object_unref(OBJECT(dev));
+}
+
 void qdev_unplug(DeviceState *dev, Error **errp)
 {
     DeviceClass *dc = DEVICE_GET_CLASS(dev);
diff --git a/hw/qdev.h b/hw/qdev.h
index c6ac636..71eb9ca 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -47,7 +47,7 @@ typedef struct DeviceClass {
 
     /* callbacks */
     void (*reset)(DeviceState *dev);
-
+    void (*unmap)(DeviceState *dev);
     /* device state */
     const VMStateDescription *vmsd;
 
@@ -160,6 +160,7 @@ void qdev_init_nofail(DeviceState *dev);
 void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
                                  int required_for_version);
 void qdev_unplug(DeviceState *dev, Error **errp);
+void qdev_unplug_complete(DeviceState *dev, Error **errp);
 void qdev_free(DeviceState *dev);
 int qdev_simple_unplug_cb(DeviceState *dev);
 void qdev_machine_creation_done(void);
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH v6 4/8] pci: remove pci device from mem view when unplug
  2012-11-05  5:38 [Qemu-devel] [PATCH v6 0/8] push mmio dispatch out of big lock Liu Ping Fan
                   ` (2 preceding siblings ...)
  2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 3/8] hotplug: introduce qdev_unplug_complete() to remove device from views Liu Ping Fan
@ 2012-11-05  5:38 ` Liu Ping Fan
  2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 5/8] memory: introduce local lock for address space Liu Ping Fan
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 28+ messages in thread
From: Liu Ping Fan @ 2012-11-05  5:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Jan Kiszka, Marcelo Tosatti, Avi Kivity,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 hw/acpi_piix4.c |    2 +-
 hw/pci.c        |   13 ++++++++++++-
 hw/pci.h        |    1 +
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index 15275cf..b45a016 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -306,7 +306,7 @@ static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots)
             if (pc->no_hotplug) {
                 slot_free = false;
             } else {
-                qdev_free(qdev);
+                qdev_unplug_complete(qdev, NULL);
             }
         }
     }
diff --git a/hw/pci.c b/hw/pci.c
index 7eeaac0..9ba589e 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -869,7 +869,6 @@ static int pci_unregister_device(DeviceState *dev)
     PCIDevice *pci_dev = PCI_DEVICE(dev);
     PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pci_dev);
 
-    pci_unregister_io_regions(pci_dev);
     pci_del_option_rom(pci_dev);
 
     if (pc->exit) {
@@ -880,6 +879,17 @@ static int pci_unregister_device(DeviceState *dev)
     return 0;
 }
 
+static void pci_unmap_device(DeviceState *dev)
+{
+    PCIDevice *pci_dev = PCI_DEVICE(dev);
+    PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pci_dev);
+
+    pci_unregister_io_regions(pci_dev);
+    if (pc->unmap) {
+        pc->unmap(pci_dev);
+    }
+}
+
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
                       uint8_t type, MemoryRegion *memory)
 {
@@ -2105,6 +2115,7 @@ static void pci_device_class_init(ObjectClass *klass, void *data)
     DeviceClass *k = DEVICE_CLASS(klass);
     k->init = pci_qdev_init;
     k->unplug = pci_unplug_device;
+    k->unmap = pci_unmap_device;
     k->exit = pci_unregister_device;
     k->bus_type = TYPE_PCI_BUS;
     k->props = pci_props;
diff --git a/hw/pci.h b/hw/pci.h
index 1f902f5..898cc5e 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -154,6 +154,7 @@ typedef struct PCIDeviceClass {
     DeviceClass parent_class;
 
     int (*init)(PCIDevice *dev);
+    void (*unmap)(PCIDevice *dev);
     PCIUnregisterFunc *exit;
     PCIConfigReadFunc *config_read;
     PCIConfigWriteFunc *config_write;
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH v6 5/8] memory: introduce local lock for address space
  2012-11-05  5:38 [Qemu-devel] [PATCH v6 0/8] push mmio dispatch out of big lock Liu Ping Fan
                   ` (3 preceding siblings ...)
  2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 4/8] pci: remove pci device from mem view when unplug Liu Ping Fan
@ 2012-11-05  5:38 ` Liu Ping Fan
  2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 6/8] memory: make mmio dispatch able to be out of biglock Liu Ping Fan
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 28+ messages in thread
From: Liu Ping Fan @ 2012-11-05  5:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Jan Kiszka, Marcelo Tosatti, Avi Kivity,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Address spaces that want to run outside the big lock are protected by
their own local lock.
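
The intended discipline is roughly the following (sketch only; the real
users of the lock are added in patch 6/8):

    /* updater side: memory listener rebuilding the dispatch tree */
    qemu_mutex_lock(&as->lock);
    /* ... destroy and rebuild the AddressSpaceDispatch radix tree ... */
    qemu_mutex_unlock(&as->lock);

    /* reader side: mmio dispatch looking up a section */
    qemu_mutex_lock(&as->lock);
    section = phys_page_find(as->dispatch, page >> TARGET_PAGE_BITS);
    /* ... take a reference on / copy the section ... */
    qemu_mutex_unlock(&as->lock);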

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 memory.c |    1 +
 memory.h |    3 +++
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/memory.c b/memory.c
index 2f68d67..18425fd 100644
--- a/memory.c
+++ b/memory.c
@@ -1535,6 +1535,7 @@ void memory_listener_unregister(MemoryListener *listener)
 void address_space_init(AddressSpace *as, MemoryRegion *root)
 {
     memory_region_transaction_begin();
+    qemu_mutex_init(&as->lock);
     as->root = root;
     as->current_map = g_new(FlatView, 1);
     flatview_init(as->current_map);
diff --git a/memory.h b/memory.h
index 79393f1..13a9e3e 100644
--- a/memory.h
+++ b/memory.h
@@ -22,6 +22,7 @@
 #include "cpu-common.h"
 #include "targphys.h"
 #include "qemu-queue.h"
+#include "qemu-thread.h"
 #include "iorange.h"
 #include "ioport.h"
 #include "int128.h"
@@ -164,6 +165,7 @@ typedef struct AddressSpace AddressSpace;
  */
 struct AddressSpace {
     /* All fields are private. */
+    QemuMutex lock;
     const char *name;
     MemoryRegion *root;
     struct FlatView *current_map;
@@ -801,6 +803,7 @@ void mtree_info(fprintf_function mon_printf, void *f);
  *
  * @as: an uninitialized #AddressSpace
  * @root: a #MemoryRegion that routes addesses for the address space
+ * @lock: if true, the physmap is protected by the local lock, otherwise by the big lock
  */
 void address_space_init(AddressSpace *as, MemoryRegion *root);
 
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH v6 6/8] memory: make mmio dispatch able to be out of biglock
  2012-11-05  5:38 [Qemu-devel] [PATCH v6 0/8] push mmio dispatch out of big lock Liu Ping Fan
                   ` (4 preceding siblings ...)
  2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 5/8] memory: introduce local lock for address space Liu Ping Fan
@ 2012-11-05  5:38 ` Liu Ping Fan
  2012-11-05  6:45   ` Jan Kiszka
  2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 7/8] memory: introduce tls context to trace nested mmio request issue Liu Ping Fan
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Liu Ping Fan @ 2012-11-05  5:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Jan Kiszka, Marcelo Tosatti, Avi Kivity,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Without the big lock, we try to protect the MemoryRegion by taking a
reference on it.  If we cannot take the reference, we fall back to the
big lock.

Another point is that the memory radix tree can be flushed and rebuilt by
another thread, so we keep a private copy of the terminal
MemoryRegionSection to survive such an update.
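
In simplified pseudo-code, the fast path of address_space_rw() below
becomes:

    qemu_mutex_lock(&as->lock);
    safe_ref = memory_region_section_lookup_ref(d, page, &obj_mrs);
    qemu_mutex_unlock(&as->lock);
    if (!safe_ref) {
        /* the region has no ref/unref callbacks: fall back to the big
         * lock and redo the lookup, the map may have changed meanwhile */
        qemu_mutex_lock_iothread();
        /* ... second lookup under the big lock ... */
    }
    /* access obj_mrs, a private copy of the terminal section */
    /* ... */
    memory_region_section_unref(&obj_mrs);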

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 docs/memory.txt   |    4 ++
 exec.c            |  154 +++++++++++++++++++++++++++++++++++++++++++++++-----
 memory-internal.h |    1 +
 memory.h          |    2 +
 4 files changed, 146 insertions(+), 15 deletions(-)

diff --git a/docs/memory.txt b/docs/memory.txt
index 5bbee8e..6b3d15a 100644
--- a/docs/memory.txt
+++ b/docs/memory.txt
@@ -170,3 +170,7 @@ various constraints can be supplied to control how these callbacks are called:
  - .old_portio and .old_mmio can be used to ease porting from code using
    cpu_register_io_memory() and register_ioport().  They should not be used
    in new code.
+
+MMIO regions may provide ->ref() and ->unref() callbacks; this pair of
+callbacks is optional.  When ref() returns non-zero, both the MemoryRegion
+and its opaque are safe to use.
diff --git a/exec.c b/exec.c
index 750008c..fa34ef9 100644
--- a/exec.c
+++ b/exec.c
@@ -2280,7 +2280,7 @@ static void register_multipage(AddressSpaceDispatch *d, MemoryRegionSection *sec
                   section_index);
 }
 
-static void mem_add(MemoryListener *listener, MemoryRegionSection *section)
+static void mem_nop(MemoryListener *listener, MemoryRegionSection *section)
 {
     AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch, listener);
     MemoryRegionSection now = *section, remain = *section;
@@ -2314,6 +2314,26 @@ static void mem_add(MemoryListener *listener, MemoryRegionSection *section)
     }
 }
 
+static void mem_add(MemoryListener *listener, MemoryRegionSection *section)
+{
+    MemoryRegion *mr = section->mr;
+
+    if (mr->ops && mr->ops->ref) {
+        mr->ops->ref(mr);
+    }
+    mem_nop(listener, section);
+}
+
+static void mem_del(MemoryListener *listener,
+                            MemoryRegionSection *section)
+{
+    MemoryRegion *mr = section->mr;
+
+    if (mr->ops && mr->ops->unref) {
+        mr->ops->unref(mr);
+    }
+}
+
 void qemu_flush_coalesced_mmio_buffer(void)
 {
     if (kvm_enabled())
@@ -3165,11 +3185,23 @@ static void io_mem_init(void)
 static void mem_begin(MemoryListener *listener)
 {
     AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch, listener);
+    AddressSpace *as = d->as;
 
+    /* protect the updating of the mrs in the memory core against readers */
+    qemu_mutex_lock(&as->lock);
     destroy_all_mappings(d);
     d->phys_map.ptr = PHYS_MAP_NODE_NIL;
 }
 
+static void mem_commit(MemoryListener *listener)
+{
+    AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch,
+                listener);
+    AddressSpace *as = d->as;
+
+    qemu_mutex_unlock(&as->lock);
+}
+
 static void core_begin(MemoryListener *listener)
 {
     phys_sections_clear();
@@ -3243,11 +3275,14 @@ void address_space_init_dispatch(AddressSpace *as)
     d->phys_map  = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
     d->listener = (MemoryListener) {
         .begin = mem_begin,
+        .commit = mem_commit,
         .region_add = mem_add,
-        .region_nop = mem_add,
+        .region_del = mem_del,
+        .region_nop = mem_nop,
         .priority = 0,
     };
     as->dispatch = d;
+    as->dispatch->as = as;
     memory_listener_register(&d->listener, as);
 }
 
@@ -3345,6 +3380,68 @@ static void invalidate_and_set_dirty(target_phys_addr_t addr,
     xen_modified_memory(addr, length);
 }
 
+static MemoryRegionSection *subpage_get_terminal(subpage_t *mmio,
+    target_phys_addr_t addr)
+{
+    MemoryRegionSection *section;
+    unsigned int idx = SUBPAGE_IDX(addr);
+
+    section = &phys_sections[mmio->sub_section[idx]];
+    return section;
+}
+
+static bool memory_region_section_ref(MemoryRegionSection *mrs)
+{
+    MemoryRegion *mr;
+    bool ret = false;
+
+    mr = mrs->mr;
+    if (mr->ops && mr->ops->ref) {
+        mr->ops->ref(mr);
+        ret = true;
+    }
+    return ret;
+}
+
+static void memory_region_section_unref(MemoryRegionSection *mrs)
+{
+    MemoryRegion *mr;
+
+    mr = mrs->mr;
+    if (mr->ops && mr->ops->unref) {
+        mr->ops->unref(mr);
+    }
+}
+
+static bool memory_region_section_lookup_ref(AddressSpaceDispatch *d,
+    target_phys_addr_t addr, MemoryRegionSection *mrs)
+{
+    MemoryRegionSection *section;
+    bool ret;
+
+    section = phys_page_find(d, addr >> TARGET_PAGE_BITS);
+    if (section->mr->subpage) {
+        section = subpage_get_terminal(section->mr->opaque, addr);
+    }
+    *mrs = *section;
+    ret = memory_region_section_ref(mrs);
+
+    return ret;
+}
+
+static bool address_space_section_lookup_ref(AddressSpace *as,
+    target_phys_addr_t page, MemoryRegionSection *mrs)
+{
+    bool safe_ref;
+    AddressSpaceDispatch *d = as->dispatch;
+
+    qemu_mutex_lock(&as->lock);
+    safe_ref = memory_region_section_lookup_ref(d, page, mrs);
+    qemu_mutex_unlock(&as->lock);
+
+    return safe_ref;
+}
+
 void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
                       int len, bool is_write)
 {
@@ -3353,14 +3450,29 @@ void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
     uint8_t *ptr;
     uint32_t val;
     target_phys_addr_t page;
-    MemoryRegionSection *section;
+    bool safe_ref;
+    MemoryRegionSection *section, obj_mrs;
 
     while (len > 0) {
         page = addr & TARGET_PAGE_MASK;
         l = (page + TARGET_PAGE_SIZE) - addr;
         if (l > len)
             l = len;
-        section = phys_page_find(d, page >> TARGET_PAGE_BITS);
+
+        qemu_mutex_lock(&as->lock);
+        safe_ref = memory_region_section_lookup_ref(d, page, &obj_mrs);
+        qemu_mutex_unlock(&as->lock);
+        if (!safe_ref) {
+            qemu_mutex_lock_iothread();
+            qemu_mutex_lock(&as->lock);
+            /* when 2nd try, mem map can change, need to judge it again */
+            safe_ref = memory_region_section_lookup_ref(d, page, &obj_mrs);
+            qemu_mutex_unlock(&as->lock);
+            if (safe_ref) {
+                qemu_mutex_unlock_iothread();
+            }
+        }
+        section = &obj_mrs;
 
         if (is_write) {
             if (!memory_region_is_ram(section->mr)) {
@@ -3425,9 +3537,13 @@ void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
                 qemu_put_ram_ptr(ptr);
             }
         }
+        memory_region_section_unref(&obj_mrs);
         len -= l;
         buf += l;
         addr += l;
+        if (!safe_ref) {
+            qemu_mutex_unlock_iothread();
+        }
     }
 }
 
@@ -3460,18 +3576,19 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
 void cpu_physical_memory_write_rom(target_phys_addr_t addr,
                                    const uint8_t *buf, int len)
 {
-    AddressSpaceDispatch *d = address_space_memory.dispatch;
     int l;
     uint8_t *ptr;
     target_phys_addr_t page;
-    MemoryRegionSection *section;
+    MemoryRegionSection *section, mr_obj;
 
     while (len > 0) {
         page = addr & TARGET_PAGE_MASK;
         l = (page + TARGET_PAGE_SIZE) - addr;
         if (l > len)
             l = len;
-        section = phys_page_find(d, page >> TARGET_PAGE_BITS);
+        address_space_section_lookup_ref(&address_space_memory,
+                page >> TARGET_PAGE_BITS, &mr_obj);
+        section = &mr_obj;
 
         if (!(memory_region_is_ram(section->mr) ||
               memory_region_is_romd(section->mr))) {
@@ -3489,6 +3606,7 @@ void cpu_physical_memory_write_rom(target_phys_addr_t addr,
         len -= l;
         buf += l;
         addr += l;
+        memory_region_section_unref(&mr_obj);
     }
 }
 
@@ -3550,12 +3668,11 @@ void *address_space_map(AddressSpace *as,
                         target_phys_addr_t *plen,
                         bool is_write)
 {
-    AddressSpaceDispatch *d = as->dispatch;
     target_phys_addr_t len = *plen;
     target_phys_addr_t todo = 0;
     int l;
     target_phys_addr_t page;
-    MemoryRegionSection *section;
+    MemoryRegionSection *section, mr_obj;
     ram_addr_t raddr = RAM_ADDR_MAX;
     ram_addr_t rlen;
     void *ret;
@@ -3565,7 +3682,8 @@ void *address_space_map(AddressSpace *as,
         l = (page + TARGET_PAGE_SIZE) - addr;
         if (l > len)
             l = len;
-        section = phys_page_find(d, page >> TARGET_PAGE_BITS);
+        address_space_section_lookup_ref(as, page >> TARGET_PAGE_BITS, &mr_obj);
+        section = &mr_obj;
 
         if (!(memory_region_is_ram(section->mr) && !section->readonly)) {
             if (todo || bounce.buffer) {
@@ -3579,6 +3697,7 @@ void *address_space_map(AddressSpace *as,
             }
 
             *plen = l;
+            memory_region_section_unref(&mr_obj);
             return bounce.buffer;
         }
         if (!todo) {
@@ -3589,6 +3708,7 @@ void *address_space_map(AddressSpace *as,
         len -= l;
         addr += l;
         todo += l;
+        memory_region_section_unref(&mr_obj);
     }
     rlen = todo;
     ret = qemu_ram_ptr_length(raddr, &rlen);
@@ -4197,12 +4317,16 @@ bool virtio_is_big_endian(void)
 #ifndef CONFIG_USER_ONLY
 bool cpu_physical_memory_is_io(target_phys_addr_t phys_addr)
 {
-    MemoryRegionSection *section;
+    MemoryRegionSection *section, mr_obj;
+    bool ret;
 
-    section = phys_page_find(address_space_memory.dispatch,
-                             phys_addr >> TARGET_PAGE_BITS);
-
-    return !(memory_region_is_ram(section->mr) ||
+    address_space_section_lookup_ref(&address_space_memory,
+                             phys_addr >> TARGET_PAGE_BITS, &mr_obj);
+    section = &mr_obj;
+    ret = !(memory_region_is_ram(section->mr) ||
              memory_region_is_romd(section->mr));
+    memory_region_section_unref(&mr_obj);
+
+    return ret;
 }
 #endif
diff --git a/memory-internal.h b/memory-internal.h
index b33a99d..ea06bfb 100644
--- a/memory-internal.h
+++ b/memory-internal.h
@@ -38,6 +38,7 @@ struct AddressSpaceDispatch {
      */
     PhysPageEntry phys_map;
     MemoryListener listener;
+    AddressSpace *as;
 };
 
 void address_space_init_dispatch(AddressSpace *as);
diff --git a/memory.h b/memory.h
index 13a9e3e..704d014 100644
--- a/memory.h
+++ b/memory.h
@@ -67,6 +67,8 @@ struct MemoryRegionOps {
                   target_phys_addr_t addr,
                   uint64_t data,
                   unsigned size);
+    void (*ref)(MemoryRegion *mr);
+    void (*unref)(MemoryRegion *mr);
 
     enum device_endian endianness;
     /* Guest-visible constraints: */
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH v6 7/8] memory: introduce tls context to trace nested mmio request issue
  2012-11-05  5:38 [Qemu-devel] [PATCH v6 0/8] push mmio dispatch out of big lock Liu Ping Fan
                   ` (5 preceding siblings ...)
  2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 6/8] memory: make mmio dispatch able to be out of biglock Liu Ping Fan
@ 2012-11-05  5:38 ` Liu Ping Fan
  2012-11-05  6:57   ` Jan Kiszka
  2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 8/8] vcpu: push mmio dispatcher out of big lock Liu Ping Fan
  2012-11-05  7:00 ` [Qemu-devel] [PATCH v6 0/8] push mmio dispatch " Jan Kiszka
  8 siblings, 1 reply; 28+ messages in thread
From: Liu Ping Fan @ 2012-11-05  5:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Jan Kiszka, Marcelo Tosatti, Avi Kivity,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

After breaking up the big lock, a nested MMIO request that does not
target RAM can cause a deadlock.  Suppose devices dev_a and dev_b have
fine-grained locks lockA and lockB; an ABBA deadlock can then be
triggered.  We fix this by tracking and rejecting such requests.
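
Concretely, the scenario being rejected is (sketch):

    vcpu0: MMIO hits dev_a -> handler takes lockA
           handler issues an access that lands on dev_b -> waits for lockB
    vcpu1: MMIO hits dev_b -> handler takes lockB
           handler in turn accesses dev_a               -> waits for lockA

The per-thread context introduced below lets the dispatcher notice the
nested non-RAM access on the same thread and reject it before the second
lock is taken.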

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 exec.c        |   47 +++++++++++++++++++++++++++++++++++++++++++++++
 qemu-thread.h |    7 +++++++
 2 files changed, 54 insertions(+), 0 deletions(-)

diff --git a/exec.c b/exec.c
index fa34ef9..1eb920d 100644
--- a/exec.c
+++ b/exec.c
@@ -3442,6 +3442,48 @@ static bool address_space_section_lookup_ref(AddressSpace *as,
     return safe_ref;
 }
 
+typedef struct ThreadContext {
+  DispatchType dispatch_type;
+  unsigned int mmio_req_pending;
+} ThreadContext;
+
+static __thread ThreadContext thread_context = {
+    .dispatch_type = DISPATCH_INIT,
+    .mmio_req_pending = 0
+};
+
+void qemu_thread_set_dispatch_type(DispatchType type)
+{
+    thread_context.dispatch_type = type;
+}
+
+void qemu_thread_reset_dispatch_type(void)
+{
+    thread_context.dispatch_type = DISPATCH_INIT;
+}
+
+static void address_space_check_inc_req_pending(MemoryRegionSection *section)
+{
+    bool nested = false;
+
+    /* currently only MMIO runs outside the big lock; this check avoids deadlock */
+    if (thread_context.dispatch_type == DISPATCH_MMIO) {
+        nested = ++thread_context.mmio_req_pending > 1 ? true : false;
+        /* TODO: filter out the iommu case */
+        if (nested && !memory_region_is_ram(section->mr)) {
+            fprintf(stderr, "mmio: nested dispatch to a non-RAM target is not supported\n");
+            abort();
+        }
+    }
+}
+
+static void address_space_dec_req_pending(void)
+{
+    if (thread_context.dispatch_type == DISPATCH_MMIO) {
+        thread_context.mmio_req_pending--;
+    }
+}
+
 void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
                       int len, bool is_write)
 {
@@ -3462,6 +3504,8 @@ void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
         qemu_mutex_lock(&as->lock);
         safe_ref = memory_region_section_lookup_ref(d, page, &obj_mrs);
         qemu_mutex_unlock(&as->lock);
+        address_space_check_inc_req_pending(&obj_mrs);
+
         if (!safe_ref) {
             qemu_mutex_lock_iothread();
             qemu_mutex_lock(&as->lock);
@@ -3477,6 +3521,7 @@ void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
         if (is_write) {
             if (!memory_region_is_ram(section->mr)) {
                 target_phys_addr_t addr1;
+
                 addr1 = memory_region_section_addr(section, addr);
                 /* XXX: could force cpu_single_env to NULL to avoid
                    potential bugs */
@@ -3510,6 +3555,7 @@ void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
             if (!(memory_region_is_ram(section->mr) ||
                   memory_region_is_romd(section->mr))) {
                 target_phys_addr_t addr1;
+
                 /* I/O case */
                 addr1 = memory_region_section_addr(section, addr);
                 if (l >= 4 && ((addr1 & 3) == 0)) {
@@ -3537,6 +3583,7 @@ void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
                 qemu_put_ram_ptr(ptr);
             }
         }
+        address_space_dec_req_pending();
         memory_region_section_unref(&obj_mrs);
         len -= l;
         buf += l;
diff --git a/qemu-thread.h b/qemu-thread.h
index 05fdaaf..fc9e17b 100644
--- a/qemu-thread.h
+++ b/qemu-thread.h
@@ -7,6 +7,11 @@
 typedef struct QemuMutex QemuMutex;
 typedef struct QemuCond QemuCond;
 typedef struct QemuThread QemuThread;
+typedef enum {
+  DISPATCH_INIT = 0,
+  DISPATCH_MMIO,
+  DISPATCH_IO,
+} DispatchType;
 
 #ifdef _WIN32
 #include "qemu-thread-win32.h"
@@ -46,4 +51,6 @@ void qemu_thread_get_self(QemuThread *thread);
 bool qemu_thread_is_self(QemuThread *thread);
 void qemu_thread_exit(void *retval);
 
+void qemu_thread_set_dispatch_type(DispatchType type);
+void qemu_thread_reset_dispatch_type(void);
 #endif
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH v6 8/8] vcpu: push mmio dispatcher out of big lock
  2012-11-05  5:38 [Qemu-devel] [PATCH v6 0/8] push mmio dispatch out of big lock Liu Ping Fan
                   ` (6 preceding siblings ...)
  2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 7/8] memory: introduce tls context to trace nested mmio request issue Liu Ping Fan
@ 2012-11-05  5:38 ` Liu Ping Fan
  2012-11-05  7:00 ` [Qemu-devel] [PATCH v6 0/8] push mmio dispatch " Jan Kiszka
  8 siblings, 0 replies; 28+ messages in thread
From: Liu Ping Fan @ 2012-11-05  5:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Jan Kiszka, Marcelo Tosatti, Avi Kivity,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

To avoid taking the big lock recursively, introduce separate interfaces
that allow the address space dispatcher to be called with or without the
big lock held.
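
Call sites then pick the variant that matches their locking context
(sketch based on the hunks below):

    /* callers that still run under the big lock keep using */
    cpu_physical_memory_rw(addr, buf, len, is_write);

    /* the vcpu MMIO exit path drops the big lock first and uses */
    cpu_physical_memory_nolock_rw(addr, buf, len, is_write);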

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 cpu-common.h |    3 +++
 exec.c       |   22 ++++++++++++++++++----
 kvm-all.c    |    6 +++++-
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/cpu-common.h b/cpu-common.h
index c0d27af..69c1d7a 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -51,6 +51,9 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev);
 
 void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
                             int len, int is_write);
+void cpu_physical_memory_nolock_rw(target_phys_addr_t addr, uint8_t *buf,
+                            int len, int is_write);
+
 static inline void cpu_physical_memory_read(target_phys_addr_t addr,
                                             void *buf, int len)
 {
diff --git a/exec.c b/exec.c
index 1eb920d..73d5242 100644
--- a/exec.c
+++ b/exec.c
@@ -3484,8 +3484,8 @@ static void address_space_dec_req_pending(void)
     }
 }
 
-void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
-                      int len, bool is_write)
+static void address_space_rw_internal(AddressSpace *as, target_phys_addr_t addr,
+                      uint8_t *buf, int len, bool is_write, bool biglock)
 {
     AddressSpaceDispatch *d = as->dispatch;
     int l;
@@ -3506,7 +3506,7 @@ void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
         qemu_mutex_unlock(&as->lock);
         address_space_check_inc_req_pending(&obj_mrs);
 
-        if (!safe_ref) {
+        if (!safe_ref && !biglock) {
             qemu_mutex_lock_iothread();
             qemu_mutex_lock(&as->lock);
             /* when 2nd try, mem map can change, need to judge it again */
@@ -3588,12 +3588,18 @@ void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
         len -= l;
         buf += l;
         addr += l;
-        if (!safe_ref) {
+        if (!safe_ref && !biglock) {
             qemu_mutex_unlock_iothread();
         }
     }
 }
 
+void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
+                      int len, bool is_write)
+{
+    address_space_rw_internal(as, addr, buf, len, is_write, true);
+}
+
 void address_space_write(AddressSpace *as, target_phys_addr_t addr,
                          const uint8_t *buf, int len)
 {
@@ -3619,6 +3625,14 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
     return address_space_rw(&address_space_memory, addr, buf, len, is_write);
 }
 
+void cpu_physical_memory_nolock_rw(target_phys_addr_t addr, uint8_t *buf,
+                            int len, int is_write)
+{
+    return address_space_rw_internal(&address_space_memory, addr, buf, len,
+           is_write, false);
+}
+
+
 /* used for ROM loading : can write in RAM and ROM */
 void cpu_physical_memory_write_rom(target_phys_addr_t addr,
                                    const uint8_t *buf, int len)
diff --git a/kvm-all.c b/kvm-all.c
index c2c6909..718f257 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1573,10 +1573,14 @@ int kvm_cpu_exec(CPUArchState *env)
             break;
         case KVM_EXIT_MMIO:
             DPRINTF("handle_mmio\n");
-            cpu_physical_memory_rw(run->mmio.phys_addr,
+            qemu_mutex_unlock_iothread();
+            qemu_thread_set_dispatch_type(DISPATCH_MMIO);
+            cpu_physical_memory_nolock_rw(run->mmio.phys_addr,
                                    run->mmio.data,
                                    run->mmio.len,
                                    run->mmio.is_write);
+            qemu_thread_reset_dispatch_type();
+            qemu_mutex_lock_iothread();
             ret = 0;
             break;
         case KVM_EXIT_IRQ_WINDOW_OPEN:
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v6 6/8] memory: make mmio dispatch able to be out of biglock
  2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 6/8] memory: make mmio dispatch able to be out of biglock Liu Ping Fan
@ 2012-11-05  6:45   ` Jan Kiszka
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kiszka @ 2012-11-05  6:45 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Peter Maydell, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Paolo Bonzini

On 2012-11-05 06:38, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> Without biglock, we try to protect the mr by increase refcnt.
> If we can inc refcnt, go backward and resort to biglock.

s/can/cannot/ - could create confusion.

Jan




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v6 7/8] memory: introduce tls context to trace nested mmio request issue
  2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 7/8] memory: introduce tls context to trace nested mmio request issue Liu Ping Fan
@ 2012-11-05  6:57   ` Jan Kiszka
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kiszka @ 2012-11-05  6:57 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Peter Maydell, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Paolo Bonzini

On 2012-11-05 06:38, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> After breaking down big lock, nested MMIO request which not targeting
> at RAM can cause deadlock issue. Supposing the scene: dev_a,b with
> fine-grain locks lockA/B, then ABBA dealock issue can be triggered.
> We fix this by tracing and rejecting such request.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  exec.c        |   47 +++++++++++++++++++++++++++++++++++++++++++++++
>  qemu-thread.h |    7 +++++++
>  2 files changed, 54 insertions(+), 0 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index fa34ef9..1eb920d 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -3442,6 +3442,48 @@ static bool address_space_section_lookup_ref(AddressSpace *as,
>      return safe_ref;
>  }
>  
> +typedef struct ThreadContext {
> +  DispatchType dispatch_type;
> +  unsigned int mmio_req_pending;
> +} ThreadContext;
> +
> +static __thread ThreadContext thread_context = {
          ^^^^^^^^
Again, you will have to work on qemu-tls.h and then use DEFINE_TLS. The
above is not portable.

> +    .dispatch_type = DISPATCH_INIT,
> +    .mmio_req_pending = 0
> +};
> +
> +void qemu_thread_set_dispatch_type(DispatchType type)
> +{
> +    thread_context.dispatch_type = type;
> +}
> +
> +void qemu_thread_reset_dispatch_type(void)
> +{
> +    thread_context.dispatch_type = DISPATCH_INIT;
> +}
> +
> +static void address_space_check_inc_req_pending(MemoryRegionSection *section)
> +{
> +    bool nested = false;
> +
> +    /* currently, only mmio out of big lock, and need this to avoid dead lock */
> +    if (thread_context.dispatch_type == DISPATCH_MMIO) {
> +        nested = ++thread_context.mmio_req_pending > 1 ? true : false;
> +        /* To fix, will filter iommu case */
> +        if (nested && !memory_region_is_ram(section->mr)) {
> +            fprintf(stderr, "mmio: nested target not RAM is not support");
> +            abort();
> +        }
> +    }

This should already take PIO into account, i.e. all scenarios: if we are
dispatching MMIO or PIO, reject any further requests that are not
targeting RAM.

I don't think we need mmio_req_pending for this. We are not interested
in differentiating between MMIO and PIO; both will be problematic. We
just store whether a request is in progress in the TLS variable here,
not before entering cpu_physical_memory_xxx. And then we can simply
bail out if another non-RAM request arrives; the nesting level will
never be >1.

And by bailing out I mean warn once and ignore the request, not abort().
The latter would be a needless, guest-triggerable VM termination.
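
Something along these lines (rough sketch of the idea, not a tested
patch; the names are made up):

    static __thread bool dispatch_in_progress;

    if (dispatch_in_progress && !memory_region_is_ram(section->mr)) {
        static bool warned;
        if (!warned) {
            fprintf(stderr, "ignoring nested dispatch to non-RAM region\n");
            warned = true;
        }
        return;    /* drop the request instead of calling abort() */
    }
    dispatch_in_progress = true;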

Jan




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v6 0/8] push mmio dispatch out of big lock
  2012-11-05  5:38 [Qemu-devel] [PATCH v6 0/8] push mmio dispatch out of big lock Liu Ping Fan
                   ` (7 preceding siblings ...)
  2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 8/8] vcpu: push mmio dispatcher out of big lock Liu Ping Fan
@ 2012-11-05  7:00 ` Jan Kiszka
  2012-11-09  6:23   ` liu ping fan
  8 siblings, 1 reply; 28+ messages in thread
From: Jan Kiszka @ 2012-11-05  7:00 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Peter Maydell, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Paolo Bonzini

On 2012-11-05 06:38, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> v1:
> https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg03312.html
> 
> v2:
> http://lists.gnu.org/archive/html/qemu-devel/2012-08/msg01275.html
> 
> v3:
> http://lists.nongnu.org/archive/html/qemu-devel/2012-09/msg01474.html
> 
> v4:
> http://lists.gnu.org/archive/html/qemu-devel/2012-10/msg03857.html
> 
> v5:
> https://lists.gnu.org/archive/html/qemu-devel/2012-10/msg04867.html
> 
> changes v5->v6:
>  Apply fine-grain lock for all address space.
>  Introduce separated interface to allow mmio dispatcher called with/without big lock.
> 
> Liu Ping Fan (8):
>   atomic: introduce atomic operations
>   qom: apply atomic on object's refcount
>   hotplug: introduce qdev_unplug_complete() to remove device from views
>   pci: remove pci device from mem view when unplug
>   memory: introduce local lock for address space
>   memory: make mmio dispatch able to be out of biglock
>   memory: introduce tls context to trace nested mmio request issue
>   vcpu: push mmio dispatcher out of big lock
> 
>  cpu-common.h          |    3 +
>  docs/memory.txt       |    4 +
>  exec.c                |  219 +++++++++++++++++++++++++++++++++++++++++++++----
>  hw/acpi_piix4.c       |    2 +-
>  hw/pci.c              |   13 +++-
>  hw/pci.h              |    1 +
>  hw/qdev.c             |   26 ++++++
>  hw/qdev.h             |    3 +-
>  include/qemu/atomic.h |   63 ++++++++++++++
>  include/qemu/object.h |    3 +-
>  kvm-all.c             |    6 +-
>  memory-internal.h     |    1 +
>  memory.c              |    1 +
>  memory.h              |    5 +
>  qemu-thread.h         |    7 ++
>  qom/object.c          |   11 +--
>  16 files changed, 340 insertions(+), 28 deletions(-)
>  create mode 100644 include/qemu/atomic.h
> 

Very good! My feeling is we are getting closer.

There are some minor style issues I'm not yet commenting on. We can go
through this once everyone is happy with the design.

Jan



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v6 0/8] push mmio dispatch out of big lock
  2012-11-05  7:00 ` [Qemu-devel] [PATCH v6 0/8] push mmio dispatch " Jan Kiszka
@ 2012-11-09  6:23   ` liu ping fan
  2012-11-09  8:15     ` Jan Kiszka
  0 siblings, 1 reply; 28+ messages in thread
From: liu ping fan @ 2012-11-09  6:23 UTC (permalink / raw)
  To: Jan Kiszka, Marcelo Tosatti, Avi Kivity, Anthony Liguori
  Cc: Peter Maydell, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

Ping? Any further comments?

Thanks and regards,
Pingfan

On Mon, Nov 5, 2012 at 3:00 PM, Jan Kiszka <jan.kiszka@web.de> wrote:
> On 2012-11-05 06:38, Liu Ping Fan wrote:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> v1:
>> https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg03312.html
>>
>> v2:
>> http://lists.gnu.org/archive/html/qemu-devel/2012-08/msg01275.html
>>
>> v3:
>> http://lists.nongnu.org/archive/html/qemu-devel/2012-09/msg01474.html
>>
>> v4:
>> http://lists.gnu.org/archive/html/qemu-devel/2012-10/msg03857.html
>>
>> v5:
>> https://lists.gnu.org/archive/html/qemu-devel/2012-10/msg04867.html
>>
>> changes v5->v6:
>>  Apply fine-grain lock for all address space.
>>  Introduce separated interface to allow mmio dispatcher called with/without big lock.
>>
>> Liu Ping Fan (8):
>>   atomic: introduce atomic operations
>>   qom: apply atomic on object's refcount
>>   hotplug: introduce qdev_unplug_complete() to remove device from views
>>   pci: remove pci device from mem view when unplug
>>   memory: introduce local lock for address space
>>   memory: make mmio dispatch able to be out of biglock
>>   memory: introduce tls context to trace nested mmio request issue
>>   vcpu: push mmio dispatcher out of big lock
>>
>>  cpu-common.h          |    3 +
>>  docs/memory.txt       |    4 +
>>  exec.c                |  219 +++++++++++++++++++++++++++++++++++++++++++++----
>>  hw/acpi_piix4.c       |    2 +-
>>  hw/pci.c              |   13 +++-
>>  hw/pci.h              |    1 +
>>  hw/qdev.c             |   26 ++++++
>>  hw/qdev.h             |    3 +-
>>  include/qemu/atomic.h |   63 ++++++++++++++
>>  include/qemu/object.h |    3 +-
>>  kvm-all.c             |    6 +-
>>  memory-internal.h     |    1 +
>>  memory.c              |    1 +
>>  memory.h              |    5 +
>>  qemu-thread.h         |    7 ++
>>  qom/object.c          |   11 +--
>>  16 files changed, 340 insertions(+), 28 deletions(-)
>>  create mode 100644 include/qemu/atomic.h
>>
>
> Very good! My feeling is we are getting closer.
>
> There are some minor style issues I'm not yet commenting on. We can go
> through this once everyone is happy with the design.
>
> Jan
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v6 0/8] push mmio dispatch out of big lock
  2012-11-09  6:23   ` liu ping fan
@ 2012-11-09  8:15     ` Jan Kiszka
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kiszka @ 2012-11-09  8:15 UTC (permalink / raw)
  To: liu ping fan
  Cc: Peter Maydell, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Paolo Bonzini

On 2012-11-09 07:23, liu ping fan wrote:
> Ping? Any further comments?
> 

Don't expect too much feedback these days. Folks are busy listening to
KVM forum talks, doing networking, enjoying Barcelona and curing their
hangovers. ;)

Anyway, while hacking on my talk it became clearer to me that one of the
bigger remaining issues is the ref/unref callbacks. I don't think we
want as much boilerplate code in the device models as this approach
implies.

The idea I had so far on this is to go back to registering a QOM object
reference with the access callbacks and reference it in the generic code
directly instead of letting the device models do this. We could introduce

struct MemoryRegionOps {
    uint64_t (*read)(void *opaque,
                     hwaddr addr,
                     unsigned size);
    uint64_t (*read_unlocked)(QObject *object,
                     hwaddr addr,
                     unsigned size);

    void (*write)(void *opaque,
                  hwaddr addr,
                  uint64_t data,
                  unsigned size);
    void (*write_unlocked)(QObject *object,
                  hwaddr addr,
                  uint64_t data,
                  unsigned size);

So, device models supporting the lock-less mode would implement the
*_unlocked callbacks, while all the rest stay with the simple read/write
versions and leave the new ones NULL. It's an early idea, not fully
thought through yet.

Jan



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v6 3/8] hotplug: introduce qdev_unplug_complete() to remove device from views
  2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 3/8] hotplug: introduce qdev_unplug_complete() to remove device from views Liu Ping Fan
@ 2012-11-12  9:27   ` Paolo Bonzini
  2012-11-13  6:12     ` liu ping fan
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2012-11-12  9:27 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Peter Maydell, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Jan Kiszka

On 05/11/2012 06:38, Liu Ping Fan wrote:
> +void qdev_unplug_complete(DeviceState *dev, Error **errp)
> +{
> +    /* isolate from mem view */
> +    qdev_unmap(dev);
> +    /* isolate from device tree */
> +    qdev_unset_parent(dev);
> +    object_unref(OBJECT(dev));

This leaks the device.  I'll send a patch to fix this, please include it
in v7.

Paolo

> +}
> +

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/8] atomic: introduce atomic operations
  2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 1/8] atomic: introduce atomic operations Liu Ping Fan
@ 2012-11-12  9:54   ` Paolo Bonzini
  2012-11-13  6:48     ` liu ping fan
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2012-11-12  9:54 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Peter Maydell, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Jan Kiszka

On 05/11/2012 06:38, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> If out of global lock, we will be challenged by SMP in low level,
> so need atomic ops.
> 
> This file is a wrapper of GCC atomic builtin.

I still object to this.

I know it enforces type safety, but it is incomplete.  It provides
neither atomic accesses to pointers nor useful operations such as
exchange.  It won't be used consistently, because in some places you
just do not have an Atomic value (see both current uses of the __sync_*
builtins).

Either make it complete, and prove it by using it where __sync_* is
used now, or just use the gcc builtins directly.

Paolo

> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  include/qemu/atomic.h |   63 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 63 insertions(+), 0 deletions(-)
>  create mode 100644 include/qemu/atomic.h
> 
> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
> new file mode 100644
> index 0000000..a9e6d35
> --- /dev/null
> +++ b/include/qemu/atomic.h
> @@ -0,0 +1,63 @@
> +/*
> + * Simple wrapper of gcc Atomic-Builtins
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +#ifndef __QEMU_ATOMIC_H
> +#define __QEMU_ATOMIC_H
> +
> +typedef struct Atomic {
> +    volatile int counter;
> +} Atomic;
> +
> +static inline void atomic_set(Atomic *v, int i)
> +{
> +    v->counter = i;
> +}
> +
> +static inline int atomic_read(Atomic *v)
> +{
> +    return v->counter;
> +}
> +
> +static inline int atomic_return_and_add(int i, Atomic *v)
> +{
> +    int ret;
> +
> +    ret = __sync_fetch_and_add(&v->counter, i);
> +    return ret;
> +}
> +
> +static inline int atomic_return_and_sub(int i, Atomic *v)
> +{
> +    int ret;
> +
> +    ret = __sync_fetch_and_sub(&v->counter, i);
> +    return ret;
> +}
> +
> +/**
> + *  * atomic_inc - increment atomic variable
> + *   * @v: pointer of type Atomic
> + *    *
> + *     * Atomically increments @v by 1.
> + *      */
> +static inline void atomic_inc(Atomic *v)
> +{
> +    __sync_fetch_and_add(&v->counter, 1);
> +}
> +
> +/**
> + *  * atomic_dec - decrement atomic variable
> + *   * @v: pointer of type Atomic
> + *    *
> + *     * Atomically decrements @v by 1.
> + *      */
> +static inline void atomic_dec(Atomic *v)
> +{
> +    __sync_fetch_and_sub(&v->counter, 1);
> +}
> +
> +#endif
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v6 3/8] hotplug: introduce qdev_unplug_complete() to remove device from views
  2012-11-12  9:27   ` Paolo Bonzini
@ 2012-11-13  6:12     ` liu ping fan
  0 siblings, 0 replies; 28+ messages in thread
From: liu ping fan @ 2012-11-13  6:12 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Jan Kiszka

On Mon, Nov 12, 2012 at 5:27 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 05/11/2012 06:38, Liu Ping Fan wrote:
>> +void qdev_unplug_complete(DeviceState *dev, Error **errp)
>> +{
>> +    /* isolate from mem view */
>> +    qdev_unmap(dev);
>> +    /* isolate from device tree */
>> +    qdev_unset_parent(dev);
>> +    object_unref(OBJECT(dev));
>
> This leaks the device.  I'll send a patch to fix this, please include it
> in v7.
>
OK. Please CC me.

Thanks
Pingfan
> Paolo
>
>> +}
>> +
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/8] atomic: introduce atomic operations
  2012-11-12  9:54   ` Paolo Bonzini
@ 2012-11-13  6:48     ` liu ping fan
  2012-11-13 10:11       ` Paolo Bonzini
  2012-11-13 10:11       ` Paolo Bonzini
  0 siblings, 2 replies; 28+ messages in thread
From: liu ping fan @ 2012-11-13  6:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Jan Kiszka

On Mon, Nov 12, 2012 at 5:54 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 05/11/2012 06:38, Liu Ping Fan wrote:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> If out of global lock, we will be challenged by SMP in low level,
>> so need atomic ops.
>>
>> This file is a wrapper of GCC atomic builtin.
>
> I still object to this.
>
> I know it enforces type-safety, but it is incomplete.  It doesn't

Although it is incomplete, the remaining cases are rarely used.  Linux
faces the same issue, and the "int" version is enough there, so I think
we can borrow from that experience.

> provide neither atomic accesses to pointers, nor useful operations such
> as exchange.  It won't be used consistently, because in some places you
> just do not have an Atomic value (see both current uses of __sync_*
> builtins).
>
> If you can make it complete, and prove it by using it where __sync_* is

In the current code __sync_* is rarely used; apart from the barriers, I
think only two places use it. We can substitute it easily.

Regards,
Pingfan
> used now, or just use gcc builtins directly.
>
> Paolo
>
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  include/qemu/atomic.h |   63 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 63 insertions(+), 0 deletions(-)
>>  create mode 100644 include/qemu/atomic.h
>>
>> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
>> new file mode 100644
>> index 0000000..a9e6d35
>> --- /dev/null
>> +++ b/include/qemu/atomic.h
>> @@ -0,0 +1,63 @@
>> +/*
>> + * Simple wrapper of gcc Atomic-Builtins
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + *
>> + */
>> +#ifndef __QEMU_ATOMIC_H
>> +#define __QEMU_ATOMIC_H
>> +
>> +typedef struct Atomic {
>> +    volatile int counter;
>> +} Atomic;
>> +
>> +static inline void atomic_set(Atomic *v, int i)
>> +{
>> +    v->counter = i;
>> +}
>> +
>> +static inline int atomic_read(Atomic *v)
>> +{
>> +    return v->counter;
>> +}
>> +
>> +static inline int atomic_return_and_add(int i, Atomic *v)
>> +{
>> +    int ret;
>> +
>> +    ret = __sync_fetch_and_add(&v->counter, i);
>> +    return ret;
>> +}
>> +
>> +static inline int atomic_return_and_sub(int i, Atomic *v)
>> +{
>> +    int ret;
>> +
>> +    ret = __sync_fetch_and_sub(&v->counter, i);
>> +    return ret;
>> +}
>> +
>> +/**
>> + * atomic_inc - increment atomic variable
>> + * @v: pointer of type Atomic
>> + *
>> + * Atomically increments @v by 1.
>> + */
>> +static inline void atomic_inc(Atomic *v)
>> +{
>> +    __sync_fetch_and_add(&v->counter, 1);
>> +}
>> +
>> +/**
>> + * atomic_dec - decrement atomic variable
>> + * @v: pointer of type Atomic
>> + *
>> + * Atomically decrements @v by 1.
>> + */
>> +static inline void atomic_dec(Atomic *v)
>> +{
>> +    __sync_fetch_and_sub(&v->counter, 1);
>> +}
>> +
>> +#endif
>>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/8] atomic: introduce atomic operations
  2012-11-13  6:48     ` liu ping fan
@ 2012-11-13 10:11       ` Paolo Bonzini
  2012-11-14  9:38         ` liu ping fan
  2012-11-13 10:11       ` Paolo Bonzini
  1 sibling, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2012-11-13 10:11 UTC (permalink / raw)
  To: liu ping fan
  Cc: Peter Maydell, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Jan Kiszka

> > Il 05/11/2012 06:38, Liu Ping Fan ha scritto:
> > > From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> > >
> > > If out of global lock, we will be challenged by SMP in low level,
> > > so need atomic ops.
> > >
> > > This file is a wrapper of GCC atomic builtin.
> >
> > I still object to this.
> >
> > I know it enforces type-safety, but it is incomplete.  It doesn't
> 
> Although it is incomplete, but the rest cases are rarely used.  Linux
> faces such issue, and the "int" version is enough, so I think we can
> borrow experience from there.

One of the two places that use __sync_* requires 64-bit accesses.  My
RCU prototype required pointer-sized access, which you cannot make type-
safe without C++ templates.

> > provide neither atomic accesses to pointers, nor useful operations such
> > as exchange.  It won't be used consistently, because in some places you
> > just do not have an Atomic value (see both current uses of __sync_*
> > builtins).
> >
> > If you can make it complete, and prove it by using it where
> > __sync_* is
> 
> For current code, __sync_* is used rarely, I think except the
> barriers, only two places use it. We can substitute it easily.

No, you cannot.  See above, one doesn't use ints and the other is
guest state so migration becomes complicated if you use Atomic.  I'm
happy to be proven wrong, however.

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/8] atomic: introduce atomic operations
  2012-11-13 10:11       ` Paolo Bonzini
@ 2012-11-14  9:38         ` liu ping fan
  2012-11-14  9:47           ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: liu ping fan @ 2012-11-14  9:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Jan Kiszka

On Tue, Nov 13, 2012 at 6:11 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> > Il 05/11/2012 06:38, Liu Ping Fan ha scritto:
>> > > From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> > >
>> > > If out of global lock, we will be challenged by SMP in low level,
>> > > so need atomic ops.
>> > >
>> > > This file is a wrapper of GCC atomic builtin.
>> >
>> > I still object to this.
>> >
>> > I know it enforces type-safety, but it is incomplete.  It doesn't
>>
>> Although it is incomplete, but the rest cases are rarely used.  Linux
>> faces such issue, and the "int" version is enough, so I think we can
>> borrow experience from there.
>
> One of the two places that use __sync_* require 64-bit accesses.  My
Yes, these two places are not easy to fix.

> RCU prototype required pointer-sized access, which you cannot make type-
But I think your RCU prototype should rely on the CPU's atomic
instructions, not gcc's atomics.
Otherwise, it could be slow (I guess there is something like a spinlock
in there).

Regards,
pingfan

> safe without C++ templates.
>
>> > provide neither atomic accesses to pointers, nor useful operations such
>> > as exchange.  It won't be used consistently, because in some places you
>> > just do not have an Atomic value (see both current uses of __sync_*
>> > builtins).
>> >
>> > If you can make it complete, and prove it by using it where
>> > __sync_* is
>>
>> For current code, __sync_* is used rarely, I think except the
>> barriers, only two places use it. We can substitute it easily.
>
> No, you cannot.  See above, one doesn't use ints and the other is
> guest state so migration becomes complicated if you use Atomic.  I'm
> happy to be proven wrong, however.
>
> Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/8] atomic: introduce atomic operations
  2012-11-14  9:38         ` liu ping fan
@ 2012-11-14  9:47           ` Paolo Bonzini
  2012-11-15  7:47             ` liu ping fan
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2012-11-14  9:47 UTC (permalink / raw)
  To: liu ping fan
  Cc: Peter Maydell, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Jan Kiszka

Il 14/11/2012 10:38, liu ping fan ha scritto:
> On Tue, Nov 13, 2012 at 6:11 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>> Il 05/11/2012 06:38, Liu Ping Fan ha scritto:
>>>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>>>
>>>>> If out of global lock, we will be challenged by SMP in low level,
>>>>> so need atomic ops.
>>>>>
>>>>> This file is a wrapper of GCC atomic builtin.
>>>>
>>>> I still object to this.
>>>>
>>>> I know it enforces type-safety, but it is incomplete.  It doesn't
>>>
>>> Although it is incomplete, but the rest cases are rarely used.  Linux
>>> faces such issue, and the "int" version is enough, so I think we can
>>> borrow experience from there.
>>
>> One of the two places that use __sync_* require 64-bit accesses.  My
> Yes, these two places are not easy to fix.

Which shows that Linux's atomic_t is not suited for QEMU, in my opinion.

>> RCU prototype required pointer-sized access, which you cannot make type-
> But I think that your RCU prototype should rely on atomic of CPU, not
> gcc‘s atomic.

What's the difference?  gcc's atomic produces the same instructions as
hand-written assembly (or should).

> Otherwise, it could be slow (I guess something like spinlock there).

Not sure what you mean.  I'm using two things: 1) write barriers; 2)
atomic_xchg of a pointer for fast, wait-free enqueuing of call_rcu
callbacks.

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/8] atomic: introduce atomic operations
  2012-11-14  9:47           ` Paolo Bonzini
@ 2012-11-15  7:47             ` liu ping fan
  2012-11-15 11:24               ` Paolo Bonzini
                                 ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: liu ping fan @ 2012-11-15  7:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Jan Kiszka

On Wed, Nov 14, 2012 at 5:47 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 14/11/2012 10:38, liu ping fan ha scritto:
>> On Tue, Nov 13, 2012 at 6:11 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>> Il 05/11/2012 06:38, Liu Ping Fan ha scritto:
>>>>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>>>>
>>>>>> If out of global lock, we will be challenged by SMP in low level,
>>>>>> so need atomic ops.
>>>>>>
>>>>>> This file is a wrapper of GCC atomic builtin.
>>>>>
>>>>> I still object to this.
>>>>>
>>>>> I know it enforces type-safety, but it is incomplete.  It doesn't
>>>>
>>>> Although it is incomplete, but the rest cases are rarely used.  Linux
>>>> faces such issue, and the "int" version is enough, so I think we can
>>>> borrow experience from there.
>>>
>>> One of the two places that use __sync_* require 64-bit accesses.  My
>> Yes, these two places are not easy to fix.
>
> Which shows that Linux's atomic_t is not suited for QEMU, in my opinion.
>
>>> RCU prototype required pointer-sized access, which you cannot make type-
>> But I think that your RCU prototype should rely on atomic of CPU, not
>> gcc‘s atomic.
>
> What's the difference?  gcc's atomic produces the same instructions as
> hand-written assembly (or should).
>
Probably I made a mistake here: in vhost, log =
__sync_fetch_and_and(from, 0) is used to fetch 64 bits atomically in
the case of a 32-bit qemu running on a 64-bit linux, right?  But how
can we read 32 bits twice atomically?  There seems to be no
instruction like "lock xchg" for this op, so I guess
__sync_fetch_and_and() is based on something like a spinlock.
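
(For reference, the pattern under discussion looks roughly like the
sketch below; it is illustrative only, not the actual vhost code, and
the helper name is made up.)

    #include <stdint.h>

    /* Read-and-clear one dirty-log word: __sync_fetch_and_and(p, 0)
     * returns the old value and stores 0 in a single atomic step, so
     * bits set concurrently by the producer are never lost between
     * the read and the clear. */
    static uint64_t fetch_and_clear_log_word(uint64_t *from)
    {
        return __sync_fetch_and_and(from, 0);
    }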

And I think the broken issue is that the vhost thread updates the log
while qemu reads it out non-atomically, right?

>> Otherwise, it could be slow (I guess something like spinlock there).
>
> Not sure what you mean.  I'm using two things: 1) write barriers; 2)
> atomic_xchg of a pointer for fast, wait-free enqueuing of call_rcu
> callbacks.
>
I've got your main idea and will go through your prototype.  Just one
more question: why is atomic_xchg needed here?  Couldn't the sequence
"read old pointer, set new pointer" satisfy your requirement?

Regards,
Pingfan

> Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/8] atomic: introduce atomic operations
  2012-11-15  7:47             ` liu ping fan
@ 2012-11-15 11:24               ` Paolo Bonzini
  2012-11-16  0:03               ` Richard Henderson
  2012-11-18 10:04               ` Avi Kivity
  2 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2012-11-15 11:24 UTC (permalink / raw)
  To: liu ping fan
  Cc: Peter Maydell, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Jan Kiszka

Il 15/11/2012 08:47, liu ping fan ha scritto:
>>>> RCU prototype required pointer-sized access, which you cannot make type-
>>> But I think that your RCU prototype should rely on atomic of CPU, not
>>> gcc‘s atomic.
>>
>> What's the difference?  gcc's atomic produces the same instructions as
>> hand-written assembly (or should).
>>
> Probably I made a mistake here, in vhost,  log =
> __sync_fetch_and_and(from, 0)  is used to fetch 64bits atomically in
> the case  32bits qemu running on 64bits linux.  Right?   But how can
> we read 32bits twice in atomic?

You can use cmpxchg8b.

>>> Otherwise, it could be slow (I guess something like spinlock there).
>>
>> Not sure what you mean.  I'm using two things: 1) write barriers; 2)
>> atomic_xchg of a pointer for fast, wait-free enqueuing of call_rcu
>> callbacks.
>
> Got your main idea. Will go through your prototype. Just one more
> question, why here atomic_xchg needed?  Could not the sequence "read
> old pointer, set new pointer" satisfy your requirement?

No, it would be racy.  Say you have this:

     wrong                     right
  -----------------------------------------
     ref(new)                  ref(new)
     old = tail                old = new
     tail = new                xchg(&old, &tail)
     old->next = new           old->next = new
     up(&sem)                  up(&sem)

where sem and tail are global, while old and new are local.  There can
be any number of producers as in the above scheme, and one consumer.  In
the consumer, iteration of the list starts from the second element (the
first element is dummy):

    dummy = new Node;
    tail = dummy;
    for(;;) {
        down(&sem);
        node = dummy->next;
        unref(dummy);
        visit node;
        dummy = node;
    }

Then the invariants are:

- the number of elements N in the list (starting from the dummy node and
following ->next) is always >1.  Because of this, you never have to
touch head when adding an element.

- the number of excess signals in the semaphore sem is always <= N-1.
Because of this, it doesn't matter that there is an instant in time
where tail is not reachable from head.  The element is really in the
list (as far as the consumer goes) only after the semaphore is signaled.

In the wrong version, two threads could read the same pointer:

    x = tail                                  |
                         y = tail             |
    tail = new_x                              |
                         tail = new_y         |
    x->next = new_x                           |   old_tail->next = new_x
    up(&sem)                                  |
                         y->next = new_y      |   old_tail->next = new_y
                         up(&sem)             |

No node points to new_x, so you have 2 nodes reachable from the head
(the dummy node, and new_y) with 2 signals on the semaphore,
contradicting the second invariant.

This instead works:

    x = new_x                                 |
                         y = new_y            |
    xchg(&x, &tail)                           |   tail = x, x = old_tail
                         xchg(&y, &tail)      |   tail = y, y = x
    x->next = new_x                           |   old_tail->next = new_x
    up(&sem)                                  |
                         y->next = new_y      |   x->next = new_y
                         up(&sem)             |

because you have 3 nodes and 2 signals.
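
A minimal C sketch of this scheme, assuming GCC's
__sync_lock_test_and_set builtin as the atomic exchange and a POSIX
semaphore (the names and types are illustrative, not taken from the
prototype):

    #include <semaphore.h>

    typedef struct Node {
        struct Node *next;
        void *data;
    } Node;

    static Node *tail;     /* last enqueued node; starts at the dummy */
    static sem_t sem;      /* counts completed enqueues */

    /* Producer side: wait-free enqueue, any number of producers. */
    static void enqueue(Node *node)
    {
        /* Atomically publish ourselves as the new tail and obtain the
         * old tail (this is the xchg step above). */
        Node *old = __sync_lock_test_and_set(&tail, node);
        /* Link the old tail to us; the consumer only sees the node
         * after the semaphore is signalled, so the transient gap where
         * the node is unreachable from the head is harmless. */
        old->next = node;
        sem_post(&sem);
    }

    /* Single consumer: iterate from the dummy node, as above.  The
     * unref of the old dummy is elided in this sketch. */
    static Node *dequeue(Node **dummy)
    {
        sem_wait(&sem);
        Node *node = (*dummy)->next;
        *dummy = node;
        return node;
    }

As in the scheme above, tail must initially point at the dummy node,
and a node's next pointer is only filled in by the following enqueue.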

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/8] atomic: introduce atomic operations
  2012-11-15  7:47             ` liu ping fan
  2012-11-15 11:24               ` Paolo Bonzini
@ 2012-11-16  0:03               ` Richard Henderson
  2012-11-21  5:58                 ` liu ping fan
  2012-11-18 10:04               ` Avi Kivity
  2 siblings, 1 reply; 28+ messages in thread
From: Richard Henderson @ 2012-11-16  0:03 UTC (permalink / raw)
  To: liu ping fan
  Cc: Peter Maydell, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Jan Kiszka, Paolo Bonzini

On 2012-11-14 23:47, liu ping fan wrote:
> Probably I made a mistake here, in vhost,  log =
> __sync_fetch_and_and(from, 0)  is used to fetch 64bits atomically in
> the case  32bits qemu running on 64bits linux.  Right?   But how can
> we read 32bits twice in atomic?  Seem that no instruction like "_lock
> xchg" for this ops.  So I guess _sync_fetch_and_and() based on
> something like spinlock.

... or for gcc 4.7 and later,

  log = __atomic_load_n(from, memory_model)

For i386, we will not perform 2 32-bit reads of course.  Paolo suggests
using cmpxchg8b, but that's a tad slow.  Instead we'll perform a 64-bit
read into either the fpu or the sse units, and from there copy the data
wherever it's needed.  Such 64-bit aligned reads are guaranteed to be
atomic for i586 (pentium) and later.

For other 32-bit architectures other possibilities exist.  Recent arm can
use its ldrexd insn.  Many of the 32-bit linux architectures have special
kernel entry points or schemes to perform atomic operations.  These are
generally based on the assumption of a single-processor system, and are
arranged to either disable interrupts or notice that no interrupt occurred,
while executing a code region.

As an ultimate fallback, yes we would use locks.  But none of the host
architectures that QEMU supports needs to do so.
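
As a rough sketch of that interface (illustrative only; the names are
not from the vhost code):

    #include <stdint.h>

    /* Atomic 64-bit load, usable even on 32-bit hosts where gcc knows
     * how to emit one (e.g. an SSE/FPU load on i586 and later); the
     * memory order chosen here is just an example. */
    static uint64_t load_log_word(uint64_t *from)
    {
        return __atomic_load_n(from, __ATOMIC_ACQUIRE);
    }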


r~

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/8] atomic: introduce atomic operations
  2012-11-15  7:47             ` liu ping fan
  2012-11-15 11:24               ` Paolo Bonzini
  2012-11-16  0:03               ` Richard Henderson
@ 2012-11-18 10:04               ` Avi Kivity
  2012-11-21  5:57                 ` liu ping fan
  2 siblings, 1 reply; 28+ messages in thread
From: Avi Kivity @ 2012-11-18 10:04 UTC (permalink / raw)
  To: liu ping fan
  Cc: Peter Maydell, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka, Paolo Bonzini

On 11/15/2012 09:47 AM, liu ping fan wrote:
> On Wed, Nov 14, 2012 at 5:47 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Il 14/11/2012 10:38, liu ping fan ha scritto:
>>> On Tue, Nov 13, 2012 at 6:11 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>>> Il 05/11/2012 06:38, Liu Ping Fan ha scritto:
>>>>>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>>>>>
>>>>>>> If out of global lock, we will be challenged by SMP in low level,
>>>>>>> so need atomic ops.
>>>>>>>
>>>>>>> This file is a wrapper of GCC atomic builtin.
>>>>>>
>>>>>> I still object to this.
>>>>>>
>>>>>> I know it enforces type-safety, but it is incomplete.  It doesn't
>>>>>
>>>>> Although it is incomplete, but the rest cases are rarely used.  Linux
>>>>> faces such issue, and the "int" version is enough, so I think we can
>>>>> borrow experience from there.
>>>>
>>>> One of the two places that use __sync_* require 64-bit accesses.  My
>>> Yes, these two places are not easy to fix.
>>
>> Which shows that Linux's atomic_t is not suited for QEMU, in my opinion.
>>
>>>> RCU prototype required pointer-sized access, which you cannot make type-
>>> But I think that your RCU prototype should rely on atomic of CPU, not
>>> gcc‘s atomic.
>>
>> What's the difference?  gcc's atomic produces the same instructions as
>> hand-written assembly (or should).
>>
> Probably I made a mistake here, in vhost,  log =
> __sync_fetch_and_and(from, 0)  is used to fetch 64bits atomically in
> the case  32bits qemu running on 64bits linux.  Right?   But how can
> we read 32bits twice in atomic?  Seem that no instruction like "_lock
> xchg" for this ops.  So I guess _sync_fetch_and_and() based on
> something like spinlock.
> 
> And I think the broken issue is caused by vhost thread updates log,
> while qemu read out it not atomicly, Right?

For the log, 32-bit sync_fetch_and_and() is sufficient.  We only need to
ensure no bits are lost, we don't need 64-bit atomicity.
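
In other words, something along these lines would be enough (a sketch
assuming a little-endian host; the helper name is made up):

    #include <stdint.h>

    static uint64_t fetch_and_clear_log_word32(uint32_t half[2])
    {
        /* Two 32-bit atomic read-and-clear operations.  The halves are
         * not read at the same instant, but every bit the producer sets
         * is returned exactly once, which is all the dirty log needs. */
        uint64_t lo = __sync_fetch_and_and(&half[0], 0);
        uint64_t hi = __sync_fetch_and_and(&half[1], 0);
        return lo | (hi << 32);
    }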



-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/8] atomic: introduce atomic operations
  2012-11-18 10:04               ` Avi Kivity
@ 2012-11-21  5:57                 ` liu ping fan
  0 siblings, 0 replies; 28+ messages in thread
From: liu ping fan @ 2012-11-21  5:57 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Peter Maydell, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka, Paolo Bonzini

On Sun, Nov 18, 2012 at 6:04 PM, Avi Kivity <avi@redhat.com> wrote:
> On 11/15/2012 09:47 AM, liu ping fan wrote:
>> On Wed, Nov 14, 2012 at 5:47 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> Il 14/11/2012 10:38, liu ping fan ha scritto:
>>>> On Tue, Nov 13, 2012 at 6:11 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>>>> Il 05/11/2012 06:38, Liu Ping Fan ha scritto:
>>>>>>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>>>>>>
>>>>>>>> If out of global lock, we will be challenged by SMP in low level,
>>>>>>>> so need atomic ops.
>>>>>>>>
>>>>>>>> This file is a wrapper of GCC atomic builtin.
>>>>>>>
>>>>>>> I still object to this.
>>>>>>>
>>>>>>> I know it enforces type-safety, but it is incomplete.  It doesn't
>>>>>>
>>>>>> Although it is incomplete, but the rest cases are rarely used.  Linux
>>>>>> faces such issue, and the "int" version is enough, so I think we can
>>>>>> borrow experience from there.
>>>>>
>>>>> One of the two places that use __sync_* require 64-bit accesses.  My
>>>> Yes, these two places are not easy to fix.
>>>
>>> Which shows that Linux's atomic_t is not suited for QEMU, in my opinion.
>>>
>>>>> RCU prototype required pointer-sized access, which you cannot make type-
>>>> But I think that your RCU prototype should rely on atomic of CPU, not
>>>> gcc‘s atomic.
>>>
>>> What's the difference?  gcc's atomic produces the same instructions as
>>> hand-written assembly (or should).
>>>
>> Probably I made a mistake here, in vhost,  log =
>> __sync_fetch_and_and(from, 0)  is used to fetch 64bits atomically in
>> the case  32bits qemu running on 64bits linux.  Right?   But how can
>> we read 32bits twice in atomic?  Seem that no instruction like "_lock
>> xchg" for this ops.  So I guess _sync_fetch_and_and() based on
>> something like spinlock.
>>
>> And I think the broken issue is caused by vhost thread updates log,
>> while qemu read out it not atomicly, Right?
>
> For the log, 32-bit sync_fetch_and_and() is sufficient.  We only need to
> ensure no bits are lost, we don't need 64-bit atomicity.
>
I have read the vhost-related code and found that it is quite different
from the kvm log; now I understand it.

Thanks and regards,
Pingfan
>
>
> --
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/8] atomic: introduce atomic operations
  2012-11-16  0:03               ` Richard Henderson
@ 2012-11-21  5:58                 ` liu ping fan
  0 siblings, 0 replies; 28+ messages in thread
From: liu ping fan @ 2012-11-21  5:58 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Peter Maydell, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Jan Kiszka, Paolo Bonzini

On Fri, Nov 16, 2012 at 8:03 AM, Richard Henderson <rth@twiddle.net> wrote:
> On 2012-11-14 23:47, liu ping fan wrote:
>> Probably I made a mistake here, in vhost,  log =
>> __sync_fetch_and_and(from, 0)  is used to fetch 64bits atomically in
>> the case  32bits qemu running on 64bits linux.  Right?   But how can
>> we read 32bits twice in atomic?  Seem that no instruction like "_lock
>> xchg" for this ops.  So I guess _sync_fetch_and_and() based on
>> something like spinlock.
>
> ... or for gcc 4.7 and later,
>
>   log = __atomic_load_n(from, memory_model)
>
> For i386, we will not perform 2 32-bit reads of course.  Paulo suggests
> using cmpxchg8b, but that's a tad slow.  Instead we'll perform a 64-bit
> read into either the fpu or the sse units, and from there copy the data
> wherever it's needed.  Such 64-bit aligned reads are guaranteed to be
> atomic for i586 (pentium) and later.
>
> For other 32-bit architectures other possibilities exist.  Recent arm can
> use its ldrexd insn.  Many of the 32-bit linux architectures have special
> kernel entry points or schemes to perform atomic operations.  These are
> generally based on the assumption of a single-processor system, and are
> arranged to either disable interrupts or notice that no interrupt occurred,
> while executing a code region.
>
> As an ultimate fallback, yes we would use locks.  But none of the host
> architectures that QEMU supports needs to do so.
>
I very much appreciate your detailed description.

Regards,
Pingfan
>
> r~

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2012-11-21  5:58 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-05  5:38 [Qemu-devel] [PATCH v6 0/8] push mmio dispatch out of big lock Liu Ping Fan
2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 1/8] atomic: introduce atomic operations Liu Ping Fan
2012-11-12  9:54   ` Paolo Bonzini
2012-11-13  6:48     ` liu ping fan
2012-11-13 10:11       ` Paolo Bonzini
2012-11-14  9:38         ` liu ping fan
2012-11-14  9:47           ` Paolo Bonzini
2012-11-15  7:47             ` liu ping fan
2012-11-15 11:24               ` Paolo Bonzini
2012-11-16  0:03               ` Richard Henderson
2012-11-21  5:58                 ` liu ping fan
2012-11-18 10:04               ` Avi Kivity
2012-11-21  5:57                 ` liu ping fan
2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 2/8] qom: apply atomic on object's refcount Liu Ping Fan
2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 3/8] hotplug: introduce qdev_unplug_complete() to remove device from views Liu Ping Fan
2012-11-12  9:27   ` Paolo Bonzini
2012-11-13  6:12     ` liu ping fan
2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 4/8] pci: remove pci device from mem view when unplug Liu Ping Fan
2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 5/8] memory: introduce local lock for address space Liu Ping Fan
2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 6/8] memory: make mmio dispatch able to be out of biglock Liu Ping Fan
2012-11-05  6:45   ` Jan Kiszka
2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 7/8] memory: introduce tls context to trace nested mmio request issue Liu Ping Fan
2012-11-05  6:57   ` Jan Kiszka
2012-11-05  5:38 ` [Qemu-devel] [PATCH v6 8/8] vcpu: push mmio dispatcher out of big lock Liu Ping Fan
2012-11-05  7:00 ` [Qemu-devel] [PATCH v6 0/8] push mmio dispatch " Jan Kiszka
2012-11-09  6:23   ` liu ping fan
2012-11-09  8:15     ` Jan Kiszka
