* [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock
@ 2012-10-22  9:23 Liu Ping Fan
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 01/16] atomic: introduce atomic operations Liu Ping Fan
                   ` (16 more replies)
  0 siblings, 17 replies; 102+ messages in thread
From: Liu Ping Fan @ 2012-10-22  9:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Marcelo Tosatti, Avi Kivity, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

v1:
https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg03312.html

v2:
http://lists.gnu.org/archive/html/qemu-devel/2012-08/msg01275.html

v3:
http://lists.nongnu.org/archive/html/qemu-devel/2012-09/msg01474.html

changes v3->v4:
  Drop the reclaimer that delayed the release of DeviceState. Instead, use DeviceState::unmap()
  as the sync point to ensure no other subsystem still holds a reference to the DeviceState.
  Drop the requirement for a recursive big lock. Instead, use TLS at runtime to extract
  the caller's context info.


Todo:
Will be rebased onto Avi's patch "Integrate DMA into the memory API"




Liu Ping Fan (16):
  atomic: introduce atomic operations
  qom: apply atomic on object's refcount
  hotplug: introduce qdev_unplug_complete() to remove device from views
  pci: remove pci device from mem view when unplug
  memory: introduce ref,unref interface for MemoryRegionOps
  memory: document ref, unref interface
  memory: make mmio dispatch able to be out of biglock
  QemuThread: make QemuThread as tls to store extra info
  memory: introduce mmio request pending to anti nested DMA
  memory: introduce lock ops for  MemoryRegionOps
  vcpu: push mmio dispatcher out of big lock
  e1000: apply fine lock on e1000
  e1000: add busy flag to anti broken device state
  qdev: introduce stopping state
  e1000: introduce unmap() to fix unplug issue
  e1000: implement MemoryRegionOps's ref&lock interface

 cpus.c                |   15 +++++
 docs/memory.txt       |    5 ++
 exec.c                |  169 +++++++++++++++++++++++++++++++++++++++++++------
 hw/acpi_piix4.c       |    2 +-
 hw/e1000.c            |  104 ++++++++++++++++++++++++++++---
 hw/hw.h               |    1 +
 hw/pci.c              |   13 ++++-
 hw/pci.h              |    1 +
 hw/qdev.c             |   26 ++++++++
 hw/qdev.h             |    4 +-
 include/qemu/atomic.h |   63 ++++++++++++++++++
 include/qemu/object.h |    3 +-
 kvm-all.c             |    5 ++
 memory.c              |   16 +++++-
 memory.h              |    5 ++
 qemu-thread-posix.c   |    7 ++
 qemu-thread-posix.h   |    5 ++
 qemu-thread.h         |    3 +
 qom/object.c          |   11 ++--
 vl.c                  |    6 ++
 20 files changed, 426 insertions(+), 38 deletions(-)
 create mode 100644 include/qemu/atomic.h

-- 
1.7.4.4

^ permalink raw reply	[flat|nested] 102+ messages in thread

* [Qemu-devel] [patch v4 01/16] atomic: introduce atomic operations
  2012-10-22  9:23 [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Liu Ping Fan
@ 2012-10-22  9:23 ` Liu Ping Fan
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 02/16] qom: apply atomic on object's refcount Liu Ping Fan
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 102+ messages in thread
From: Liu Ping Fan @ 2012-10-22  9:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Marcelo Tosatti, Avi Kivity, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

Once we run outside the global lock, low-level code is exposed to SMP
races, so we need atomic operations.

This file is a thin wrapper around the GCC atomic builtins.
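
As an illustration, a minimal sketch of how a caller might drive a
reference count with these helpers (the Foo type and foo_get()/foo_put()
names are hypothetical, not part of this series):

    #include <glib.h>
    #include "qemu/atomic.h"

    typedef struct Foo {
        Atomic ref;
    } Foo;

    static void foo_get(Foo *f)
    {
        atomic_inc(&f->ref);
    }

    static void foo_put(Foo *f)
    {
        /* old value 1 means we just dropped the last reference */
        if (atomic_return_and_sub(1, &f->ref) == 1) {
            g_free(f);
        }
    }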

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 include/qemu/atomic.h |   63 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 63 insertions(+), 0 deletions(-)
 create mode 100644 include/qemu/atomic.h

diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
new file mode 100644
index 0000000..a9e6d35
--- /dev/null
+++ b/include/qemu/atomic.h
@@ -0,0 +1,63 @@
+/*
+ * Simple wrapper of gcc Atomic-Builtins
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+#ifndef __QEMU_ATOMIC_H
+#define __QEMU_ATOMIC_H
+
+typedef struct Atomic {
+    volatile int counter;
+} Atomic;
+
+static inline void atomic_set(Atomic *v, int i)
+{
+    v->counter = i;
+}
+
+static inline int atomic_read(Atomic *v)
+{
+    return v->counter;
+}
+
+static inline int atomic_return_and_add(int i, Atomic *v)
+{
+    int ret;
+
+    ret = __sync_fetch_and_add(&v->counter, i);
+    return ret;
+}
+
+static inline int atomic_return_and_sub(int i, Atomic *v)
+{
+    int ret;
+
+    ret = __sync_fetch_and_sub(&v->counter, i);
+    return ret;
+}
+
+/**
+ * atomic_inc - increment atomic variable
+ * @v: pointer of type Atomic
+ *
+ * Atomically increments @v by 1.
+ */
+static inline void atomic_inc(Atomic *v)
+{
+    __sync_fetch_and_add(&v->counter, 1);
+}
+
+/**
+ * atomic_dec - decrement atomic variable
+ * @v: pointer of type Atomic
+ *
+ * Atomically decrements @v by 1.
+ */
+static inline void atomic_dec(Atomic *v)
+{
+    __sync_fetch_and_sub(&v->counter, 1);
+}
+
+#endif
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [patch v4 02/16] qom: apply atomic on object's refcount
  2012-10-22  9:23 [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Liu Ping Fan
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 01/16] atomic: introduce atomic operations Liu Ping Fan
@ 2012-10-22  9:23 ` Liu Ping Fan
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 03/16] hotplug: introduce qdev_unplug_complete() to remove device from views Liu Ping Fan
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 102+ messages in thread
From: Liu Ping Fan @ 2012-10-22  9:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Marcelo Tosatti, Avi Kivity, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 include/qemu/object.h |    3 ++-
 qom/object.c          |   11 +++++------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/qemu/object.h b/include/qemu/object.h
index cc75fee..0c02614 100644
--- a/include/qemu/object.h
+++ b/include/qemu/object.h
@@ -18,6 +18,7 @@
 #include <stdint.h>
 #include <stdbool.h>
 #include "qemu-queue.h"
+#include "qemu/atomic.h"
 
 struct Visitor;
 struct Error;
@@ -262,7 +263,7 @@ struct Object
     /*< private >*/
     ObjectClass *class;
     QTAILQ_HEAD(, ObjectProperty) properties;
-    uint32_t ref;
+    Atomic ref;
     Object *parent;
 };
 
diff --git a/qom/object.c b/qom/object.c
index e3e9242..34ec2a1 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -383,7 +383,7 @@ void object_finalize(void *data)
     object_deinit(obj, ti);
     object_property_del_all(obj);
 
-    g_assert(obj->ref == 0);
+    g_assert(atomic_read(&obj->ref) == 0);
 }
 
 Object *object_new_with_type(Type type)
@@ -410,7 +410,7 @@ Object *object_new(const char *typename)
 void object_delete(Object *obj)
 {
     object_unparent(obj);
-    g_assert(obj->ref == 1);
+    g_assert(atomic_read(&obj->ref) == 1);
     object_unref(obj);
     g_free(obj);
 }
@@ -600,16 +600,15 @@ GSList *object_class_get_list(const char *implements_type,
 
 void object_ref(Object *obj)
 {
-    obj->ref++;
+    atomic_inc(&obj->ref);
 }
 
 void object_unref(Object *obj)
 {
-    g_assert(obj->ref > 0);
-    obj->ref--;
+    g_assert(atomic_read(&obj->ref) > 0);
 
     /* parent always holds a reference to its children */
-    if (obj->ref == 0) {
+    if (atomic_return_and_sub(1, &obj->ref) == 1) {
         object_finalize(obj);
     }
 }
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [patch v4 03/16] hotplug: introduce qdev_unplug_complete() to remove device from views
  2012-10-22  9:23 [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Liu Ping Fan
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 01/16] atomic: introduce atomic operations Liu Ping Fan
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 02/16] qom: apply atomic on object's refcount Liu Ping Fan
@ 2012-10-22  9:23 ` Liu Ping Fan
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 04/16] pci: remove pci device from mem view when unplug Liu Ping Fan
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 102+ messages in thread
From: Liu Ping Fan @ 2012-10-22  9:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Marcelo Tosatti, Avi Kivity, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

When the device unplug has been acked by the guest, we first remove the
device from the memory view to prevent further accesses from the
dispatcher. Then we isolate it from the device composition tree.
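
The intended call sequence, roughly (the ACPI eject path below is the user
introduced later in this series; it is shown only to illustrate the ordering):

    Error *err = NULL;

    /* 1. management requests removal; the guest is notified */
    qdev_unplug(dev, &err);

    /* 2. the guest acks the eject (e.g. in acpi_piix_eject_slot()); only
     *    now is the device unmapped from the memory view and then
     *    detached from the device composition tree */
    qdev_unplug_complete(dev, NULL);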

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 hw/qdev.c |   26 ++++++++++++++++++++++++++
 hw/qdev.h |    3 ++-
 2 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/hw/qdev.c b/hw/qdev.c
index b5a52ac..73df046 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -104,6 +104,14 @@ void qdev_set_parent_bus(DeviceState *dev, BusState *bus)
     bus_add_child(bus, dev);
 }
 
+static void qdev_unset_parent(DeviceState *dev)
+{
+    BusState *b = dev->parent_bus;
+
+    object_unparent(OBJECT(dev));
+    bus_remove_child(b, dev);
+}
+
 /* Create a new device.  This only initializes the device state structure
    and allows properties to be set.  qdev_init should be called to
    initialize the actual device emulation.  */
@@ -193,6 +201,24 @@ void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
     dev->alias_required_for_version = required_for_version;
 }
 
+static int qdev_unmap(DeviceState *dev)
+{
+    DeviceClass *dc =  DEVICE_GET_CLASS(dev);
+    if (dc->unmap) {
+        dc->unmap(dev);
+    }
+    return 0;
+}
+
+void qdev_unplug_complete(DeviceState *dev, Error **errp)
+{
+    /* isolate from mem view */
+    qdev_unmap(dev);
+    /* isolate from device tree */
+    qdev_unset_parent(dev);
+    object_unref(OBJECT(dev));
+}
+
 void qdev_unplug(DeviceState *dev, Error **errp)
 {
     DeviceClass *dc = DEVICE_GET_CLASS(dev);
diff --git a/hw/qdev.h b/hw/qdev.h
index d699194..aeae29e 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -47,7 +47,7 @@ typedef struct DeviceClass {
 
     /* callbacks */
     void (*reset)(DeviceState *dev);
-
+    void (*unmap)(DeviceState *dev);
     /* device state */
     const VMStateDescription *vmsd;
 
@@ -161,6 +161,7 @@ void qdev_init_nofail(DeviceState *dev);
 void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
                                  int required_for_version);
 void qdev_unplug(DeviceState *dev, Error **errp);
+void qdev_unplug_complete(DeviceState *dev, Error **errp);
 void qdev_free(DeviceState *dev);
 int qdev_simple_unplug_cb(DeviceState *dev);
 void qdev_machine_creation_done(void);
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [patch v4 04/16] pci: remove pci device from mem view when unplug
  2012-10-22  9:23 [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Liu Ping Fan
                   ` (2 preceding siblings ...)
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 03/16] hotplug: introduce qdev_unplug_complete() to remove device from views Liu Ping Fan
@ 2012-10-22  9:23 ` Liu Ping Fan
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps Liu Ping Fan
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 102+ messages in thread
From: Liu Ping Fan @ 2012-10-22  9:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Marcelo Tosatti, Avi Kivity, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 hw/acpi_piix4.c |    2 +-
 hw/pci.c        |   13 ++++++++++++-
 hw/pci.h        |    1 +
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index c56220b..a78b0e3 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -305,7 +305,7 @@ static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots)
             if (pc->no_hotplug) {
                 slot_free = false;
             } else {
-                qdev_free(qdev);
+                qdev_unplug_complete(qdev, NULL);
             }
         }
     }
diff --git a/hw/pci.c b/hw/pci.c
index 4d95984..3e2a081 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -850,7 +850,6 @@ static int pci_unregister_device(DeviceState *dev)
     PCIDevice *pci_dev = PCI_DEVICE(dev);
     PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pci_dev);
 
-    pci_unregister_io_regions(pci_dev);
     pci_del_option_rom(pci_dev);
 
     if (pc->exit) {
@@ -861,6 +860,17 @@ static int pci_unregister_device(DeviceState *dev)
     return 0;
 }
 
+static void pci_unmap_device(DeviceState *dev)
+{
+    PCIDevice *pci_dev = PCI_DEVICE(dev);
+    PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pci_dev);
+
+    pci_unregister_io_regions(pci_dev);
+    if (pc->unmap) {
+        pc->unmap(pci_dev);
+    }
+}
+
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
                       uint8_t type, MemoryRegion *memory)
 {
@@ -2064,6 +2074,7 @@ static void pci_device_class_init(ObjectClass *klass, void *data)
     DeviceClass *k = DEVICE_CLASS(klass);
     k->init = pci_qdev_init;
     k->unplug = pci_unplug_device;
+    k->unmap = pci_unmap_device;
     k->exit = pci_unregister_device;
     k->bus_type = TYPE_PCI_BUS;
     k->props = pci_props;
diff --git a/hw/pci.h b/hw/pci.h
index 4b6ab3d..09bbe2b 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -154,6 +154,7 @@ typedef struct PCIDeviceClass {
     DeviceClass parent_class;
 
     int (*init)(PCIDevice *dev);
+    void (*unmap)(PCIDevice *dev);
     PCIUnregisterFunc *exit;
     PCIConfigReadFunc *config_read;
     PCIConfigWriteFunc *config_write;
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
  2012-10-22  9:23 [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Liu Ping Fan
                   ` (3 preceding siblings ...)
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 04/16] pci: remove pci device from mem view when unplug Liu Ping Fan
@ 2012-10-22  9:23 ` Liu Ping Fan
  2012-10-22  9:38   ` Avi Kivity
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 06/16] memory: document ref, unref interface Liu Ping Fan
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 102+ messages in thread
From: Liu Ping Fan @ 2012-10-22  9:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Marcelo Tosatti, Avi Kivity, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

This pair of interfaces lets the dispatcher decide whether it can pin the
MemoryRegion without taking the big lock.
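
A rough sketch of the intended use in the dispatcher (simplified from the
exec.c changes later in this series):

    int safe_ref = 0;

    if (mr->ops && mr->ops->ref) {
        safe_ref = mr->ops->ref(mr);     /* e.g. object_ref() on the device */
    }
    if (safe_ref) {
        /* mr and its opaque are pinned: dispatch without the big lock */
        io_mem_write(mr, addr, val, size);
        mr->ops->unref(mr);
    } else {
        /* no ref()/unref() provided: fall back to the big lock */
        qemu_mutex_lock_iothread();
        io_mem_write(mr, addr, val, size);
        qemu_mutex_unlock_iothread();
    }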

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 memory.h |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/memory.h b/memory.h
index bd1bbae..9039411 100644
--- a/memory.h
+++ b/memory.h
@@ -25,6 +25,7 @@
 #include "iorange.h"
 #include "ioport.h"
 #include "int128.h"
+#include "qemu/object.h"
 
 typedef struct MemoryRegionOps MemoryRegionOps;
 typedef struct MemoryRegion MemoryRegion;
@@ -66,6 +67,8 @@ struct MemoryRegionOps {
                   target_phys_addr_t addr,
                   uint64_t data,
                   unsigned size);
+    int (*ref)(MemoryRegion *mr);
+    void (*unref)(MemoryRegion *mr);
 
     enum device_endian endianness;
     /* Guest-visible constraints: */
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [patch v4 06/16] memory: document ref, unref interface
  2012-10-22  9:23 [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Liu Ping Fan
                   ` (4 preceding siblings ...)
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps Liu Ping Fan
@ 2012-10-22  9:23 ` Liu Ping Fan
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 07/16] memory: make mmio dispatch able to be out of biglock Liu Ping Fan
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 102+ messages in thread
From: Liu Ping Fan @ 2012-10-22  9:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Marcelo Tosatti, Avi Kivity, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 docs/memory.txt |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/docs/memory.txt b/docs/memory.txt
index 5bbee8e..3f88812 100644
--- a/docs/memory.txt
+++ b/docs/memory.txt
@@ -170,3 +170,8 @@ various constraints can be supplied to control how these callbacks are called:
  - .old_portio and .old_mmio can be used to ease porting from code using
    cpu_register_io_memory() and register_ioport().  They should not be used
    in new code.
+
+MMIO regions can provide ->ref() and ->unref() callbacks; this pair of
+callbacks is optional. When ref() returns non-zero, both the MemoryRegion
+and its opaque are safe to use.
+
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [patch v4 07/16] memory: make mmio dispatch able to be out of biglock
  2012-10-22  9:23 [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Liu Ping Fan
                   ` (5 preceding siblings ...)
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 06/16] memory: document ref, unref interface Liu Ping Fan
@ 2012-10-22  9:23 ` Liu Ping Fan
  2012-10-23 12:12   ` Jan Kiszka
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 08/16] QemuThread: make QemuThread as tls to store extra info Liu Ping Fan
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 102+ messages in thread
From: Liu Ping Fan @ 2012-10-22  9:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Marcelo Tosatti, Avi Kivity, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

Without the big lock, we try to protect the MemoryRegion by increasing its
refcount. If we cannot take a reference, we fall back and resort to the big
lock.

Another point is that the memory radix tree can be rebuilt by another
thread, so we take a copy of the terminal MemoryRegionSection to survive
such an update.
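
The lookup therefore takes a short-lived map lock, copies the terminal
section out of the radix tree and pins it by refcount, roughly:

    MemoryRegionSection obj_mrs;
    int safe_ref;

    qemu_mutex_lock(&mem_map_lock);
    safe_ref = phys_page_lookup(page, &obj_mrs);  /* copies *section, takes ref */
    qemu_mutex_unlock(&mem_map_lock);

    /* obj_mrs stays usable even if another thread rebuilds the memory map,
     * since we hold our own copy plus a reference on its MemoryRegion */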

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 exec.c |  125 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 117 insertions(+), 8 deletions(-)

diff --git a/exec.c b/exec.c
index 5834766..91b859b 100644
--- a/exec.c
+++ b/exec.c
@@ -200,6 +200,8 @@ struct PhysPageEntry {
     uint16_t ptr : 15;
 };
 
+static QemuMutex mem_map_lock;
+
 /* Simple allocator for PhysPageEntry nodes */
 static PhysPageEntry (*phys_map_nodes)[L2_SIZE];
 static unsigned phys_map_nodes_nb, phys_map_nodes_nb_alloc;
@@ -212,6 +214,8 @@ static PhysPageEntry phys_map = { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
 
 static void io_mem_init(void);
 static void memory_map_init(void);
+static int phys_page_lookup(target_phys_addr_t addr, MemoryRegionSection *mrs);
+
 
 static MemoryRegion io_mem_watch;
 #endif
@@ -2245,6 +2249,7 @@ static void register_subpage(MemoryRegionSection *section)
     subpage_t *subpage;
     target_phys_addr_t base = section->offset_within_address_space
         & TARGET_PAGE_MASK;
+    /* Already under the protection of mem_map_lock */
     MemoryRegionSection *existing = phys_page_find(base >> TARGET_PAGE_BITS);
     MemoryRegionSection subsection = {
         .offset_within_address_space = base,
@@ -3165,6 +3170,8 @@ static void io_mem_init(void)
 
 static void core_begin(MemoryListener *listener)
 {
+    /* protect the updating process of mrs in memory core against readers */
+    qemu_mutex_lock(&mem_map_lock);
     destroy_all_mappings();
     phys_sections_clear();
     phys_map.ptr = PHYS_MAP_NODE_NIL;
@@ -3184,17 +3191,32 @@ static void core_commit(MemoryListener *listener)
     for(env = first_cpu; env != NULL; env = env->next_cpu) {
         tlb_flush(env, 1);
     }
+    qemu_mutex_unlock(&mem_map_lock);
 }
 
 static void core_region_add(MemoryListener *listener,
                             MemoryRegionSection *section)
 {
+    MemoryRegion *mr = section->mr;
+
+    if (mr->ops) {
+        if (mr->ops->ref) {
+            mr->ops->ref(mr);
+        }
+    }
     cpu_register_physical_memory_log(section, section->readonly);
 }
 
 static void core_region_del(MemoryListener *listener,
                             MemoryRegionSection *section)
 {
+    MemoryRegion *mr = section->mr;
+
+    if (mr->ops) {
+        if (mr->ops->unref) {
+            mr->ops->unref(mr);
+        }
+    }
 }
 
 static void core_region_nop(MemoryListener *listener,
@@ -3348,6 +3370,8 @@ static void memory_map_init(void)
     memory_region_init(system_io, "io", 65536);
     set_system_io_map(system_io);
 
+    qemu_mutex_init(&mem_map_lock);
+
     memory_listener_register(&core_memory_listener, system_memory);
     memory_listener_register(&io_memory_listener, system_io);
 }
@@ -3406,6 +3430,58 @@ int cpu_memory_rw_debug(CPUArchState *env, target_ulong addr,
 }
 
 #else
+
+static MemoryRegionSection *subpage_get_terminal(subpage_t *mmio,
+    target_phys_addr_t addr)
+{
+    MemoryRegionSection *section;
+    unsigned int idx = SUBPAGE_IDX(addr);
+
+    section = &phys_sections[mmio->sub_section[idx]];
+    return section;
+}
+
+static int memory_region_section_ref(MemoryRegionSection *mrs)
+{
+    MemoryRegion *mr;
+    int ret = 0;
+
+    mr = mrs->mr;
+    if (mr->ops) {
+        if (mr->ops->ref) {
+            ret = mr->ops->ref(mr);
+        }
+    }
+    return ret;
+}
+
+static void memory_region_section_unref(MemoryRegionSection *mrs)
+{
+    MemoryRegion *mr;
+
+    mr = mrs->mr;
+    if (mr->ops) {
+        if (mr->ops->unref) {
+            mr->ops->unref(mr);
+        }
+    }
+}
+
+static int phys_page_lookup(target_phys_addr_t addr, MemoryRegionSection *mrs)
+{
+    MemoryRegionSection *section;
+    int ret;
+
+    section = phys_page_find(addr >> TARGET_PAGE_BITS);
+    if (section->mr->subpage) {
+        section = subpage_get_terminal(section->mr->opaque, addr);
+    }
+    *mrs = *section;
+    ret = memory_region_section_ref(mrs);
+
+    return ret;
+}
+
 void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
                             int len, int is_write)
 {
@@ -3413,14 +3489,28 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
     uint8_t *ptr;
     uint32_t val;
     target_phys_addr_t page;
-    MemoryRegionSection *section;
+    MemoryRegionSection *section, obj_mrs;
+    int safe_ref;
 
     while (len > 0) {
         page = addr & TARGET_PAGE_MASK;
         l = (page + TARGET_PAGE_SIZE) - addr;
         if (l > len)
             l = len;
-        section = phys_page_find(page >> TARGET_PAGE_BITS);
+        qemu_mutex_lock(&mem_map_lock);
+        safe_ref = phys_page_lookup(page, &obj_mrs);
+        qemu_mutex_unlock(&mem_map_lock);
+        if (safe_ref == 0) {
+            qemu_mutex_lock_iothread();
+            qemu_mutex_lock(&mem_map_lock);
+            /* At the 2nd try, mem map can change, so need to judge it again */
+            safe_ref = phys_page_lookup(page, &obj_mrs);
+            qemu_mutex_unlock(&mem_map_lock);
+            if (safe_ref > 0) {
+                qemu_mutex_unlock_iothread();
+            }
+        }
+        section = &obj_mrs;
 
         if (is_write) {
             if (!memory_region_is_ram(section->mr)) {
@@ -3491,10 +3581,16 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
                 qemu_put_ram_ptr(ptr);
             }
         }
+
+        memory_region_section_unref(&obj_mrs);
         len -= l;
         buf += l;
         addr += l;
+        if (safe_ref == 0) {
+            qemu_mutex_unlock_iothread();
+        }
     }
+
 }
 
 /* used for ROM loading : can write in RAM and ROM */
@@ -3504,14 +3600,18 @@ void cpu_physical_memory_write_rom(target_phys_addr_t addr,
     int l;
     uint8_t *ptr;
     target_phys_addr_t page;
-    MemoryRegionSection *section;
+    MemoryRegionSection *section, mr_obj;
 
     while (len > 0) {
         page = addr & TARGET_PAGE_MASK;
         l = (page + TARGET_PAGE_SIZE) - addr;
         if (l > len)
             l = len;
-        section = phys_page_find(page >> TARGET_PAGE_BITS);
+
+        qemu_mutex_lock(&mem_map_lock);
+        phys_page_lookup(page, &mr_obj);
+        qemu_mutex_unlock(&mem_map_lock);
+        section = &mr_obj;
 
         if (!(memory_region_is_ram(section->mr) ||
               memory_region_is_romd(section->mr))) {
@@ -3528,6 +3628,7 @@ void cpu_physical_memory_write_rom(target_phys_addr_t addr,
         len -= l;
         buf += l;
         addr += l;
+        memory_region_section_unref(&mr_obj);
     }
 }
 
@@ -3592,7 +3693,7 @@ void *cpu_physical_memory_map(target_phys_addr_t addr,
     target_phys_addr_t todo = 0;
     int l;
     target_phys_addr_t page;
-    MemoryRegionSection *section;
+    MemoryRegionSection *section, mr_obj;
     ram_addr_t raddr = RAM_ADDR_MAX;
     ram_addr_t rlen;
     void *ret;
@@ -3602,7 +3703,10 @@ void *cpu_physical_memory_map(target_phys_addr_t addr,
         l = (page + TARGET_PAGE_SIZE) - addr;
         if (l > len)
             l = len;
-        section = phys_page_find(page >> TARGET_PAGE_BITS);
+        qemu_mutex_lock(&mem_map_lock);
+        phys_page_lookup(page, &mr_obj);
+        qemu_mutex_unlock(&mem_map_lock);
+        section = &mr_obj;
 
         if (!(memory_region_is_ram(section->mr) && !section->readonly)) {
             if (todo || bounce.buffer) {
@@ -3616,6 +3720,7 @@ void *cpu_physical_memory_map(target_phys_addr_t addr,
             }
 
             *plen = l;
+            memory_region_section_unref(&mr_obj);
             return bounce.buffer;
         }
         if (!todo) {
@@ -3630,6 +3735,7 @@ void *cpu_physical_memory_map(target_phys_addr_t addr,
     rlen = todo;
     ret = qemu_ram_ptr_length(raddr, &rlen);
     *plen = rlen;
+    memory_region_section_unref(&mr_obj);
     return ret;
 }
 
@@ -4239,9 +4345,12 @@ bool virtio_is_big_endian(void)
 #ifndef CONFIG_USER_ONLY
 bool cpu_physical_memory_is_io(target_phys_addr_t phys_addr)
 {
-    MemoryRegionSection *section;
+    MemoryRegionSection *section, mr_obj;
 
-    section = phys_page_find(phys_addr >> TARGET_PAGE_BITS);
+    qemu_mutex_lock(&mem_map_lock);
+    phys_page_lookup(phys_addr, &mr_obj);
+    qemu_mutex_unlock(&mem_map_lock);
+    section = &mr_obj;
 
     return !(memory_region_is_ram(section->mr) ||
              memory_region_is_romd(section->mr));
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [patch v4 08/16] QemuThread: make QemuThread as tls to store extra info
  2012-10-22  9:23 [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Liu Ping Fan
                   ` (6 preceding siblings ...)
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 07/16] memory: make mmio dispatch able to be out of biglock Liu Ping Fan
@ 2012-10-22  9:23 ` Liu Ping Fan
  2012-10-22  9:30   ` Jan Kiszka
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 09/16] memory: introduce mmio request pending to anti nested DMA Liu Ping Fan
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 102+ messages in thread
From: Liu Ping Fan @ 2012-10-22  9:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Marcelo Tosatti, Avi Kivity, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

If mmio dispatch runs outside the big lock, some functions are called in
different contexts (i.e. holding the big lock or not). We need to track
this information at runtime, and use TLS to store it.
This way we avoid requiring the big lock to be recursive.
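
Roughly, the per-thread info lives in the thread's QemuThread and is reached
through a pthread key (the context helpers are added in the next patch):

    /* at thread start (vcpu thread or the main/io thread) */
    pthread_setspecific(qemu_thread_key, thread);

    /* around an mmio dispatch that runs without the big lock */
    set_context_type(1);                 /* "mmio" context */
    cpu_physical_memory_rw(addr, data, len, is_write);
    set_context_type(0);                 /* back to "clean" */

    /* any callee can then ask how it was entered */
    if (get_context_type() == 1) {
        /* caller does not hold the big lock */
    }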

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 cpus.c              |    1 +
 qemu-thread-posix.c |    7 +++++++
 qemu-thread-posix.h |    2 ++
 qemu-thread.h       |    1 +
 vl.c                |    6 ++++++
 5 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/cpus.c b/cpus.c
index e476a3c..4cd7f85 100644
--- a/cpus.c
+++ b/cpus.c
@@ -735,6 +735,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
     CPUState *cpu = ENV_GET_CPU(env);
     int r;
 
+    pthread_setspecific(qemu_thread_key, cpu->thread);
     qemu_mutex_lock(&qemu_global_mutex);
     qemu_thread_get_self(cpu->thread);
     env->thread_id = qemu_get_thread_id();
diff --git a/qemu-thread-posix.c b/qemu-thread-posix.c
index 8fbabda..f448fcb 100644
--- a/qemu-thread-posix.c
+++ b/qemu-thread-posix.c
@@ -19,6 +19,8 @@
 #include <string.h>
 #include "qemu-thread.h"
 
+pthread_key_t qemu_thread_key;
+
 static void error_exit(int err, const char *msg)
 {
     fprintf(stderr, "qemu: %s: %s\n", msg, strerror(err));
@@ -151,6 +153,11 @@ void qemu_thread_get_self(QemuThread *thread)
     thread->thread = pthread_self();
 }
 
+void qemu_thread_key_create(void)
+{
+    pthread_key_create(&qemu_thread_key, NULL);
+}
+
 bool qemu_thread_is_self(QemuThread *thread)
 {
    return pthread_equal(pthread_self(), thread->thread);
diff --git a/qemu-thread-posix.h b/qemu-thread-posix.h
index ee4618e..2607b1c 100644
--- a/qemu-thread-posix.h
+++ b/qemu-thread-posix.h
@@ -14,4 +14,6 @@ struct QemuThread {
     pthread_t thread;
 };
 
+extern pthread_key_t qemu_thread_key;
+
 #endif
diff --git a/qemu-thread.h b/qemu-thread.h
index 05fdaaf..4a6427d 100644
--- a/qemu-thread.h
+++ b/qemu-thread.h
@@ -46,4 +46,5 @@ void qemu_thread_get_self(QemuThread *thread);
 bool qemu_thread_is_self(QemuThread *thread);
 void qemu_thread_exit(void *retval);
 
+void qemu_thread_key_create(void);
 #endif
diff --git a/vl.c b/vl.c
index 7c577fa..442479a 100644
--- a/vl.c
+++ b/vl.c
@@ -149,6 +149,7 @@ int main(int argc, char **argv)
 #include "qemu-options.h"
 #include "qmp-commands.h"
 #include "main-loop.h"
+#include "qemu-thread.h"
 #ifdef CONFIG_VIRTFS
 #include "fsdev/qemu-fsdev.h"
 #endif
@@ -2342,6 +2343,7 @@ int qemu_init_main_loop(void)
     return main_loop_init();
 }
 
+
 int main(int argc, char **argv, char **envp)
 {
     int i;
@@ -3483,6 +3485,10 @@ int main(int argc, char **argv, char **envp)
         exit(1);
     }
 
+    qemu_thread_key_create();
+    QemuThread *ioctx = g_malloc0(sizeof(QemuThread));
+    pthread_setspecific(qemu_thread_key, ioctx);
+
     os_set_line_buffering();
 
     if (init_timer_alarm() < 0) {
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [patch v4 09/16] memory: introduce mmio request pending to anti nested DMA
  2012-10-22  9:23 [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Liu Ping Fan
                   ` (7 preceding siblings ...)
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 08/16] QemuThread: make QemuThread as tls to store extra info Liu Ping Fan
@ 2012-10-22  9:23 ` Liu Ping Fan
  2012-10-22 10:28   ` Avi Kivity
  2012-10-23 12:38   ` Gleb Natapov
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 10/16] memory: introduce lock ops for MemoryRegionOps Liu Ping Fan
                   ` (7 subsequent siblings)
  16 siblings, 2 replies; 102+ messages in thread
From: Liu Ping Fan @ 2012-10-22  9:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Marcelo Tosatti, Avi Kivity, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

Reject nested mmio requests that do not target RAM, so that we avoid the
potential deadlock caused by acquiring two devices' local locks in
arbitrary order.
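
Distilled, the guard added to cpu_physical_memory_rw() looks like:

    QemuThread *thread = pthread_getspecific(qemu_thread_key);

    if (thread->context_type == 1) {     /* entered from mmio dispatch */
        nested_dma = thread->mmio_request_pending++ > 1 ? 1 : 0;
    }
    if (!nested_dma) {
        io_mem_write(section->mr, addr1, val, 4);   /* non-RAM target */
    }
    /* ... */
    if (thread->context_type == 1) {
        thread->mmio_request_pending--;
    }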

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 cpus.c              |   14 ++++++++++++++
 exec.c              |   50 ++++++++++++++++++++++++++++++++++++--------------
 hw/hw.h             |    1 +
 kvm-all.c           |    2 ++
 qemu-thread-posix.h |    3 +++
 qemu-thread.h       |    2 ++
 6 files changed, 58 insertions(+), 14 deletions(-)

diff --git a/cpus.c b/cpus.c
index 4cd7f85..365a512 100644
--- a/cpus.c
+++ b/cpus.c
@@ -729,6 +729,18 @@ static void qemu_kvm_wait_io_event(CPUArchState *env)
     qemu_wait_io_event_common(env);
 }
 
+int get_context_type(void)
+{
+    QemuThread *t = pthread_getspecific(qemu_thread_key);
+    return t->context_type;
+}
+
+void set_context_type(int type)
+{
+    QemuThread *t = pthread_getspecific(qemu_thread_key);
+    t->context_type = type;
+}
+
 static void *qemu_kvm_cpu_thread_fn(void *arg)
 {
     CPUArchState *env = arg;
@@ -736,6 +748,8 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
     int r;
 
     pthread_setspecific(qemu_thread_key, cpu->thread);
+    set_context_type(0);
+
     qemu_mutex_lock(&qemu_global_mutex);
     qemu_thread_get_self(cpu->thread);
     env->thread_id = qemu_get_thread_id();
diff --git a/exec.c b/exec.c
index 91b859b..a0327a1 100644
--- a/exec.c
+++ b/exec.c
@@ -3490,7 +3490,9 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
     uint32_t val;
     target_phys_addr_t page;
     MemoryRegionSection *section, obj_mrs;
-    int safe_ref;
+    int safe_ref, nested_dma = 0;
+    QemuThread *thread = pthread_getspecific(qemu_thread_key);
+    int context = thread->context_type;
 
     while (len > 0) {
         page = addr & TARGET_PAGE_MASK;
@@ -3500,7 +3502,8 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
         qemu_mutex_lock(&mem_map_lock);
         safe_ref = phys_page_lookup(page, &obj_mrs);
         qemu_mutex_unlock(&mem_map_lock);
-        if (safe_ref == 0) {
+
+        if (safe_ref == 0 && context == 1) {
             qemu_mutex_lock_iothread();
             qemu_mutex_lock(&mem_map_lock);
             /* At the 2nd try, mem map can change, so need to judge it again */
@@ -3511,7 +3514,9 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
             }
         }
         section = &obj_mrs;
-
+        if (context == 1) {
+            nested_dma = thread->mmio_request_pending++ > 1 ? 1 : 0;
+        }
         if (is_write) {
             if (!memory_region_is_ram(section->mr)) {
                 target_phys_addr_t addr1;
@@ -3521,17 +3526,23 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
                 if (l >= 4 && ((addr1 & 3) == 0)) {
                     /* 32 bit write access */
                     val = ldl_p(buf);
-                    io_mem_write(section->mr, addr1, val, 4);
+                    if (!nested_dma) {
+                        io_mem_write(section->mr, addr1, val, 4);
+                    }
                     l = 4;
                 } else if (l >= 2 && ((addr1 & 1) == 0)) {
                     /* 16 bit write access */
                     val = lduw_p(buf);
-                    io_mem_write(section->mr, addr1, val, 2);
+                    if (!nested_dma) {
+                        io_mem_write(section->mr, addr1, val, 2);
+                    }
                     l = 2;
                 } else {
                     /* 8 bit write access */
                     val = ldub_p(buf);
-                    io_mem_write(section->mr, addr1, val, 1);
+                    if (!nested_dma) {
+                        io_mem_write(section->mr, addr1, val, 1);
+                    }
                     l = 1;
                 }
             } else if (!section->readonly) {
@@ -3552,24 +3563,31 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
             }
         } else {
             if (!(memory_region_is_ram(section->mr) ||
-                  memory_region_is_romd(section->mr))) {
+                  memory_region_is_romd(section->mr)) &&
+                    !nested_dma) {
                 target_phys_addr_t addr1;
                 /* I/O case */
                 addr1 = memory_region_section_addr(section, addr);
                 if (l >= 4 && ((addr1 & 3) == 0)) {
                     /* 32 bit read access */
-                    val = io_mem_read(section->mr, addr1, 4);
-                    stl_p(buf, val);
+                    if (!nested_dma) {
+                        val = io_mem_read(section->mr, addr1, 4);
+                        stl_p(buf, val);
+                    }
                     l = 4;
                 } else if (l >= 2 && ((addr1 & 1) == 0)) {
                     /* 16 bit read access */
-                    val = io_mem_read(section->mr, addr1, 2);
-                    stw_p(buf, val);
+                    if (!nested_dma) {
+                        val = io_mem_read(section->mr, addr1, 2);
+                        stw_p(buf, val);
+                    }
                     l = 2;
                 } else {
                     /* 8 bit read access */
-                    val = io_mem_read(section->mr, addr1, 1);
-                    stb_p(buf, val);
+                    if (!nested_dma) {
+                        val = io_mem_read(section->mr, addr1, 1);
+                        stb_p(buf, val);
+                    }
                     l = 1;
                 }
             } else {
@@ -3586,7 +3604,11 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
         len -= l;
         buf += l;
         addr += l;
-        if (safe_ref == 0) {
+
+        if (context == 1) {
+            thread->mmio_request_pending--;
+        }
+        if (safe_ref == 0 && context == 1) {
             qemu_mutex_unlock_iothread();
         }
     }
diff --git a/hw/hw.h b/hw/hw.h
index e5cb9bf..935b045 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -12,6 +12,7 @@
 #include "irq.h"
 #include "qemu-file.h"
 #include "vmstate.h"
+#include "qemu-thread.h"
 
 #ifdef NEED_CPU_H
 #if TARGET_LONG_BITS == 64
diff --git a/kvm-all.c b/kvm-all.c
index 34b02c1..b3fa597 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1562,10 +1562,12 @@ int kvm_cpu_exec(CPUArchState *env)
             break;
         case KVM_EXIT_MMIO:
             DPRINTF("handle_mmio\n");
+            set_context_type(1);
             cpu_physical_memory_rw(run->mmio.phys_addr,
                                    run->mmio.data,
                                    run->mmio.len,
                                    run->mmio.is_write);
+            set_context_type(0);
             ret = 0;
             break;
         case KVM_EXIT_IRQ_WINDOW_OPEN:
diff --git a/qemu-thread-posix.h b/qemu-thread-posix.h
index 2607b1c..9fcc6f8 100644
--- a/qemu-thread-posix.h
+++ b/qemu-thread-posix.h
@@ -12,6 +12,9 @@ struct QemuCond {
 
 struct QemuThread {
     pthread_t thread;
+    /* 0 clean; 1 mmio; 2 io */
+    int context_type;
+    int mmio_request_pending;
 };
 
 extern pthread_key_t qemu_thread_key;
diff --git a/qemu-thread.h b/qemu-thread.h
index 4a6427d..88eaf94 100644
--- a/qemu-thread.h
+++ b/qemu-thread.h
@@ -45,6 +45,8 @@ void *qemu_thread_join(QemuThread *thread);
 void qemu_thread_get_self(QemuThread *thread);
 bool qemu_thread_is_self(QemuThread *thread);
 void qemu_thread_exit(void *retval);
+int get_context_type(void);
+void set_context_type(int type);
 
 void qemu_thread_key_create(void);
 #endif
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [patch v4 10/16] memory: introduce lock ops for MemoryRegionOps
  2012-10-22  9:23 [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Liu Ping Fan
                   ` (8 preceding siblings ...)
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 09/16] memory: introduce mmio request pending to anti nested DMA Liu Ping Fan
@ 2012-10-22  9:23 ` Liu Ping Fan
  2012-10-22 10:30   ` Avi Kivity
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 11/16] vcpu: push mmio dispatcher out of big lock Liu Ping Fan
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 102+ messages in thread
From: Liu Ping Fan @ 2012-10-22  9:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Marcelo Tosatti, Avi Kivity, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

This lets the memory core use a MemoryRegion's fine-grained lock for mmio
dispatch.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 memory.c |   16 +++++++++++++++-
 memory.h |    2 ++
 2 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/memory.c b/memory.c
index d528d1f..86d5623 100644
--- a/memory.c
+++ b/memory.c
@@ -1505,13 +1505,27 @@ void set_system_io_map(MemoryRegion *mr)
 
 uint64_t io_mem_read(MemoryRegion *mr, target_phys_addr_t addr, unsigned size)
 {
-    return memory_region_dispatch_read(mr, addr, size);
+    uint64_t ret;
+    if (mr->ops->lock) {
+        mr->ops->lock(mr);
+    }
+    ret = memory_region_dispatch_read(mr, addr, size);
+    if (mr->ops->lock) {
+        mr->ops->unlock(mr);
+    }
+    return ret;
 }
 
 void io_mem_write(MemoryRegion *mr, target_phys_addr_t addr,
                   uint64_t val, unsigned size)
 {
+    if (mr->ops->lock) {
+        mr->ops->lock(mr);
+    }
     memory_region_dispatch_write(mr, addr, val, size);
+    if (mr->ops->lock) {
+        mr->ops->unlock(mr);
+    }
 }
 
 typedef struct MemoryRegionList MemoryRegionList;
diff --git a/memory.h b/memory.h
index 9039411..5d00066 100644
--- a/memory.h
+++ b/memory.h
@@ -69,6 +69,8 @@ struct MemoryRegionOps {
                   unsigned size);
     int (*ref)(MemoryRegion *mr);
     void (*unref)(MemoryRegion *mr);
+    void (*lock)(MemoryRegion *mr);
+    void (*unlock)(MemoryRegion *mr);
 
     enum device_endian endianness;
     /* Guest-visible constraints: */
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [patch v4 11/16] vcpu: push mmio dispatcher out of big lock
  2012-10-22  9:23 [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Liu Ping Fan
                   ` (9 preceding siblings ...)
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 10/16] memory: introduce lock ops for MemoryRegionOps Liu Ping Fan
@ 2012-10-22  9:23 ` Liu Ping Fan
  2012-10-22 10:31   ` Avi Kivity
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000 Liu Ping Fan
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 102+ messages in thread
From: Liu Ping Fan @ 2012-10-22  9:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Marcelo Tosatti, Avi Kivity, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 kvm-all.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index b3fa597..3d7ae18 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1562,12 +1562,15 @@ int kvm_cpu_exec(CPUArchState *env)
             break;
         case KVM_EXIT_MMIO:
             DPRINTF("handle_mmio\n");
+            qemu_mutex_unlock_iothread();
             set_context_type(1);
             cpu_physical_memory_rw(run->mmio.phys_addr,
                                    run->mmio.data,
                                    run->mmio.len,
                                    run->mmio.is_write);
             set_context_type(0);
+            qemu_mutex_lock_iothread();
+
             ret = 0;
             break;
         case KVM_EXIT_IRQ_WINDOW_OPEN:
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000
  2012-10-22  9:23 [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Liu Ping Fan
                   ` (10 preceding siblings ...)
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 11/16] vcpu: push mmio dispatcher out of big lock Liu Ping Fan
@ 2012-10-22  9:23 ` Liu Ping Fan
  2012-10-22 10:37   ` Avi Kivity
  2012-10-23  9:04   ` Jan Kiszka
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 13/16] e1000: add busy flag to anti broken device state Liu Ping Fan
                   ` (4 subsequent siblings)
  16 siblings, 2 replies; 102+ messages in thread
From: Liu Ping Fan @ 2012-10-22  9:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Marcelo Tosatti, Avi Kivity, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

Use a local lock to protect e1000. When calling into core code, the
fine-grained lock is dropped before acquiring the big lock. This can leave
the device state inconsistent, which needs extra effort to fix.
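
The resulting lock ordering whenever a handler must call back into core
code (distilled from set_interrupt_cause() below) is:

    /* called with s->e1000_lock held and without the big lock */
    qemu_mutex_unlock(&s->e1000_lock);
    qemu_mutex_lock_iothread();

    /* core code still expects the big lock */
    qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);

    qemu_mutex_unlock_iothread();
    qemu_mutex_lock(&s->e1000_lock);     /* device state may have changed meanwhile */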

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 hw/e1000.c |   24 +++++++++++++++++++++++-
 1 files changed, 23 insertions(+), 1 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index ae8a6c5..5eddab5 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -85,6 +85,7 @@ typedef struct E1000State_st {
     NICConf conf;
     MemoryRegion mmio;
     MemoryRegion io;
+    QemuMutex e1000_lock;
 
     uint32_t mac_reg[0x8000];
     uint16_t phy_reg[0x20];
@@ -223,13 +224,27 @@ static const uint32_t mac_reg_init[] = {
 static void
 set_interrupt_cause(E1000State *s, int index, uint32_t val)
 {
+    QemuThread *t;
+
     if (val && (E1000_DEVID >= E1000_DEV_ID_82547EI_MOBILE)) {
         /* Only for 8257x */
         val |= E1000_ICR_INT_ASSERTED;
     }
     s->mac_reg[ICR] = val;
     s->mac_reg[ICS] = val;
-    qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
+
+    t = pthread_getspecific(qemu_thread_key);
+    if (t->context_type == 1) {
+        qemu_mutex_unlock(&s->e1000_lock);
+        qemu_mutex_lock_iothread();
+    }
+    if (DEVICE(s)->state < DEV_STATE_STOPPING) {
+        qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
+    }
+    if (t->context_type == 1) {
+        qemu_mutex_unlock_iothread();
+        qemu_mutex_lock(&s->e1000_lock);
+    }
 }
 
 static void
@@ -268,6 +283,7 @@ static void e1000_reset(void *opaque)
     E1000State *d = opaque;
 
     qemu_del_timer(d->autoneg_timer);
+
     memset(d->phy_reg, 0, sizeof d->phy_reg);
     memmove(d->phy_reg, phy_reg_init, sizeof phy_reg_init);
     memset(d->mac_reg, 0, sizeof d->mac_reg);
@@ -448,7 +464,11 @@ e1000_send_packet(E1000State *s, const uint8_t *buf, int size)
     if (s->phy_reg[PHY_CTRL] & MII_CR_LOOPBACK) {
         s->nic->nc.info->receive(&s->nic->nc, buf, size);
     } else {
+        qemu_mutex_unlock(&s->e1000_lock);
+        qemu_mutex_lock_iothread();
         qemu_send_packet(&s->nic->nc, buf, size);
+        qemu_mutex_unlock_iothread();
+        qemu_mutex_lock(&s->e1000_lock);
     }
 }
 
@@ -1221,6 +1241,8 @@ static int pci_e1000_init(PCIDevice *pci_dev)
     int i;
     uint8_t *macaddr;
 
+    qemu_mutex_init(&d->e1000_lock);
+
     pci_conf = d->dev.config;
 
     /* TODO: RST# value should be 0, PCI spec 6.2.4 */
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [patch v4 13/16] e1000: add busy flag to anti broken device state
  2012-10-22  9:23 [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Liu Ping Fan
                   ` (11 preceding siblings ...)
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000 Liu Ping Fan
@ 2012-10-22  9:23 ` Liu Ping Fan
  2012-10-22 10:40   ` Avi Kivity
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 14/16] qdev: introduce stopping state Liu Ping Fan
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 102+ messages in thread
From: Liu Ping Fan @ 2012-10-22  9:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Marcelo Tosatti, Avi Kivity, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

The inconsistent device state is caused by releasing the local lock before
acquiring the big lock. To fix this issue, we have two choices:
  1.use a busy flag to protect the state
    The drawback is that we introduce an independent busy flag for each
    independent logic unit of the device.
  2.reload the device's state
    The drawback is that if the call chain is too deep, the reload has to
    touch each layer. Reloading also means recalculating the intermediate
    results from the device's registers.

This patch adopts solution 1 to fix the issue.
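
The guard is a simple re-entrancy flag around the mmio handlers, roughly:

    if (s->busy) {
        /* re-entered while an earlier access is still in flight
         * (its local lock was dropped): drop the request */
        return;
    }
    s->busy = 1;
    /* ... act on the registers ... */
    s->busy = 0;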

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 hw/e1000.c |   23 ++++++++++++++++++++---
 1 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index 5eddab5..0b4fce5 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -86,6 +86,7 @@ typedef struct E1000State_st {
     MemoryRegion mmio;
     MemoryRegion io;
     QemuMutex e1000_lock;
+    int busy;
 
     uint32_t mac_reg[0x8000];
     uint16_t phy_reg[0x20];
@@ -1033,6 +1034,11 @@ e1000_mmio_write(void *opaque, target_phys_addr_t addr, uint64_t val,
     E1000State *s = opaque;
     unsigned int index = (addr & 0x1ffff) >> 2;
 
+    if (s->busy) {
+        return;
+    } else {
+        s->busy = 1;
+    }
     if (index < NWRITEOPS && macreg_writeops[index]) {
         macreg_writeops[index](s, index, val);
     } else if (index < NREADOPS && macreg_readops[index]) {
@@ -1041,6 +1047,7 @@ e1000_mmio_write(void *opaque, target_phys_addr_t addr, uint64_t val,
         DBGOUT(UNKNOWN, "MMIO unknown write addr=0x%08x,val=0x%08"PRIx64"\n",
                index<<2, val);
     }
+    s->busy = 0;
 }
 
 static uint64_t
@@ -1048,13 +1055,22 @@ e1000_mmio_read(void *opaque, target_phys_addr_t addr, unsigned size)
 {
     E1000State *s = opaque;
     unsigned int index = (addr & 0x1ffff) >> 2;
+    uint64_t ret = 0;
+
+    if (s->busy) {
+        return ret;
+    } else {
+        s->busy = 1;
+    }
 
     if (index < NREADOPS && macreg_readops[index])
     {
-        return macreg_readops[index](s, index);
+        ret = macreg_readops[index](s, index);
+    } else {
+        DBGOUT(UNKNOWN, "MMIO unknown read addr=0x%08x\n", index<<2);
     }
-    DBGOUT(UNKNOWN, "MMIO unknown read addr=0x%08x\n", index<<2);
-    return 0;
+    s->busy = 0;
+    return ret;
 }
 
 static const MemoryRegionOps e1000_mmio_ops = {
@@ -1242,6 +1258,7 @@ static int pci_e1000_init(PCIDevice *pci_dev)
     uint8_t *macaddr;
 
     qemu_mutex_init(&d->e1000_lock);
+    d->busy = 0;
 
     pci_conf = d->dev.config;
 
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [patch v4 14/16] qdev: introduce stopping state
  2012-10-22  9:23 [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Liu Ping Fan
                   ` (12 preceding siblings ...)
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 13/16] e1000: add busy flag to anti broken device state Liu Ping Fan
@ 2012-10-22  9:23 ` Liu Ping Fan
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 15/16] e1000: introduce unmap() to fix unplug issue Liu Ping Fan
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 102+ messages in thread
From: Liu Ping Fan @ 2012-10-22  9:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Marcelo Tosatti, Avi Kivity, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

Add this state to indicate that the device is disappearing and that no new
requests should be appended to it.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 hw/qdev.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/qdev.h b/hw/qdev.h
index aeae29e..32f08bc 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -22,6 +22,7 @@ typedef struct BusClass BusClass;
 enum DevState {
     DEV_STATE_CREATED = 1,
     DEV_STATE_INITIALIZED,
+    DEV_STATE_STOPPING,
 };
 
 enum {
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [patch v4 15/16] e1000: introduce unmap() to fix unplug issue
  2012-10-22  9:23 [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Liu Ping Fan
                   ` (13 preceding siblings ...)
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 14/16] qdev: introduce stopping state Liu Ping Fan
@ 2012-10-22  9:23 ` Liu Ping Fan
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 16/16] e1000: implement MemoryRegionOps's ref&lock interface Liu Ping Fan
  2012-10-25 14:04 ` [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Peter Maydell
  16 siblings, 0 replies; 102+ messages in thread
From: Liu Ping Fan @ 2012-10-22  9:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Marcelo Tosatti, Avi Kivity, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

When a device is uninit()ed, we should ensure that it is no longer used by
any subsystem. We can achieve this with two solutions:
  1.sync on the big lock in the uninit() function
    This is easier, but it requires the big lock to be recursive,
    because uninit() can be called from the iothread or from mmio dispatch.
  2.introduce unmap() as a sync point for all subsystems.

This patch adopts solution 2.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 hw/e1000.c |   26 ++++++++++++++++++++++----
 1 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index 0b4fce5..72c2324 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -168,7 +168,14 @@ set_phy_ctrl(E1000State *s, int index, uint16_t val)
         e1000_link_down(s);
         s->phy_reg[PHY_STATUS] &= ~MII_SR_AUTONEG_COMPLETE;
         DBGOUT(PHY, "Start link auto negotiation\n");
-        qemu_mod_timer(s->autoneg_timer, qemu_get_clock_ms(vm_clock) + 500);
+
+        qemu_mutex_unlock(&s->e1000_lock);
+        qemu_mutex_lock_iothread();
+        if (DEVICE(s)->state < DEV_STATE_STOPPING) {
+            qemu_mod_timer(s->autoneg_timer, qemu_get_clock_ms(vm_clock) + 500);
+        }
+        qemu_mutex_unlock_iothread();
+        qemu_mutex_lock(&s->e1000_lock);
     }
 }
 
@@ -467,7 +474,9 @@ e1000_send_packet(E1000State *s, const uint8_t *buf, int size)
     } else {
         qemu_mutex_unlock(&s->e1000_lock);
         qemu_mutex_lock_iothread();
-        qemu_send_packet(&s->nic->nc, buf, size);
+        if (DEVICE(s)->state < DEV_STATE_STOPPING) {
+            qemu_send_packet(&s->nic->nc, buf, size);
+        }
         qemu_mutex_unlock_iothread();
         qemu_mutex_lock(&s->e1000_lock);
     }
@@ -1221,6 +1230,16 @@ e1000_mmio_setup(E1000State *d)
 }
 
 static void
+pci_e1000_unmap(PCIDevice *p)
+{
+  E1000State *d = DO_UPCAST(E1000State, dev, p);
+
+  DEVICE(d)->state = DEV_STATE_STOPPING;
+  qemu_del_timer(d->autoneg_timer);
+  qemu_del_net_client(&d->nic->nc);
+}
+
+static void
 e1000_cleanup(NetClientState *nc)
 {
     E1000State *s = DO_UPCAST(NICState, nc, nc)->opaque;
@@ -1233,11 +1252,9 @@ pci_e1000_uninit(PCIDevice *dev)
 {
     E1000State *d = DO_UPCAST(E1000State, dev, dev);
 
-    qemu_del_timer(d->autoneg_timer);
     qemu_free_timer(d->autoneg_timer);
     memory_region_destroy(&d->mmio);
     memory_region_destroy(&d->io);
-    qemu_del_net_client(&d->nic->nc);
 }
 
 static NetClientInfo net_e1000_info = {
@@ -1314,6 +1331,7 @@ static void e1000_class_init(ObjectClass *klass, void *data)
 
     k->init = pci_e1000_init;
     k->exit = pci_e1000_uninit;
+    k->unmap = pci_e1000_unmap;
     k->romfile = "pxe-e1000.rom";
     k->vendor_id = PCI_VENDOR_ID_INTEL;
     k->device_id = E1000_DEVID;
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [patch v4 16/16] e1000: implement MemoryRegionOps's ref&lock interface
  2012-10-22  9:23 [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Liu Ping Fan
                   ` (14 preceding siblings ...)
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 15/16] e1000: introduce unmap() to fix unplug issue Liu Ping Fan
@ 2012-10-22  9:23 ` Liu Ping Fan
  2012-10-25 14:04 ` [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Peter Maydell
  16 siblings, 0 replies; 102+ messages in thread
From: Liu Ping Fan @ 2012-10-22  9:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Marcelo Tosatti, Avi Kivity, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

With this, e1000 tells the memory core that it can be protected by refcount
and by its local lock.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 hw/e1000.c |   31 +++++++++++++++++++++++++++++++
 1 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index 72c2324..9929fe6 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -1082,9 +1082,40 @@ e1000_mmio_read(void *opaque, target_phys_addr_t addr, unsigned size)
     return ret;
 }
 
+static void e1000_mmio_lock(MemoryRegion *mr)
+{
+    E1000State *d = container_of(mr, E1000State, mmio);
+    qemu_mutex_lock(&d->e1000_lock);
+}
+
+static void e1000_mmio_unlock(MemoryRegion *mr)
+{
+    E1000State *d = container_of(mr, E1000State, mmio);
+    qemu_mutex_unlock(&d->e1000_lock);
+}
+
+static int e1000_mmio_ref(MemoryRegion *mr)
+{
+    E1000State *e1000 = container_of(mr, E1000State, mmio);
+
+    object_ref(OBJECT(e1000));
+    return 1;
+}
+
+static void e1000_mmio_unref(MemoryRegion *mr)
+{
+    E1000State *e1000 = container_of(mr, E1000State, mmio);
+
+    object_unref(OBJECT(e1000));
+}
+
 static const MemoryRegionOps e1000_mmio_ops = {
     .read = e1000_mmio_read,
     .write = e1000_mmio_write,
+    .ref = e1000_mmio_ref,
+    .unref = e1000_mmio_unref,
+    .lock = e1000_mmio_lock,
+    .unlock = e1000_mmio_unlock,
     .endianness = DEVICE_LITTLE_ENDIAN,
     .impl = {
         .min_access_size = 4,
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 08/16] QemuThread: make QemuThread as tls to store extra info
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 08/16] QemuThread: make QemuThread as tls to store extra info Liu Ping Fan
@ 2012-10-22  9:30   ` Jan Kiszka
  2012-10-22 17:13     ` Peter Maydell
  0 siblings, 1 reply; 102+ messages in thread
From: Jan Kiszka @ 2012-10-22  9:30 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Stefan Hajnoczi, Marcelo Tosatti, qemu-devel, Avi Kivity,
	Anthony Liguori, Paolo Bonzini

On 2012-10-22 11:23, Liu Ping Fan wrote:
> If mmio dispatch out of big lock, some function's calling context (ie,
> holding big lock or not) are different. We need to trace these info in
> runtime, and use tls to store them.
> By this method, we can avoid to require big lock recursive.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  cpus.c              |    1 +
>  qemu-thread-posix.c |    7 +++++++
>  qemu-thread-posix.h |    2 ++
>  qemu-thread.h       |    1 +
>  vl.c                |    6 ++++++
>  5 files changed, 17 insertions(+), 0 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index e476a3c..4cd7f85 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -735,6 +735,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
>      CPUState *cpu = ENV_GET_CPU(env);
>      int r;
>  
> +    pthread_setspecific(qemu_thread_key, cpu->thread);
>      qemu_mutex_lock(&qemu_global_mutex);
>      qemu_thread_get_self(cpu->thread);
>      env->thread_id = qemu_get_thread_id();
> diff --git a/qemu-thread-posix.c b/qemu-thread-posix.c
> index 8fbabda..f448fcb 100644
> --- a/qemu-thread-posix.c
> +++ b/qemu-thread-posix.c
> @@ -19,6 +19,8 @@
>  #include <string.h>
>  #include "qemu-thread.h"
>  
> +pthread_key_t qemu_thread_key;
> +
>  static void error_exit(int err, const char *msg)
>  {
>      fprintf(stderr, "qemu: %s: %s\n", msg, strerror(err));
> @@ -151,6 +153,11 @@ void qemu_thread_get_self(QemuThread *thread)
>      thread->thread = pthread_self();
>  }
>  
> +void qemu_thread_key_create(void)
> +{
> +    pthread_key_create(&qemu_thread_key, NULL);
> +}
> +
>  bool qemu_thread_is_self(QemuThread *thread)
>  {
>     return pthread_equal(pthread_self(), thread->thread);
> diff --git a/qemu-thread-posix.h b/qemu-thread-posix.h
> index ee4618e..2607b1c 100644
> --- a/qemu-thread-posix.h
> +++ b/qemu-thread-posix.h
> @@ -14,4 +14,6 @@ struct QemuThread {
>      pthread_t thread;
>  };
>  
> +extern pthread_key_t qemu_thread_key;
> +
>  #endif
> diff --git a/qemu-thread.h b/qemu-thread.h
> index 05fdaaf..4a6427d 100644
> --- a/qemu-thread.h
> +++ b/qemu-thread.h
> @@ -46,4 +46,5 @@ void qemu_thread_get_self(QemuThread *thread);
>  bool qemu_thread_is_self(QemuThread *thread);
>  void qemu_thread_exit(void *retval);
>  
> +void qemu_thread_key_create(void);
>  #endif
> diff --git a/vl.c b/vl.c
> index 7c577fa..442479a 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -149,6 +149,7 @@ int main(int argc, char **argv)
>  #include "qemu-options.h"
>  #include "qmp-commands.h"
>  #include "main-loop.h"
> +#include "qemu-thread.h"
>  #ifdef CONFIG_VIRTFS
>  #include "fsdev/qemu-fsdev.h"
>  #endif
> @@ -2342,6 +2343,7 @@ int qemu_init_main_loop(void)
>      return main_loop_init();
>  }
>  
> +
>  int main(int argc, char **argv, char **envp)
>  {
>      int i;
> @@ -3483,6 +3485,10 @@ int main(int argc, char **argv, char **envp)
>          exit(1);
>      }
>  
> +    qemu_thread_key_create();
> +    QemuThread *ioctx = g_malloc0(sizeof(QemuThread));
> +    pthread_setspecific(qemu_thread_key, ioctx);
> +
>      os_set_line_buffering();
>  
>      if (init_timer_alarm() < 0) {
> 

Can't we enhance qemu-tls.h to work via pthread_setspecific in case
__thread is not working and use that abstraction (DECLARE/DEFINE_TLS)
directly?
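
A rough sketch of such a fallback, assuming one pthread key per
variable (the macro and helper names below are made up, they are not
the existing qemu-tls.h ones):

#include <pthread.h>
#include <glib.h>

#define DEFINE_TLS(type, x)                                     \
    static pthread_key_t tls_key__##x;                          \
    static pthread_once_t tls_once__##x = PTHREAD_ONCE_INIT;    \
    static void tls_init__##x(void)                             \
    {                                                           \
        pthread_key_create(&tls_key__##x, g_free);              \
    }                                                           \
    static type *tls_get__##x(void)                             \
    {                                                           \
        type *p;                                                \
                                                                \
        pthread_once(&tls_once__##x, tls_init__##x);            \
        p = pthread_getspecific(tls_key__##x);                  \
        if (!p) {                                               \
            p = g_malloc0(sizeof(*p));                          \
            pthread_setspecific(tls_key__##x, p);               \
        }                                                       \
        return p;                                               \
    }

#define tls_var(x) (*tls_get__##x())

Then callers keep using tls_var() regardless of whether __thread is
available underneath.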

Also, the above breaks win32, doesn't it?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps Liu Ping Fan
@ 2012-10-22  9:38   ` Avi Kivity
  2012-10-23 11:51     ` Paolo Bonzini
  0 siblings, 1 reply; 102+ messages in thread
From: Avi Kivity @ 2012-10-22  9:38 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Stefan Hajnoczi, Marcelo Tosatti, qemu-devel, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
> This pair of interface help to decide when dispatching, whether
> we can pin mr without big lock or not.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  memory.h |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/memory.h b/memory.h
> index bd1bbae..9039411 100644
> --- a/memory.h
> +++ b/memory.h
> @@ -25,6 +25,7 @@
>  #include "iorange.h"
>  #include "ioport.h"
>  #include "int128.h"
> +#include "qemu/object.h"

Unneeded.

>  
>  typedef struct MemoryRegionOps MemoryRegionOps;
>  typedef struct MemoryRegion MemoryRegion;
> @@ -66,6 +67,8 @@ struct MemoryRegionOps {
>                    target_phys_addr_t addr,
>                    uint64_t data,
>                    unsigned size);
> +    int (*ref)(MemoryRegion *mr);
> +    void (*unref)(MemoryRegion *mr);
>  

Why return an int?  Should succeed unconditionally.  Please fold into 7
(along with 6).


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 09/16] memory: introduce mmio request pending to anti nested DMA
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 09/16] memory: introduce mmio request pending to anti nested DMA Liu Ping Fan
@ 2012-10-22 10:28   ` Avi Kivity
  2012-10-23 12:38   ` Gleb Natapov
  1 sibling, 0 replies; 102+ messages in thread
From: Avi Kivity @ 2012-10-22 10:28 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Stefan Hajnoczi, Marcelo Tosatti, qemu-devel, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
> Rejecting the nested mmio request which does not aim at RAM, so we
> can avoid the potential deadlock caused by the random lock sequence
> of two device's local lock.

I can't say I like this but it's better than anything else we have.

>  }
>  
> +int get_context_type(void)
> +{
> +    QemuThread *t = pthread_getspecific(qemu_thread_key);
> +    return t->context_type;
> +}
> +
> +void set_context_type(int type)
> +{
> +    QemuThread *t = pthread_getspecific(qemu_thread_key);
> +    t->context_type = type;
> +}

Please define an enum so we know what it means.
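
Say something like this (names are only placeholders), matching the
"/* 0 clean; 1 mmio; 2 io */" comment in the header:

typedef enum {
    CONTEXT_CLEAN = 0, /* normal context, running under the big lock */
    CONTEXT_MMIO  = 1, /* inside an MMIO dispatch outside the big lock */
    CONTEXT_PIO   = 2, /* inside a port-I/O dispatch */
} ContextType;

and then set_context_type(CONTEXT_MMIO) instead of the magic 1.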

> +
>  static void *qemu_kvm_cpu_thread_fn(void *arg)
>  {
>      CPUArchState *env = arg;
> @@ -736,6 +748,8 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
>      int r;
>  
>      pthread_setspecific(qemu_thread_key, cpu->thread);
> +    set_context_type(0);
> +

Setting this for every thread means we're going to miss some.

> @@ -3500,7 +3502,8 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>          qemu_mutex_lock(&mem_map_lock);
>          safe_ref = phys_page_lookup(page, &obj_mrs);
>          qemu_mutex_unlock(&mem_map_lock);
> -        if (safe_ref == 0) {
> +
> +        if (safe_ref == 0 && context == 1) {
>              qemu_mutex_lock_iothread();
>              qemu_mutex_lock(&mem_map_lock);
>              /* At the 2nd try, mem map can change, so need to judge it again */
> @@ -3511,7 +3514,9 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>              }
>          }
>          section = &obj_mrs;
> -
> +        if (context == 1) {
> +            nested_dma = thread->mmio_request_pending++ > 1 ? 1 : 0;
> +        }
>          if (is_write) {
>              if (!memory_region_is_ram(section->mr)) {
>                  target_phys_addr_t addr1;
> @@ -3521,17 +3526,23 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>                  if (l >= 4 && ((addr1 & 3) == 0)) {
>                      /* 32 bit write access */
>                      val = ldl_p(buf);
> -                    io_mem_write(section->mr, addr1, val, 4);
> +                    if (!nested_dma) {
> +                        io_mem_write(section->mr, addr1, val, 4);
> +                    }
>                      l = 4;
>                  } else if (l >= 2 && ((addr1 & 1) == 0)) {
>                      /* 16 bit write access */
>                      val = lduw_p(buf);
> -                    io_mem_write(section->mr, addr1, val, 2);
> +                    if (!nested_dma) {
> +                        io_mem_write(section->mr, addr1, val, 2);
> +                    }
>                      l = 2;
>                  } else {
>                      /* 8 bit write access */
>                      val = ldub_p(buf);
> -                    io_mem_write(section->mr, addr1, val, 1);
> +                    if (!nested_dma) {
> +                        io_mem_write(section->mr, addr1, val, 1);
> +                    }
>                      l = 1;
>                  }


We need to abort on nested_dma so we know something bad happened and we
have to fix it.
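
I.e. rather than silently skipping the access, something like (just a
sketch, the message is made up):

                    if (nested_dma) {
                        fprintf(stderr, "nested DMA from MMIO dispatch at "
                                TARGET_FMT_plx "\n", addr1);
                        abort();
                    }
                    io_mem_write(section->mr, addr1, val, 4);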

> @@ -12,6 +12,9 @@ struct QemuCond {
>  
>  struct QemuThread {
>      pthread_t thread;
> +    /* 0 clean; 1 mmio; 2 io */
> +    int context_type;
> +    int mmio_request_pending;
>  };

QemuThread is at a too low level of abstraction.  It's just a wrapper
around the host threading facilities, it shouldn't add anything else.



-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 10/16] memory: introduce lock ops for MemoryRegionOps
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 10/16] memory: introduce lock ops for MemoryRegionOps Liu Ping Fan
@ 2012-10-22 10:30   ` Avi Kivity
  2012-10-23  5:53     ` liu ping fan
  0 siblings, 1 reply; 102+ messages in thread
From: Avi Kivity @ 2012-10-22 10:30 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Stefan Hajnoczi, Marcelo Tosatti, qemu-devel, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
> This can help memory core to use mr's fine lock to mmio dispatch.
> 
> diff --git a/memory.c b/memory.c
> index d528d1f..86d5623 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1505,13 +1505,27 @@ void set_system_io_map(MemoryRegion *mr)
>  
>  uint64_t io_mem_read(MemoryRegion *mr, target_phys_addr_t addr, unsigned size)
>  {
> -    return memory_region_dispatch_read(mr, addr, size);
> +    uint64_t ret;
> +    if (mr->ops->lock) {
> +        mr->ops->lock(mr);
> +    }
> +    ret = memory_region_dispatch_read(mr, addr, size);
> +    if (mr->ops->lock) {
> +        mr->ops->unlock(mr);
> +    }
> +    return ret;
>  }
>  
>  void io_mem_write(MemoryRegion *mr, target_phys_addr_t addr,
>                    uint64_t val, unsigned size)
>  {
> +    if (mr->ops->lock) {
> +        mr->ops->lock(mr);
> +    }
>      memory_region_dispatch_write(mr, addr, val, size);
> +    if (mr->ops->lock) {
> +        mr->ops->unlock(mr);
> +    }
>  }
>  
>  typedef struct MemoryRegionList MemoryRegionList;
> diff --git a/memory.h b/memory.h
> index 9039411..5d00066 100644
> --- a/memory.h
> +++ b/memory.h
> @@ -69,6 +69,8 @@ struct MemoryRegionOps {
>                    unsigned size);
>      int (*ref)(MemoryRegion *mr);
>      void (*unref)(MemoryRegion *mr);
> +    void (*lock)(MemoryRegion *mr);
> +    void (*unlock)(MemoryRegion *mr);
>  
>      enum device_endian endianness;
>      /* Guest-visible constraints: */
> 

Is this really needed?  Can't read/write callbacks lock and unlock
themselves?

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 11/16] vcpu: push mmio dispatcher out of big lock
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 11/16] vcpu: push mmio dispatcher out of big lock Liu Ping Fan
@ 2012-10-22 10:31   ` Avi Kivity
  2012-10-22 10:36     ` Jan Kiszka
  0 siblings, 1 reply; 102+ messages in thread
From: Avi Kivity @ 2012-10-22 10:31 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Stefan Hajnoczi, Marcelo Tosatti, qemu-devel, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  kvm-all.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/kvm-all.c b/kvm-all.c
> index b3fa597..3d7ae18 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -1562,12 +1562,15 @@ int kvm_cpu_exec(CPUArchState *env)
>              break;
>          case KVM_EXIT_MMIO:
>              DPRINTF("handle_mmio\n");
> +            qemu_mutex_unlock_iothread();
>              set_context_type(1);
>              cpu_physical_memory_rw(run->mmio.phys_addr,
>                                     run->mmio.data,
>                                     run->mmio.len,
>                                     run->mmio.is_write);
>              set_context_type(0);
> +            qemu_mutex_lock_iothread();
> +
>              ret = 0;
>              break;
>          case KVM_EXIT_IRQ_WINDOW_OPEN:
> 

This is fine for now, but of course later we'll have to remove the lock
completely and apply it for the other exits (and other processing, when
needed).

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 11/16] vcpu: push mmio dispatcher out of big lock
  2012-10-22 10:31   ` Avi Kivity
@ 2012-10-22 10:36     ` Jan Kiszka
  0 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-22 10:36 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Paolo Bonzini

On 2012-10-22 12:31, Avi Kivity wrote:
> On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  kvm-all.c |    3 +++
>>  1 files changed, 3 insertions(+), 0 deletions(-)
>>
>> diff --git a/kvm-all.c b/kvm-all.c
>> index b3fa597..3d7ae18 100644
>> --- a/kvm-all.c
>> +++ b/kvm-all.c
>> @@ -1562,12 +1562,15 @@ int kvm_cpu_exec(CPUArchState *env)
>>              break;
>>          case KVM_EXIT_MMIO:
>>              DPRINTF("handle_mmio\n");
>> +            qemu_mutex_unlock_iothread();
>>              set_context_type(1);
>>              cpu_physical_memory_rw(run->mmio.phys_addr,
>>                                     run->mmio.data,
>>                                     run->mmio.len,
>>                                     run->mmio.is_write);
>>              set_context_type(0);
>> +            qemu_mutex_lock_iothread();
>> +
>>              ret = 0;
>>              break;
>>          case KVM_EXIT_IRQ_WINDOW_OPEN:
>>
> 
> This is fine for now, but of course later we'll have to remove the lock
> completely and apply it for the other exits (and other processing, when
> needed).

I think we can do this already, I posted a patch some time ago.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000 Liu Ping Fan
@ 2012-10-22 10:37   ` Avi Kivity
  2012-10-23  9:04   ` Jan Kiszka
  1 sibling, 0 replies; 102+ messages in thread
From: Avi Kivity @ 2012-10-22 10:37 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Stefan Hajnoczi, Marcelo Tosatti, qemu-devel, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
> Use local lock to protect e1000. When calling the system function,
> dropping the fine lock before acquiring the big lock. This will
> introduce broken device state, which need extra effort to fix.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  hw/e1000.c |   24 +++++++++++++++++++++++-
>  1 files changed, 23 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/e1000.c b/hw/e1000.c
> index ae8a6c5..5eddab5 100644
> --- a/hw/e1000.c
> +++ b/hw/e1000.c
> @@ -85,6 +85,7 @@ typedef struct E1000State_st {
>      NICConf conf;
>      MemoryRegion mmio;
>      MemoryRegion io;
> +    QemuMutex e1000_lock;

Can call it 'lock'.

>  
>      uint32_t mac_reg[0x8000];
>      uint16_t phy_reg[0x20];
> @@ -223,13 +224,27 @@ static const uint32_t mac_reg_init[] = {
>  static void
>  set_interrupt_cause(E1000State *s, int index, uint32_t val)
>  {
> +    QemuThread *t;
> +
>      if (val && (E1000_DEVID >= E1000_DEV_ID_82547EI_MOBILE)) {
>          /* Only for 8257x */
>          val |= E1000_ICR_INT_ASSERTED;
>      }
>      s->mac_reg[ICR] = val;
>      s->mac_reg[ICS] = val;
> -    qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
> +
> +    t = pthread_getspecific(qemu_thread_key);
> +    if (t->context_type == 1) {
> +        qemu_mutex_unlock(&s->e1000_lock);
> +        qemu_mutex_lock_iothread();
> +    }
> +    if (DEVICE(s)->state < DEV_STATE_STOPPING) {
> +        qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
> +    }
> +    if (t->context_type == 1) {
> +        qemu_mutex_unlock_iothread();
> +        qemu_mutex_lock(&s->e1000_lock);
> +    }
>  }

This is way too complicated for device model authors.  There's no way to
get it correct.

If mmio dispatch needs to call a non-thread-safe subsystem, it must
acquire the big lock:

Something like

e1000_mmio_read()
{
    if (index < NREADOPS && macreg_readops[index]){
        macreg_lockops[index].lock(s);
        ret = macreg_readops[index](s, index);
        macreg_lockops[index].unlock(s);
    }
    DBGOUT(UNKNOWN, "MMIO unknown read addr=0x%08x\n", index<<2);

}

Where .lock() either locks just the local lock, or both locks.  As
subsystems are converted to be thread safe, we can remove this.
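
Filled out a little, the table entries could be as simple as the
following (all names invented for illustration; when both locks are
taken, the big lock has to come first so the ordering stays
consistent):

typedef struct {
    void (*lock)(E1000State *s);
    void (*unlock)(E1000State *s);
} E1000LockOps;

static void e1000_lock_local(E1000State *s)
{
    qemu_mutex_lock(&s->e1000_lock);
}

static void e1000_unlock_local(E1000State *s)
{
    qemu_mutex_unlock(&s->e1000_lock);
}

static void e1000_lock_both(E1000State *s)
{
    qemu_mutex_lock_iothread();
    qemu_mutex_lock(&s->e1000_lock);
}

static void e1000_unlock_both(E1000State *s)
{
    qemu_mutex_unlock(&s->e1000_lock);
    qemu_mutex_unlock_iothread();
}

macreg_lockops[] would then map each register index to one of these
pairs, defaulting to the local-only pair.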



-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 13/16] e1000: add busy flag to anti broken device state
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 13/16] e1000: add busy flag to anti broken device state Liu Ping Fan
@ 2012-10-22 10:40   ` Avi Kivity
  2012-10-23  5:52     ` liu ping fan
  0 siblings, 1 reply; 102+ messages in thread
From: Avi Kivity @ 2012-10-22 10:40 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Stefan Hajnoczi, Marcelo Tosatti, qemu-devel, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
> The broken device state is caused by releasing local lock before acquiring
> big lock. To fix this issue, we have two choice:
>   1.use busy flag to protect the state
>     The drawback is that we will introduce independent busy flag for each
>     independent device's logic unit.
>   2.reload the device's state
>     The drawback is if the call chain is too deep, the action to reload will
>     touch each layer. Also the reloading means to recaculate the intermediate
>     result based on device's regs.
> 
> This patch adopt the solution 1 to fix the issue.

Doesn't the nested mmio patch detect this?


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 08/16] QemuThread: make QemuThread as tls to store extra info
  2012-10-22  9:30   ` Jan Kiszka
@ 2012-10-22 17:13     ` Peter Maydell
  2012-10-23  5:58       ` liu ping fan
  2012-10-23 11:48       ` Paolo Bonzini
  0 siblings, 2 replies; 102+ messages in thread
From: Peter Maydell @ 2012-10-22 17:13 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Paolo Bonzini

On 22 October 2012 10:30, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> Can't we enhance qemu-tls.h to work via pthread_setspecific in case
> __thread is not working and use that abstraction (DECLARE/DEFINE_TLS)
> directly?

Agreed. (There were prototype patches floating around for Win32
at least). The only reason qemu-tls.h has the dummy not-actually-tls
code for non-linux is that IIRC we wanted to get the linux bits
in quickly before a release and we never got round to going back
and doing it properly for the other targets.

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 13/16] e1000: add busy flag to anti broken device state
  2012-10-22 10:40   ` Avi Kivity
@ 2012-10-23  5:52     ` liu ping fan
  2012-10-23  9:06       ` Avi Kivity
  2012-10-23  9:07       ` Jan Kiszka
  0 siblings, 2 replies; 102+ messages in thread
From: liu ping fan @ 2012-10-23  5:52 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka, Paolo Bonzini

On Mon, Oct 22, 2012 at 6:40 PM, Avi Kivity <avi@redhat.com> wrote:
> On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
>> The broken device state is caused by releasing local lock before acquiring
>> big lock. To fix this issue, we have two choice:
>>   1.use busy flag to protect the state
>>     The drawback is that we will introduce independent busy flag for each
>>     independent device's logic unit.
>>   2.reload the device's state
>>     The drawback is if the call chain is too deep, the action to reload will
>>     touch each layer. Also the reloading means to recaculate the intermediate
>>     result based on device's regs.
>>
>> This patch adopt the solution 1 to fix the issue.
>
> Doesn't the nested mmio patch detect this?
>
It will only detect and fix the issue within one thread. But the guest
can touch the emulated device from multiple threads.
>
> --
> error compiling committee.c: too many arguments to function
>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 10/16] memory: introduce lock ops for MemoryRegionOps
  2012-10-22 10:30   ` Avi Kivity
@ 2012-10-23  5:53     ` liu ping fan
  2012-10-23  8:53       ` Jan Kiszka
  0 siblings, 1 reply; 102+ messages in thread
From: liu ping fan @ 2012-10-23  5:53 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Stefan Hajnoczi, Marcelo Tosatti, qemu-devel, Anthony Liguori,
	Jan Kiszka, Paolo Bonzini

On Mon, Oct 22, 2012 at 6:30 PM, Avi Kivity <avi@redhat.com> wrote:
> On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
>> This can help memory core to use mr's fine lock to mmio dispatch.
>>
>> diff --git a/memory.c b/memory.c
>> index d528d1f..86d5623 100644
>> --- a/memory.c
>> +++ b/memory.c
>> @@ -1505,13 +1505,27 @@ void set_system_io_map(MemoryRegion *mr)
>>
>>  uint64_t io_mem_read(MemoryRegion *mr, target_phys_addr_t addr, unsigned size)
>>  {
>> -    return memory_region_dispatch_read(mr, addr, size);
>> +    uint64_t ret;
>> +    if (mr->ops->lock) {
>> +        mr->ops->lock(mr);
>> +    }
>> +    ret = memory_region_dispatch_read(mr, addr, size);
>> +    if (mr->ops->lock) {
>> +        mr->ops->unlock(mr);
>> +    }
>> +    return ret;
>>  }
>>
>>  void io_mem_write(MemoryRegion *mr, target_phys_addr_t addr,
>>                    uint64_t val, unsigned size)
>>  {
>> +    if (mr->ops->lock) {
>> +        mr->ops->lock(mr);
>> +    }
>>      memory_region_dispatch_write(mr, addr, val, size);
>> +    if (mr->ops->lock) {
>> +        mr->ops->unlock(mr);
>> +    }
>>  }
>>
>>  typedef struct MemoryRegionList MemoryRegionList;
>> diff --git a/memory.h b/memory.h
>> index 9039411..5d00066 100644
>> --- a/memory.h
>> +++ b/memory.h
>> @@ -69,6 +69,8 @@ struct MemoryRegionOps {
>>                    unsigned size);
>>      int (*ref)(MemoryRegion *mr);
>>      void (*unref)(MemoryRegion *mr);
>> +    void (*lock)(MemoryRegion *mr);
>> +    void (*unlock)(MemoryRegion *mr);
>>
>>      enum device_endian endianness;
>>      /* Guest-visible constraints: */
>>
>
> Is this really needed?  Can't read/write callbacks lock and unlock
> themselves?
>
We can. But then we need to expand the logic there and use addr to
tell which fine-grained lock to take and release. Which do you prefer:
folding the logic into the memory core or spreading it into the devices?

> --
> error compiling committee.c: too many arguments to function
>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 08/16] QemuThread: make QemuThread as tls to store extra info
  2012-10-22 17:13     ` Peter Maydell
@ 2012-10-23  5:58       ` liu ping fan
  2012-10-23 11:48       ` Paolo Bonzini
  1 sibling, 0 replies; 102+ messages in thread
From: liu ping fan @ 2012-10-23  5:58 UTC (permalink / raw)
  To: Peter Maydell, Jan Kiszka
  Cc: Stefan Hajnoczi, Marcelo Tosatti, qemu-devel, Avi Kivity,
	Anthony Liguori, Paolo Bonzini

On Tue, Oct 23, 2012 at 1:13 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 22 October 2012 10:30, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> Can't we enhance qemu-tls.h to work via pthread_setspecific in case
>> __thread is not working and use that abstraction (DECLARE/DEFINE_TLS)
>> directly?
>
> Agreed. (There were prototype patches floating around for Win32
> at least). The only reason qemu-tls.h has the dummy not-actually-tls
> code for non-linux is that IIRC we wanted to get the linux bits
> in quickly before a release and we never got round to going back
> and doing it properly for the other targets.
>
Oh, it seems that my need can be covered by __thread, so I will use
that instead. BTW, what is the exceptional case where __thread does not
work (assuming the ELF toolchain supports this keyword)?

Thanks and regards,
pingfan

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 10/16] memory: introduce lock ops for MemoryRegionOps
  2012-10-23  5:53     ` liu ping fan
@ 2012-10-23  8:53       ` Jan Kiszka
  0 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-23  8:53 UTC (permalink / raw)
  To: liu ping fan
  Cc: Stefan Hajnoczi, Marcelo Tosatti, qemu-devel, Avi Kivity,
	Anthony Liguori, Paolo Bonzini

On 2012-10-23 07:53, liu ping fan wrote:
> On Mon, Oct 22, 2012 at 6:30 PM, Avi Kivity <avi@redhat.com> wrote:
>> On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
>>> This can help memory core to use mr's fine lock to mmio dispatch.
>>>
>>> diff --git a/memory.c b/memory.c
>>> index d528d1f..86d5623 100644
>>> --- a/memory.c
>>> +++ b/memory.c
>>> @@ -1505,13 +1505,27 @@ void set_system_io_map(MemoryRegion *mr)
>>>
>>>  uint64_t io_mem_read(MemoryRegion *mr, target_phys_addr_t addr, unsigned size)
>>>  {
>>> -    return memory_region_dispatch_read(mr, addr, size);
>>> +    uint64_t ret;
>>> +    if (mr->ops->lock) {
>>> +        mr->ops->lock(mr);
>>> +    }
>>> +    ret = memory_region_dispatch_read(mr, addr, size);
>>> +    if (mr->ops->lock) {
>>> +        mr->ops->unlock(mr);
>>> +    }
>>> +    return ret;
>>>  }
>>>
>>>  void io_mem_write(MemoryRegion *mr, target_phys_addr_t addr,
>>>                    uint64_t val, unsigned size)
>>>  {
>>> +    if (mr->ops->lock) {
>>> +        mr->ops->lock(mr);
>>> +    }
>>>      memory_region_dispatch_write(mr, addr, val, size);
>>> +    if (mr->ops->lock) {
>>> +        mr->ops->unlock(mr);
>>> +    }
>>>  }
>>>
>>>  typedef struct MemoryRegionList MemoryRegionList;
>>> diff --git a/memory.h b/memory.h
>>> index 9039411..5d00066 100644
>>> --- a/memory.h
>>> +++ b/memory.h
>>> @@ -69,6 +69,8 @@ struct MemoryRegionOps {
>>>                    unsigned size);
>>>      int (*ref)(MemoryRegion *mr);
>>>      void (*unref)(MemoryRegion *mr);
>>> +    void (*lock)(MemoryRegion *mr);
>>> +    void (*unlock)(MemoryRegion *mr);
>>>
>>>      enum device_endian endianness;
>>>      /* Guest-visible constraints: */
>>>
>>
>> Is this really needed?  Can't read/write callbacks lock and unlock
>> themselves?
>>
> We can. But then, we need to expand the logic there, and use addr to
> tell which fine lock to snatch and release.   Which one do you prefer,
> fold the logic into memory core or spread it into devices?

I also don't see why having to provide additional callbacks is simpler
than straight calls to qemu_mutex_lock/unlock from within the access
handlers.

Moreover, the second model will allow taking or not taking the lock
depending on the access address / width / device state / whatever in a
more comprehensible way, as you can follow the control flow better when
there are fewer callbacks. Many plain "read register X" accesses will not
require any device-side locking at all.
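
For example (illustrative only, the _locked helper is invented):

static uint64_t e1000_mmio_read(void *opaque, target_phys_addr_t addr,
                                unsigned size)
{
    E1000State *s = opaque;
    unsigned int index = (addr & 0x1ffff) >> 2;
    uint64_t ret;

    if (index == STATUS) {
        /* plain register read, no side effects, no lock needed */
        return s->mac_reg[STATUS];
    }

    qemu_mutex_lock(&s->e1000_lock);
    ret = e1000_mmio_read_locked(s, index, size);   /* invented helper */
    qemu_mutex_unlock(&s->e1000_lock);
    return ret;
}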

Granted, a downside is that the risk of leaking locks in case of early
returns etc. increases.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000 Liu Ping Fan
  2012-10-22 10:37   ` Avi Kivity
@ 2012-10-23  9:04   ` Jan Kiszka
  2012-10-24  6:31     ` liu ping fan
  2012-10-24  7:29     ` liu ping fan
  1 sibling, 2 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-23  9:04 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Stefan Hajnoczi, Marcelo Tosatti, qemu-devel, Avi Kivity,
	Anthony Liguori, Paolo Bonzini

On 2012-10-22 11:23, Liu Ping Fan wrote:
> Use local lock to protect e1000. When calling the system function,
> dropping the fine lock before acquiring the big lock. This will
> introduce broken device state, which need extra effort to fix.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  hw/e1000.c |   24 +++++++++++++++++++++++-
>  1 files changed, 23 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/e1000.c b/hw/e1000.c
> index ae8a6c5..5eddab5 100644
> --- a/hw/e1000.c
> +++ b/hw/e1000.c
> @@ -85,6 +85,7 @@ typedef struct E1000State_st {
>      NICConf conf;
>      MemoryRegion mmio;
>      MemoryRegion io;
> +    QemuMutex e1000_lock;
>  
>      uint32_t mac_reg[0x8000];
>      uint16_t phy_reg[0x20];
> @@ -223,13 +224,27 @@ static const uint32_t mac_reg_init[] = {
>  static void
>  set_interrupt_cause(E1000State *s, int index, uint32_t val)
>  {
> +    QemuThread *t;
> +
>      if (val && (E1000_DEVID >= E1000_DEV_ID_82547EI_MOBILE)) {
>          /* Only for 8257x */
>          val |= E1000_ICR_INT_ASSERTED;
>      }
>      s->mac_reg[ICR] = val;
>      s->mac_reg[ICS] = val;
> -    qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
> +
> +    t = pthread_getspecific(qemu_thread_key);
> +    if (t->context_type == 1) {
> +        qemu_mutex_unlock(&s->e1000_lock);
> +        qemu_mutex_lock_iothread();
> +    }
> +    if (DEVICE(s)->state < DEV_STATE_STOPPING) {
> +        qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
> +    }
> +    if (t->context_type == 1) {
> +        qemu_mutex_unlock_iothread();
> +        qemu_mutex_lock(&s->e1000_lock);
> +    }

This is ugly for many reasons. First of all, it is racy as the register
content may change while dropping the device lock, no? Then you would
raise or clear an IRQ spuriously.

Second, it clearly shows that we need to address lock-less IRQ delivery.
Almost nothing is won if we have to take the global lock again to push
an IRQ event to the guest. I'm repeating myself, but the problem to be
solved here is almost identical to fast IRQ delivery for assigned
devices (which we only address pretty ad-hoc for PCI so far).

And third: too much boilerplate code... :-/

>  }
>  
>  static void
> @@ -268,6 +283,7 @@ static void e1000_reset(void *opaque)
>      E1000State *d = opaque;
>  
>      qemu_del_timer(d->autoneg_timer);
> +
>      memset(d->phy_reg, 0, sizeof d->phy_reg);
>      memmove(d->phy_reg, phy_reg_init, sizeof phy_reg_init);
>      memset(d->mac_reg, 0, sizeof d->mac_reg);
> @@ -448,7 +464,11 @@ e1000_send_packet(E1000State *s, const uint8_t *buf, int size)
>      if (s->phy_reg[PHY_CTRL] & MII_CR_LOOPBACK) {
>          s->nic->nc.info->receive(&s->nic->nc, buf, size);
>      } else {
> +        qemu_mutex_unlock(&s->e1000_lock);
> +        qemu_mutex_lock_iothread();
>          qemu_send_packet(&s->nic->nc, buf, size);
> +        qemu_mutex_unlock_iothread();
> +        qemu_mutex_lock(&s->e1000_lock);

And that is the also a problem to be discussed next: How to handle
locking of backends? Do we want separate locks for backend and frontend?
Although they are typically in a 1:1 relationship? Oh, I'm revealing the
content of my talk... ;)

>      }
>  }
>  
> @@ -1221,6 +1241,8 @@ static int pci_e1000_init(PCIDevice *pci_dev)
>      int i;
>      uint8_t *macaddr;
>  
> +    qemu_mutex_init(&d->e1000_lock);
> +
>      pci_conf = d->dev.config;
>  
>      /* TODO: RST# value should be 0, PCI spec 6.2.4 */
> 

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 13/16] e1000: add busy flag to anti broken device state
  2012-10-23  5:52     ` liu ping fan
@ 2012-10-23  9:06       ` Avi Kivity
  2012-10-23  9:07       ` Jan Kiszka
  1 sibling, 0 replies; 102+ messages in thread
From: Avi Kivity @ 2012-10-23  9:06 UTC (permalink / raw)
  To: liu ping fan
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka, Paolo Bonzini

On 10/23/2012 07:52 AM, liu ping fan wrote:
> On Mon, Oct 22, 2012 at 6:40 PM, Avi Kivity <avi@redhat.com> wrote:
>> On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
>>> The broken device state is caused by releasing local lock before acquiring
>>> big lock. To fix this issue, we have two choice:
>>>   1.use busy flag to protect the state
>>>     The drawback is that we will introduce independent busy flag for each
>>>     independent device's logic unit.
>>>   2.reload the device's state
>>>     The drawback is if the call chain is too deep, the action to reload will
>>>     touch each layer. Also the reloading means to recaculate the intermediate
>>>     result based on device's regs.
>>>
>>> This patch adopt the solution 1 to fix the issue.
>>
>> Doesn't the nested mmio patch detect this?
>>
> It will only record and fix the issue on one thread. But guest can
> touch the emulated device on muti-threads.

I forgot about that.

I propose that we merge without a fix.  Upstream is broken in the same
way; it won't deadlock but it will surely break in some other way if a
write can cause another write to be triggered to the same location.

When we gain more experience with fine-graining devices we can converge
on a good solution.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 13/16] e1000: add busy flag to anti broken device state
  2012-10-23  5:52     ` liu ping fan
  2012-10-23  9:06       ` Avi Kivity
@ 2012-10-23  9:07       ` Jan Kiszka
  2012-10-23  9:32         ` liu ping fan
  1 sibling, 1 reply; 102+ messages in thread
From: Jan Kiszka @ 2012-10-23  9:07 UTC (permalink / raw)
  To: liu ping fan
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Paolo Bonzini

On 2012-10-23 07:52, liu ping fan wrote:
> On Mon, Oct 22, 2012 at 6:40 PM, Avi Kivity <avi@redhat.com> wrote:
>> On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
>>> The broken device state is caused by releasing local lock before acquiring
>>> big lock. To fix this issue, we have two choice:
>>>   1.use busy flag to protect the state
>>>     The drawback is that we will introduce independent busy flag for each
>>>     independent device's logic unit.
>>>   2.reload the device's state
>>>     The drawback is if the call chain is too deep, the action to reload will
>>>     touch each layer. Also the reloading means to recaculate the intermediate
>>>     result based on device's regs.
>>>
>>> This patch adopt the solution 1 to fix the issue.
>>
>> Doesn't the nested mmio patch detect this?
>>
> It will only record and fix the issue on one thread. But guest can
> touch the emulated device on muti-threads.

Sorry, what does that mean? A second VCPU accessing the device will
simply be ignored when it races with another VCPU? Specifically

+    if (s->busy) {
+        return;

and

+    uint64_t ret = 0;
+
+    if (s->busy) {
+        return ret;

is worrying me.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 13/16] e1000: add busy flag to anti broken device state
  2012-10-23  9:07       ` Jan Kiszka
@ 2012-10-23  9:32         ` liu ping fan
  2012-10-23  9:37           ` Avi Kivity
  0 siblings, 1 reply; 102+ messages in thread
From: liu ping fan @ 2012-10-23  9:32 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Paolo Bonzini

On Tue, Oct 23, 2012 at 5:07 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2012-10-23 07:52, liu ping fan wrote:
>> On Mon, Oct 22, 2012 at 6:40 PM, Avi Kivity <avi@redhat.com> wrote:
>>> On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
>>>> The broken device state is caused by releasing local lock before acquiring
>>>> big lock. To fix this issue, we have two choice:
>>>>   1.use busy flag to protect the state
>>>>     The drawback is that we will introduce independent busy flag for each
>>>>     independent device's logic unit.
>>>>   2.reload the device's state
>>>>     The drawback is if the call chain is too deep, the action to reload will
>>>>     touch each layer. Also the reloading means to recaculate the intermediate
>>>>     result based on device's regs.
>>>>
>>>> This patch adopt the solution 1 to fix the issue.
>>>
>>> Doesn't the nested mmio patch detect this?
>>>
>> It will only record and fix the issue on one thread. But guest can
>> touch the emulated device on muti-threads.
>
> Sorry, what does that mean? A second VCPU accessing the device will
> simply be ignored when it races with another VCPU? Specifically
>
Yes, just ignored. For a device which supports many logic units in
parallel, it should use an independent busy flag for each unit.

Regards,
pingfan

> +    if (s->busy) {
> +        return;
>
> and
>
> +    uint64_t ret = 0;
> +
> +    if (s->busy) {
> +        return ret;
>
> is worrying me.
>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 13/16] e1000: add busy flag to anti broken device state
  2012-10-23  9:32         ` liu ping fan
@ 2012-10-23  9:37           ` Avi Kivity
  2012-10-24  6:36             ` liu ping fan
  2012-10-25  9:00             ` Peter Maydell
  0 siblings, 2 replies; 102+ messages in thread
From: Avi Kivity @ 2012-10-23  9:37 UTC (permalink / raw)
  To: liu ping fan
  Cc: Liu Ping Fan, Jan Kiszka, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini

On 10/23/2012 11:32 AM, liu ping fan wrote:
> On Tue, Oct 23, 2012 at 5:07 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2012-10-23 07:52, liu ping fan wrote:
>>> On Mon, Oct 22, 2012 at 6:40 PM, Avi Kivity <avi@redhat.com> wrote:
>>>> On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
>>>>> The broken device state is caused by releasing local lock before acquiring
>>>>> big lock. To fix this issue, we have two choice:
>>>>>   1.use busy flag to protect the state
>>>>>     The drawback is that we will introduce independent busy flag for each
>>>>>     independent device's logic unit.
>>>>>   2.reload the device's state
>>>>>     The drawback is if the call chain is too deep, the action to reload will
>>>>>     touch each layer. Also the reloading means to recaculate the intermediate
>>>>>     result based on device's regs.
>>>>>
>>>>> This patch adopt the solution 1 to fix the issue.
>>>>
>>>> Doesn't the nested mmio patch detect this?
>>>>
>>> It will only record and fix the issue on one thread. But guest can
>>> touch the emulated device on muti-threads.
>>
>> Sorry, what does that mean? A second VCPU accessing the device will
>> simply be ignored when it races with another VCPU? Specifically
>>
> Yes, just ignored.  For device which support many logic in parallel,
> it should use independent busy flag for each logic

We don't actually know that e1000 doesn't.  Why won't writing into
different registers in parallel work?


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 08/16] QemuThread: make QemuThread as tls to store extra info
  2012-10-22 17:13     ` Peter Maydell
  2012-10-23  5:58       ` liu ping fan
@ 2012-10-23 11:48       ` Paolo Bonzini
  2012-10-23 11:50         ` Peter Maydell
  1 sibling, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-23 11:48 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Liu Ping Fan, Jan Kiszka, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Stefan Hajnoczi

Il 22/10/2012 19:13, Peter Maydell ha scritto:
>> > Can't we enhance qemu-tls.h to work via pthread_setspecific in case
>> > __thread is not working and use that abstraction (DECLARE/DEFINE_TLS)
>> > directly?
> Agreed. (There were prototype patches floating around for Win32
> at least). The only reason qemu-tls.h has the dummy not-actually-tls
> code for non-linux is that IIRC we wanted to get the linux bits
> in quickly before a release and we never got round to going back
> and doing it properly for the other targets.

Which will be "never" for OpenBSD.  It just doesn't have enough support.

Thread-wise OpenBSD is 100% crap, and we should stop supporting it IMHO
until they finish their "new" thread library that's been in the works
for 10 years or so.  FreeBSD is totally ok.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 08/16] QemuThread: make QemuThread as tls to store extra info
  2012-10-23 11:48       ` Paolo Bonzini
@ 2012-10-23 11:50         ` Peter Maydell
  2012-10-23 11:51           ` Jan Kiszka
  2012-10-23 12:00           ` Paolo Bonzini
  0 siblings, 2 replies; 102+ messages in thread
From: Peter Maydell @ 2012-10-23 11:50 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Liu Ping Fan, Jan Kiszka, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Stefan Hajnoczi

On 23 October 2012 12:48, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 22/10/2012 19:13, Peter Maydell ha scritto:
>>> > Can't we enhance qemu-tls.h to work via pthread_setspecific in case
>>> > __thread is not working and use that abstraction (DECLARE/DEFINE_TLS)
>>> > directly?
>> Agreed. (There were prototype patches floating around for Win32
>> at least). The only reason qemu-tls.h has the dummy not-actually-tls
>> code for non-linux is that IIRC we wanted to get the linux bits
>> in quickly before a release and we never got round to going back
>> and doing it properly for the other targets.
>
> Which will be "never" for OpenBSD.  It just doesn't have enough support.
>
> Thread-wise OpenBSD is 100% crap, and we should stop supporting it IMHO
> until they finish their "new" thread library that's been in the works
> for 10 years or so.  FreeBSD is totally ok.

It doesn't support any kind of TLS? Wow.

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
  2012-10-22  9:38   ` Avi Kivity
@ 2012-10-23 11:51     ` Paolo Bonzini
  2012-10-23 11:55       ` Avi Kivity
  0 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-23 11:51 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka

Il 22/10/2012 11:38, Avi Kivity ha scritto:
>> >  
>> >  typedef struct MemoryRegionOps MemoryRegionOps;
>> >  typedef struct MemoryRegion MemoryRegion;
>> > @@ -66,6 +67,8 @@ struct MemoryRegionOps {
>> >                    target_phys_addr_t addr,
>> >                    uint64_t data,
>> >                    unsigned size);
>> > +    int (*ref)(MemoryRegion *mr);
>> > +    void (*unref)(MemoryRegion *mr);
>> >  
> Why return an int?  Should succeed unconditionally.  Please fold into 7
> (along with 6).

So the stop_machine idea is thrown away?  I really believe we're going
down a rat's nest with reference counting.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 08/16] QemuThread: make QemuThread as tls to store extra info
  2012-10-23 11:50         ` Peter Maydell
@ 2012-10-23 11:51           ` Jan Kiszka
  2012-10-23 12:00           ` Paolo Bonzini
  1 sibling, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-23 11:51 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Paolo Bonzini

On 2012-10-23 13:50, Peter Maydell wrote:
> On 23 October 2012 12:48, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Il 22/10/2012 19:13, Peter Maydell ha scritto:
>>>>> Can't we enhance qemu-tls.h to work via pthread_setspecific in case
>>>>> __thread is not working and use that abstraction (DECLARE/DEFINE_TLS)
>>>>> directly?
>>> Agreed. (There were prototype patches floating around for Win32
>>> at least). The only reason qemu-tls.h has the dummy not-actually-tls
>>> code for non-linux is that IIRC we wanted to get the linux bits
>>> in quickly before a release and we never got round to going back
>>> and doing it properly for the other targets.
>>
>> Which will be "never" for OpenBSD.  It just doesn't have enough support.
>>
>> Thread-wise OpenBSD is 100% crap, and we should stop supporting it IMHO
>> until they finish their "new" thread library that's been in the works
>> for 10 years or so.  FreeBSD is totally ok.
> 
> It doesn't support any kind of TLS? Wow.

It's probably more secure.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
  2012-10-23 11:51     ` Paolo Bonzini
@ 2012-10-23 11:55       ` Avi Kivity
  2012-10-23 11:57         ` Paolo Bonzini
  2012-10-23 12:04         ` Jan Kiszka
  0 siblings, 2 replies; 102+ messages in thread
From: Avi Kivity @ 2012-10-23 11:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka

On 10/23/2012 01:51 PM, Paolo Bonzini wrote:
> Il 22/10/2012 11:38, Avi Kivity ha scritto:
>>> >  
>>> >  typedef struct MemoryRegionOps MemoryRegionOps;
>>> >  typedef struct MemoryRegion MemoryRegion;
>>> > @@ -66,6 +67,8 @@ struct MemoryRegionOps {
>>> >                    target_phys_addr_t addr,
>>> >                    uint64_t data,
>>> >                    unsigned size);
>>> > +    int (*ref)(MemoryRegion *mr);
>>> > +    void (*unref)(MemoryRegion *mr);
>>> >  
>> Why return an int?  Should succeed unconditionally.  Please fold into 7
>> (along with 6).
> 
> So the stop_machine idea is thrown away?  

IIRC I convinced myself that it's just as bad.

> I really believe we're going
> down a rat's nest with reference counting.

There will be a lot of teething problems, but the same ideas are used
extensively in the kernel.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
  2012-10-23 11:55       ` Avi Kivity
@ 2012-10-23 11:57         ` Paolo Bonzini
  2012-10-23 12:02           ` Avi Kivity
  2012-10-23 12:04         ` Jan Kiszka
  1 sibling, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-23 11:57 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka

Il 23/10/2012 13:55, Avi Kivity ha scritto:
>> > So the stop_machine idea is thrown away?  
> IIRC I convinced myself that it's just as bad.

It may be just as bad, but it is less code (and less pervasive), which
makes it less painful.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 08/16] QemuThread: make QemuThread as tls to store extra info
  2012-10-23 11:50         ` Peter Maydell
  2012-10-23 11:51           ` Jan Kiszka
@ 2012-10-23 12:00           ` Paolo Bonzini
  2012-10-23 12:27             ` Peter Maydell
  2012-11-18 10:02             ` Brad Smith
  1 sibling, 2 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-23 12:00 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Liu Ping Fan, Jan Kiszka, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Stefan Hajnoczi

Il 23/10/2012 13:50, Peter Maydell ha scritto:
> On 23 October 2012 12:48, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Il 22/10/2012 19:13, Peter Maydell ha scritto:
>>>>> Can't we enhance qemu-tls.h to work via pthread_setspecific in case
>>>>> __thread is not working and use that abstraction (DECLARE/DEFINE_TLS)
>>>>> directly?
>>> Agreed. (There were prototype patches floating around for Win32
>>> at least). The only reason qemu-tls.h has the dummy not-actually-tls
>>> code for non-linux is that IIRC we wanted to get the linux bits
>>> in quickly before a release and we never got round to going back
>>> and doing it properly for the other targets.
>>
>> Which will be "never" for OpenBSD.  It just doesn't have enough support.
>>
>> Thread-wise OpenBSD is 100% crap, and we should stop supporting it IMHO
>> until they finish their "new" thread library that's been in the works
>> for 10 years or so.  FreeBSD is totally ok.
> 
> It doesn't support any kind of TLS? Wow.

It does support pthread_get/setspecific, but it didn't support something
else so the qemu-tls.h variant that used pthread_get/setspecific didn't
work either.

And it doesn't support sigaltstack in threads, so it's the only platform
where the gthread-based coroutines are used.  Those are buggy because
the coroutines tend to get random signal masks.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
  2012-10-23 11:57         ` Paolo Bonzini
@ 2012-10-23 12:02           ` Avi Kivity
  2012-10-23 12:06             ` Paolo Bonzini
  0 siblings, 1 reply; 102+ messages in thread
From: Avi Kivity @ 2012-10-23 12:02 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka

On 10/23/2012 01:57 PM, Paolo Bonzini wrote:
> Il 23/10/2012 13:55, Avi Kivity ha scritto:
>>> > So the stop_machine idea is thrown away?  
>> IIRC I convinced myself that it's just as bad.
> 
> It may be just as bad, but it is less code (and less pervasive), which
> makes it less painful.

It saves you the ->ref() and ->unref() calls, which are boilerplate, but
not too onerous. All of the device model and subsystem threading work
still needs to be done.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
  2012-10-23 11:55       ` Avi Kivity
  2012-10-23 11:57         ` Paolo Bonzini
@ 2012-10-23 12:04         ` Jan Kiszka
  2012-10-23 12:12           ` Paolo Bonzini
  1 sibling, 1 reply; 102+ messages in thread
From: Jan Kiszka @ 2012-10-23 12:04 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Paolo Bonzini

On 2012-10-23 13:55, Avi Kivity wrote:
> On 10/23/2012 01:51 PM, Paolo Bonzini wrote:
>> Il 22/10/2012 11:38, Avi Kivity ha scritto:
>>>>>  
>>>>>  typedef struct MemoryRegionOps MemoryRegionOps;
>>>>>  typedef struct MemoryRegion MemoryRegion;
>>>>> @@ -66,6 +67,8 @@ struct MemoryRegionOps {
>>>>>                    target_phys_addr_t addr,
>>>>>                    uint64_t data,
>>>>>                    unsigned size);
>>>>> +    int (*ref)(MemoryRegion *mr);
>>>>> +    void (*unref)(MemoryRegion *mr);
>>>>>  
>>> Why return an int?  Should succeed unconditionally.  Please fold into 7
>>> (along with 6).
>>
>> So the stop_machine idea is thrown away?  
> 
> IIRC I convinced myself that it's just as bad.

One tricky part with stop machine is that legacy code may trigger it
while holding the BQL, does not expect to lose that lock even for a
brief while, but synchronizing on other threads does require dropping
the lock right now. Maybe an implementation detail, but at least a nasty
one.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
  2012-10-23 12:02           ` Avi Kivity
@ 2012-10-23 12:06             ` Paolo Bonzini
  2012-10-23 12:15               ` Avi Kivity
  0 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-23 12:06 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka

Il 23/10/2012 14:02, Avi Kivity ha scritto:
> On 10/23/2012 01:57 PM, Paolo Bonzini wrote:
>> Il 23/10/2012 13:55, Avi Kivity ha scritto:
>>>>> So the stop_machine idea is thrown away?  
>>> IIRC I convinced myself that it's just as bad.
>>
>> It may be just as bad, but it is less code (and less pervasive), which
>> makes it less painful.
> 
> It saves you the ->ref() and ->unref() calls, which are boilerplate, but
> not too onerous. All of the device model and subsystem threading work
> still needs to be done.

I'm not worried about saving the ->ref() and ->unref() calls in the
devices.  I'm worried about saving it in timers, bottom halves and
whatnot.  And also I'm not sure whether all callbacks would have
something to ref/unref as they are implemented now.
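
To make that concrete: with reference counting, roughly every timer or
bottom-half user grows boilerplate along these lines (made-up sketch,
the series does not contain this code):

    /* when arming the timer, pin the device ... */
    object_ref(OBJECT(s));
    qemu_mod_timer(s->autoneg_timer,
                   qemu_get_clock_ms(vm_clock) + 500);

static void e1000_autoneg_timer(void *opaque)
{
    E1000State *s = opaque;

    qemu_mutex_lock(&s->e1000_lock);
    /* ... do the actual autonegotiation work ... */
    qemu_mutex_unlock(&s->e1000_lock);

    /* ... and drop the reference once the callback has run */
    object_unref(OBJECT(s));
}

plus a matching unref wherever the timer is deleted before it fires.
And for callbacks registered by subsystems rather than devices it is
not even clear what object they would ref.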

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
  2012-10-23 12:04         ` Jan Kiszka
@ 2012-10-23 12:12           ` Paolo Bonzini
  2012-10-23 12:16             ` Jan Kiszka
  0 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-23 12:12 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori

Il 23/10/2012 14:04, Jan Kiszka ha scritto:
>>> >>
>>> >> So the stop_machine idea is thrown away?  
>> > 
>> > IIRC I convinced myself that it's just as bad.
> One tricky part with stop machine is that legacy code may trigger it
> while holding the BQL, does not expect to lose that lock even for a
> brief while, but synchronizing on other threads does require dropping
> the lock right now. Maybe an implementation detail, but at least a nasty
> one.

But it would only be triggered by hot-unplug, no?  That is already an
asynchronous action, so it is not a problem to delay the actual
stop_machine+qdev_free (and just that part!) to a bottom half or another
place when it is safe to drop the BQL.
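
Something like the following could defer just that part (sketch only;
stop_machine_run() and device_unmap() are hypothetical names that do
not exist in the tree):

static void unplug_finalize_bh(void *opaque)
{
    DeviceState *dev = opaque;

    /* hypothetical: quiesce all vcpu and I/O threads, then tear down */
    stop_machine_run(device_unmap, dev);
    qdev_free(dev);
}

    /* in the unplug path, instead of freeing right away: */
    qemu_bh_schedule(qemu_bh_new(unplug_finalize_bh, dev));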

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 07/16] memory: make mmio dispatch able to be out of biglock
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 07/16] memory: make mmio dispatch able to be out of biglock Liu Ping Fan
@ 2012-10-23 12:12   ` Jan Kiszka
  2012-10-23 12:36     ` Avi Kivity
  0 siblings, 1 reply; 102+ messages in thread
From: Jan Kiszka @ 2012-10-23 12:12 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Stefan Hajnoczi, Marcelo Tosatti, qemu-devel, Avi Kivity,
	Anthony Liguori, Paolo Bonzini

On 2012-10-22 11:23, Liu Ping Fan wrote:
> Without biglock, we try to protect the mr by increase refcnt.
> If we can inc refcnt, go backward and resort to biglock.
> 
> Another point is memory radix-tree can be flushed by another
> thread, so we should get the copy of terminal mr to survive
> from such issue.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  exec.c |  125 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 files changed, 117 insertions(+), 8 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 5834766..91b859b 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -200,6 +200,8 @@ struct PhysPageEntry {
>      uint16_t ptr : 15;
>  };
>  
> +static QemuMutex mem_map_lock;
> +
>  /* Simple allocator for PhysPageEntry nodes */
>  static PhysPageEntry (*phys_map_nodes)[L2_SIZE];
>  static unsigned phys_map_nodes_nb, phys_map_nodes_nb_alloc;
> @@ -212,6 +214,8 @@ static PhysPageEntry phys_map = { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
>  
>  static void io_mem_init(void);
>  static void memory_map_init(void);
> +static int phys_page_lookup(target_phys_addr_t addr, MemoryRegionSection *mrs);
> +
>  
>  static MemoryRegion io_mem_watch;
>  #endif
> @@ -2245,6 +2249,7 @@ static void register_subpage(MemoryRegionSection *section)
>      subpage_t *subpage;
>      target_phys_addr_t base = section->offset_within_address_space
>          & TARGET_PAGE_MASK;
> +    /* Already under the protection of mem_map_lock */
>      MemoryRegionSection *existing = phys_page_find(base >> TARGET_PAGE_BITS);
>      MemoryRegionSection subsection = {
>          .offset_within_address_space = base,
> @@ -3165,6 +3170,8 @@ static void io_mem_init(void)
>  
>  static void core_begin(MemoryListener *listener)
>  {
> +    /* protect the updating process of mrs in memory core against readers */
> +    qemu_mutex_lock(&mem_map_lock);
>      destroy_all_mappings();
>      phys_sections_clear();
>      phys_map.ptr = PHYS_MAP_NODE_NIL;
> @@ -3184,17 +3191,32 @@ static void core_commit(MemoryListener *listener)
>      for(env = first_cpu; env != NULL; env = env->next_cpu) {
>          tlb_flush(env, 1);
>      }
> +    qemu_mutex_unlock(&mem_map_lock);
>  }
>  
>  static void core_region_add(MemoryListener *listener,
>                              MemoryRegionSection *section)
>  {
> +    MemoryRegion *mr = section->mr;
> +
> +    if (mr->ops) {
> +        if (mr->ops->ref) {

if (mr->ops && mr->ops->ref) {

here and in the cases below. Unless we avoid that callback anyway,
turning it into an mr flag.

> +            mr->ops->ref(mr);
> +        }
> +    }
>      cpu_register_physical_memory_log(section, section->readonly);
>  }
>  
>  static void core_region_del(MemoryListener *listener,
>                              MemoryRegionSection *section)
>  {
> +    MemoryRegion *mr = section->mr;
> +
> +    if (mr->ops) {
> +        if (mr->ops->unref) {
> +            mr->ops->unref(mr);
> +        }
> +    }
>  }
>  
>  static void core_region_nop(MemoryListener *listener,
> @@ -3348,6 +3370,8 @@ static void memory_map_init(void)
>      memory_region_init(system_io, "io", 65536);
>      set_system_io_map(system_io);
>  
> +    qemu_mutex_init(&mem_map_lock);
> +
>      memory_listener_register(&core_memory_listener, system_memory);
>      memory_listener_register(&io_memory_listener, system_io);
>  }
> @@ -3406,6 +3430,58 @@ int cpu_memory_rw_debug(CPUArchState *env, target_ulong addr,
>  }
>  
>  #else
> +
> +static MemoryRegionSection *subpage_get_terminal(subpage_t *mmio,
> +    target_phys_addr_t addr)
> +{
> +    MemoryRegionSection *section;
> +    unsigned int idx = SUBPAGE_IDX(addr);
> +
> +    section = &phys_sections[mmio->sub_section[idx]];
> +    return section;
> +}
> +
> +static int memory_region_section_ref(MemoryRegionSection *mrs)
> +{
> +    MemoryRegion *mr;
> +    int ret = 0;
> +
> +    mr = mrs->mr;
> +    if (mr->ops) {
> +        if (mr->ops->ref) {
> +            ret = mr->ops->ref(mr);
> +        }
> +    }
> +    return ret;

The return type should be bool, delivering true if reference was successful.
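
Something like this, as a sketch only:

/* Sketch of the bool-returning variant; true means a reference was
 * taken and has to be balanced by an unref later. */
static bool memory_region_section_ref(MemoryRegionSection *mrs)
{
    MemoryRegion *mr = mrs->mr;

    if (mr->ops && mr->ops->ref) {
        return mr->ops->ref(mr);
    }
    return false;
}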

> +}
> +
> +static void memory_region_section_unref(MemoryRegionSection *mrs)
> +{
> +    MemoryRegion *mr;
> +
> +    mr = mrs->mr;
> +    if (mr->ops) {
> +        if (mr->ops->unref) {
> +            mr->ops->unref(mr);
> +        }
> +    }
> +}
> +
> +static int phys_page_lookup(target_phys_addr_t addr, MemoryRegionSection *mrs)
> +{
> +    MemoryRegionSection *section;
> +    int ret;
> +
> +    section = phys_page_find(addr >> TARGET_PAGE_BITS);
> +    if (section->mr->subpage) {
> +        section = subpage_get_terminal(section->mr->opaque, addr);
> +    }
> +    *mrs = *section;
> +    ret = memory_region_section_ref(mrs);
> +
> +    return ret;
> +}
> +
>  void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>                              int len, int is_write)
>  {
> @@ -3413,14 +3489,28 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>      uint8_t *ptr;
>      uint32_t val;
>      target_phys_addr_t page;
> -    MemoryRegionSection *section;
> +    MemoryRegionSection *section, obj_mrs;
> +    int safe_ref;
>  
>      while (len > 0) {
>          page = addr & TARGET_PAGE_MASK;
>          l = (page + TARGET_PAGE_SIZE) - addr;
>          if (l > len)
>              l = len;
> -        section = phys_page_find(page >> TARGET_PAGE_BITS);
> +        qemu_mutex_lock(&mem_map_lock);
> +        safe_ref = phys_page_lookup(page, &obj_mrs);
> +        qemu_mutex_unlock(&mem_map_lock);
> +        if (safe_ref == 0) {
> +            qemu_mutex_lock_iothread();
> +            qemu_mutex_lock(&mem_map_lock);
> +            /* At the 2nd try, mem map can change, so need to judge it again */
> +            safe_ref = phys_page_lookup(page, &obj_mrs);
> +            qemu_mutex_unlock(&mem_map_lock);
> +            if (safe_ref > 0) {
> +                qemu_mutex_unlock_iothread();
> +            }
> +        }
> +        section = &obj_mrs;
>  
>          if (is_write) {
>              if (!memory_region_is_ram(section->mr)) {
> @@ -3491,10 +3581,16 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>                  qemu_put_ram_ptr(ptr);
>              }
>          }
> +
> +        memory_region_section_unref(&obj_mrs);

The mapping cannot change from not-referenced to reference-counted while
we were dispatching? I mean the case where we found no ref callback on
entry and took the big lock, but now there is an unref callback.

>          len -= l;
>          buf += l;
>          addr += l;
> +        if (safe_ref == 0) {
> +            qemu_mutex_unlock_iothread();
> +        }
>      }
> +
>  }
>  
>  /* used for ROM loading : can write in RAM and ROM */
> @@ -3504,14 +3600,18 @@ void cpu_physical_memory_write_rom(target_phys_addr_t addr,
>      int l;
>      uint8_t *ptr;
>      target_phys_addr_t page;
> -    MemoryRegionSection *section;
> +    MemoryRegionSection *section, mr_obj;
>  
>      while (len > 0) {
>          page = addr & TARGET_PAGE_MASK;
>          l = (page + TARGET_PAGE_SIZE) - addr;
>          if (l > len)
>              l = len;
> -        section = phys_page_find(page >> TARGET_PAGE_BITS);
> +
> +        qemu_mutex_lock(&mem_map_lock);
> +        phys_page_lookup(page, &mr_obj);
> +        qemu_mutex_unlock(&mem_map_lock);
> +        section = &mr_obj;

But here we don't care about the return code of phys_page_lookup and all
related topics? Because we assume the BQL is held? Reminds me that we
will need some support for assert(qemu_mutex_is_locked(&lock)).
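
E.g. something along these lines -- only a sketch; qemu_mutex_is_locked()
does not exist today, and a real version would rather live inside
QemuMutex itself instead of wrapping it:

/* Sketch of a "is this mutex held by me" assertion.  Illustrative only:
 * it adds an owner field next to QemuMutex rather than extending the
 * existing API. */
typedef struct {
    QemuMutex lock;
    QemuThread *owner;          /* NULL while unlocked */
} QemuMutexDbg;

static inline void qemu_mutex_dbg_lock(QemuMutexDbg *m, QemuThread *self)
{
    qemu_mutex_lock(&m->lock);
    m->owner = self;
}

static inline void qemu_mutex_dbg_unlock(QemuMutexDbg *m)
{
    m->owner = NULL;
    qemu_mutex_unlock(&m->lock);
}

#define assert_mutex_held(m, self) assert((m)->owner == (self))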

>  
>          if (!(memory_region_is_ram(section->mr) ||
>                memory_region_is_romd(section->mr))) {
> @@ -3528,6 +3628,7 @@ void cpu_physical_memory_write_rom(target_phys_addr_t addr,
>          len -= l;
>          buf += l;
>          addr += l;
> +        memory_region_section_unref(&mr_obj);
>      }
>  }
>  
> @@ -3592,7 +3693,7 @@ void *cpu_physical_memory_map(target_phys_addr_t addr,
>      target_phys_addr_t todo = 0;
>      int l;
>      target_phys_addr_t page;
> -    MemoryRegionSection *section;
> +    MemoryRegionSection *section, mr_obj;
>      ram_addr_t raddr = RAM_ADDR_MAX;
>      ram_addr_t rlen;
>      void *ret;
> @@ -3602,7 +3703,10 @@ void *cpu_physical_memory_map(target_phys_addr_t addr,
>          l = (page + TARGET_PAGE_SIZE) - addr;
>          if (l > len)
>              l = len;
> -        section = phys_page_find(page >> TARGET_PAGE_BITS);
> +        qemu_mutex_lock(&mem_map_lock);
> +        phys_page_lookup(page, &mr_obj);
> +        qemu_mutex_unlock(&mem_map_lock);
> +        section = &mr_obj;
>  
>          if (!(memory_region_is_ram(section->mr) && !section->readonly)) {
>              if (todo || bounce.buffer) {
> @@ -3616,6 +3720,7 @@ void *cpu_physical_memory_map(target_phys_addr_t addr,
>              }
>  
>              *plen = l;
> +            memory_region_section_unref(&mr_obj);
>              return bounce.buffer;
>          }
>          if (!todo) {
> @@ -3630,6 +3735,7 @@ void *cpu_physical_memory_map(target_phys_addr_t addr,
>      rlen = todo;
>      ret = qemu_ram_ptr_length(raddr, &rlen);
>      *plen = rlen;
> +    memory_region_section_unref(&mr_obj);
>      return ret;
>  }
>  
> @@ -4239,9 +4345,12 @@ bool virtio_is_big_endian(void)
>  #ifndef CONFIG_USER_ONLY
>  bool cpu_physical_memory_is_io(target_phys_addr_t phys_addr)
>  {
> -    MemoryRegionSection *section;
> +    MemoryRegionSection *section, mr_obj;
>  
> -    section = phys_page_find(phys_addr >> TARGET_PAGE_BITS);
> +    qemu_mutex_lock(&mem_map_lock);
> +    phys_page_lookup(phys_addr, &mr_obj);
> +    qemu_mutex_unlock(&mem_map_lock);
> +    section = &mr_obj;

Err, no unref needed here?

>  
>      return !(memory_region_is_ram(section->mr) ||
>               memory_region_is_romd(section->mr));
> 

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
  2012-10-23 12:06             ` Paolo Bonzini
@ 2012-10-23 12:15               ` Avi Kivity
  2012-10-23 12:32                 ` Paolo Bonzini
  0 siblings, 1 reply; 102+ messages in thread
From: Avi Kivity @ 2012-10-23 12:15 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka

On 10/23/2012 02:06 PM, Paolo Bonzini wrote:
> Il 23/10/2012 14:02, Avi Kivity ha scritto:
>> On 10/23/2012 01:57 PM, Paolo Bonzini wrote:
>>> Il 23/10/2012 13:55, Avi Kivity ha scritto:
>>>>>> So the stop_machine idea is thrown away?  
>>>> IIRC I convinced myself that it's just as bad.
>>>
>>> It may be just as bad, but it is less code (and less pervasive), which
>>> makes it less painful.
>> 
>> It saves you the ->ref() and ->unref() calls, which are boilerplate, but
>> not too onerous. All of the device model and subsystem threading work
>> still needs to be done.
> 
> I'm not worried about saving the ->ref() and ->unref() calls in the
> devices.  I'm worried about saving it in timers, bottom halves and
> whatnot.  And also I'm not sure whether all callbacks would have
> something to ref/unref as they are implemented now.

Hard to say without examples.

Something that bothers me with stop_machine is the reliance on
cancellation.  With timers it's easy, stop_machine, remove the timer,
resume.  But if you have an aio operation in progress that is not
cancellable, you have to wait for that operation to complete.  Refcounts
handle that well, the object stays until completion, then disappears.
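
For illustration, a minimal sketch of the refcounted aio pattern --
MyDevice and the surrounding helpers are placeholders (assuming the
device is a QOM object), while the block-layer calls are the existing
API:

/* Sketch: an in-flight aio request pins its device, so there is nothing
 * to cancel; the object simply lives until the completion runs. */
static void my_aio_complete(void *opaque, int ret)
{
    MyDevice *d = opaque;

    /* ... consume the result ... */
    object_unref(OBJECT(d));          /* drop the ref taken at submit time */
}

static void my_aio_submit(MyDevice *d, BlockDriverState *bs,
                          int64_t sector_num, QEMUIOVector *qiov,
                          int nb_sectors)
{
    object_ref(OBJECT(d));            /* request in flight keeps d alive */
    bdrv_aio_readv(bs, sector_num, qiov, nb_sectors, my_aio_complete, d);
}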

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
  2012-10-23 12:12           ` Paolo Bonzini
@ 2012-10-23 12:16             ` Jan Kiszka
  2012-10-23 12:28               ` Avi Kivity
  0 siblings, 1 reply; 102+ messages in thread
From: Jan Kiszka @ 2012-10-23 12:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori

On 2012-10-23 14:12, Paolo Bonzini wrote:
> Il 23/10/2012 14:04, Jan Kiszka ha scritto:
>>>>>>
>>>>>> So the stop_machine idea is thrown away?  
>>>>
>>>> IIRC I convinced myself that it's just as bad.
>> One tricky part with stop machine is that legacy code may trigger it
>> while holding the BQL, does not expect to lose that lock even for a
>> brief while, but synchronizing on other threads does require dropping
>> the lock right now. Maybe an implementation detail, but at least a nasty
>> one.
> 
> But it would only be triggered by hot-unplug, no?

Once all code that adds/removes memory regions from within access
handlers is converted. Legacy is biting, not necessarily the pure model.

>  That is already an
> asynchronous action, so it is not a problem to delay the actual
> stop_machine+qdev_free (and just that part!) to a bottom half or another
> place when it is safe to drop the BQL.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 08/16] QemuThread: make QemuThread as tls to store extra info
  2012-10-23 12:00           ` Paolo Bonzini
@ 2012-10-23 12:27             ` Peter Maydell
  2012-11-18 10:02             ` Brad Smith
  1 sibling, 0 replies; 102+ messages in thread
From: Peter Maydell @ 2012-10-23 12:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Liu Ping Fan, Jan Kiszka, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Stefan Hajnoczi

On 23 October 2012 13:00, Paolo Bonzini <pbonzini@redhat.com> wrote:
> It does support pthread_get/setspecific, but it didn't support something
> else so the qemu-tls.h variant that used pthread_get/setspecific didn't
> work either.
>
> And it doesn't support sigaltstack in threads, so it's the only platform
> where the gthread-based coroutines are used.  Those are buggy because
> the coroutines tend to get random signal masks.

MacOS uses the gthread version too. In fact anything that doesn't
use makecontext will use gthread -- you won't get the
sigaltstack version unless you explicitly ask configure for it.

[insert usual rant here about what a bad idea coroutines are]

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
  2012-10-23 12:16             ` Jan Kiszka
@ 2012-10-23 12:28               ` Avi Kivity
  2012-10-23 12:40                 ` Jan Kiszka
  0 siblings, 1 reply; 102+ messages in thread
From: Avi Kivity @ 2012-10-23 12:28 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Paolo Bonzini

On 10/23/2012 02:16 PM, Jan Kiszka wrote:
> On 2012-10-23 14:12, Paolo Bonzini wrote:
>> Il 23/10/2012 14:04, Jan Kiszka ha scritto:
>>>>>>>
>>>>>>> So the stop_machine idea is thrown away?  
>>>>>
>>>>> IIRC I convinced myself that it's just as bad.
>>> One tricky part with stop machine is that legacy code may trigger it
>>> while holding the BQL, does not expect to lose that lock even for a
>>> brief while, but synchronizing on other threads does require dropping
>>> the lock right now. Maybe an implementation detail, but at least a nasty
>>> one.
>> 
>> But it would only be triggered by hot-unplug, no?
> 
> Once all code that adds/removes memory regions from within access
> handlers is converted. 

add/del is fine.  memory_region_destroy() is the problem.  I have
patches queued that fix those problems and add an assert() to make sure
we don't add more.

It's not just memory regions, it's practically anything that can be
removed and that has callbacks.  The two proposals are:

- qomify
- split unplug into isolate+destroy
- let the issuer of the callbacks manage the reference counts

vs

- split unplug into isolate+destroy
- let unplug defer destruction to a bottom half, and stop_machine there
- if we depend on the results [1], add a continuation

[1] Say a monitor command wants to return only after the block device
has been detached from qemu

> Legacy is biting, not necessarily the pure model.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
  2012-10-23 12:15               ` Avi Kivity
@ 2012-10-23 12:32                 ` Paolo Bonzini
  2012-10-23 14:49                   ` Avi Kivity
  0 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-23 12:32 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka

Il 23/10/2012 14:15, Avi Kivity ha scritto:
> On 10/23/2012 02:06 PM, Paolo Bonzini wrote:
>> Il 23/10/2012 14:02, Avi Kivity ha scritto:
>>> On 10/23/2012 01:57 PM, Paolo Bonzini wrote:
>>>> Il 23/10/2012 13:55, Avi Kivity ha scritto:
>>>>>>> So the stop_machine idea is thrown away?  
>>>>> IIRC I convinced myself that it's just as bad.
>>>>
>>>> It may be just as bad, but it is less code (and less pervasive), which
>>>> makes it less painful.
>>>
>>> It saves you the ->ref() and ->unref() calls, which are boilerplate, but
>>> not too onerous. All of the device model and subsystem threading work
>>> still needs to be done.
>>
>> I'm not worried about saving the ->ref() and ->unref() calls in the
>> devices.  I'm worried about saving it in timers, bottom halves and
>> whatnot.  And also I'm not sure whether all callbacks would have
>> something to ref/unref as they are implemented now.
> 
> Hard to say without examples.
> 
> Something that bothers me with stop_machine is the reliance on
> cancellation.  With timers it's easy, stop_machine, remove the timer,
> resume.  But if you have an aio operation in progress that is not
> cancellable, you have to wait for that operation to complete.  Refcounts
> handle that well, the object stays until completion, then disappears.

Yes, that's the point of doing things asynchronously---you do not need
to do everything within stop_machine, you can start canceling AIO as
soon as the OS sends the hot-unplug request.  Then you only proceed with
stop_machine and freeing device memory when the first part is done.

In other words, isolate can complete asynchronously.

The good thing is that this is an improvement that can be applied on top
of the current code, which avoids doing too many things at once...

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 07/16] memory: make mmio dispatch able to be out of biglock
  2012-10-23 12:12   ` Jan Kiszka
@ 2012-10-23 12:36     ` Avi Kivity
  2012-10-24  6:31       ` liu ping fan
  0 siblings, 1 reply; 102+ messages in thread
From: Avi Kivity @ 2012-10-23 12:36 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Paolo Bonzini

On 10/23/2012 02:12 PM, Jan Kiszka wrote:
> On 2012-10-22 11:23, Liu Ping Fan wrote:
>> Without the big lock, we try to protect the mr by increasing its refcnt.
>> If we cannot take such a reference, fall back to the big lock.
>>
>> Another point is that the memory radix tree can be flushed by another
>> thread, so we take a copy of the terminal mr to survive such a flush.
>> 
>> +
>>  void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>                              int len, int is_write)
>>  {
>> @@ -3413,14 +3489,28 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>      uint8_t *ptr;
>>      uint32_t val;
>>      target_phys_addr_t page;
>> -    MemoryRegionSection *section;
>> +    MemoryRegionSection *section, obj_mrs;
>> +    int safe_ref;
>>  
>>      while (len > 0) {
>>          page = addr & TARGET_PAGE_MASK;
>>          l = (page + TARGET_PAGE_SIZE) - addr;
>>          if (l > len)
>>              l = len;
>> -        section = phys_page_find(page >> TARGET_PAGE_BITS);
>> +        qemu_mutex_lock(&mem_map_lock);
>> +        safe_ref = phys_page_lookup(page, &obj_mrs);
>> +        qemu_mutex_unlock(&mem_map_lock);
>> +        if (safe_ref == 0) {
>> +            qemu_mutex_lock_iothread();
>> +            qemu_mutex_lock(&mem_map_lock);
>> +            /* At the 2nd try, mem map can change, so need to judge it again */
>> +            safe_ref = phys_page_lookup(page, &obj_mrs);
>> +            qemu_mutex_unlock(&mem_map_lock);
>> +            if (safe_ref > 0) {
>> +                qemu_mutex_unlock_iothread();
>> +            }
>> +        }
>> +        section = &obj_mrs;
>>  
>>          if (is_write) {
>>              if (!memory_region_is_ram(section->mr)) {
>> @@ -3491,10 +3581,16 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>                  qemu_put_ram_ptr(ptr);
>>              }
>>          }
>> +
>> +        memory_region_section_unref(&obj_mrs);
> 
> The mapping cannot change from not-referenced to reference-counted while
> we were dispatching? I mean the case where we found no ref callback on
> entry and took the big lock, but now there is an unref callback.

We drop the big lock in that case, so we end up in the same situation.

> 
>>          len -= l;
>>          buf += l;
>>          addr += l;
>> +        if (safe_ref == 0) {
>> +            qemu_mutex_unlock_iothread();
>> +        }
>>      }
>> +
>>  }
>>  
>>  /* used for ROM loading : can write in RAM and ROM */
>> @@ -3504,14 +3600,18 @@ void cpu_physical_memory_write_rom(target_phys_addr_t addr,
>>      int l;
>>      uint8_t *ptr;
>>      target_phys_addr_t page;
>> -    MemoryRegionSection *section;
>> +    MemoryRegionSection *section, mr_obj;
>>  
>>      while (len > 0) {
>>          page = addr & TARGET_PAGE_MASK;
>>          l = (page + TARGET_PAGE_SIZE) - addr;
>>          if (l > len)
>>              l = len;
>> -        section = phys_page_find(page >> TARGET_PAGE_BITS);
>> +
>> +        qemu_mutex_lock(&mem_map_lock);
>> +        phys_page_lookup(page, &mr_obj);
>> +        qemu_mutex_unlock(&mem_map_lock);
>> +        section = &mr_obj;
> 
> But here we don't care about the return code of phys_page_lookup and all
> related topics? Because we assume the BQL is held? Reminds me that we
> will need some support for assert(qemu_mutex_is_locked(&lock)).

I guess it's better to drop that assumption than to have asymmetric APIs.

>>  
>> @@ -4239,9 +4345,12 @@ bool virtio_is_big_endian(void)
>>  #ifndef CONFIG_USER_ONLY
>>  bool cpu_physical_memory_is_io(target_phys_addr_t phys_addr)
>>  {
>> -    MemoryRegionSection *section;
>> +    MemoryRegionSection *section, mr_obj;
>>  
>> -    section = phys_page_find(phys_addr >> TARGET_PAGE_BITS);
>> +    qemu_mutex_lock(&mem_map_lock);
>> +    phys_page_lookup(phys_addr, &mr_obj);
>> +    qemu_mutex_unlock(&mem_map_lock);
>> +    section = &mr_obj;
> 
> Err, no unref needed here?

Need _ref in the name to remind reviewers that it leaves the refcount
unbalanced.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 09/16] memory: introduce mmio request pending to anti nested DMA
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 09/16] memory: introduce mmio request pending to anti nested DMA Liu Ping Fan
  2012-10-22 10:28   ` Avi Kivity
@ 2012-10-23 12:38   ` Gleb Natapov
  2012-10-24  6:31     ` liu ping fan
  1 sibling, 1 reply; 102+ messages in thread
From: Gleb Natapov @ 2012-10-23 12:38 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Stefan Hajnoczi, Marcelo Tosatti, qemu-devel, Avi Kivity,
	Anthony Liguori, Jan Kiszka, Paolo Bonzini

On Mon, Oct 22, 2012 at 05:23:52PM +0800, Liu Ping Fan wrote:
> Reject nested mmio requests which do not target RAM, so that we
> can avoid the potential deadlock caused by taking two devices'
> local locks in random order.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  cpus.c              |   14 ++++++++++++++
>  exec.c              |   50 ++++++++++++++++++++++++++++++++++++--------------
>  hw/hw.h             |    1 +
>  kvm-all.c           |    2 ++
>  qemu-thread-posix.h |    3 +++
>  qemu-thread.h       |    2 ++
>  6 files changed, 58 insertions(+), 14 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index 4cd7f85..365a512 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -729,6 +729,18 @@ static void qemu_kvm_wait_io_event(CPUArchState *env)
>      qemu_wait_io_event_common(env);
>  }
>  
> +int get_context_type(void)
> +{
> +    QemuThread *t = pthread_getspecific(qemu_thread_key);
> +    return t->context_type;
> +}
> +
You defined the function but do not use it.

What do the 0/1 context_type values mean?

> +void set_context_type(int type)
> +{
> +    QemuThread *t = pthread_getspecific(qemu_thread_key);
> +    t->context_type = type;
> +}
> +
>  static void *qemu_kvm_cpu_thread_fn(void *arg)
>  {
>      CPUArchState *env = arg;
> @@ -736,6 +748,8 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
>      int r;
>  
>      pthread_setspecific(qemu_thread_key, cpu->thread);
> +    set_context_type(0);
> +
>      qemu_mutex_lock(&qemu_global_mutex);
>      qemu_thread_get_self(cpu->thread);
>      env->thread_id = qemu_get_thread_id();
> diff --git a/exec.c b/exec.c
> index 91b859b..a0327a1 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -3490,7 +3490,9 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>      uint32_t val;
>      target_phys_addr_t page;
>      MemoryRegionSection *section, obj_mrs;
> -    int safe_ref;
> +    int safe_ref, nested_dma = 0;
> +    QemuThread *thread = pthread_getspecific(qemu_thread_key);
> +    int context = thread->context_type;
>  
>      while (len > 0) {
>          page = addr & TARGET_PAGE_MASK;
> @@ -3500,7 +3502,8 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>          qemu_mutex_lock(&mem_map_lock);
>          safe_ref = phys_page_lookup(page, &obj_mrs);
>          qemu_mutex_unlock(&mem_map_lock);
> -        if (safe_ref == 0) {
> +
> +        if (safe_ref == 0 && context == 1) {
>              qemu_mutex_lock_iothread();
>              qemu_mutex_lock(&mem_map_lock);
>              /* At the 2nd try, mem map can change, so need to judge it again */
> @@ -3511,7 +3514,9 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>              }
>          }
>          section = &obj_mrs;
> -
> +        if (context == 1) {
> +            nested_dma = thread->mmio_request_pending++ > 1 ? 1 : 0;
> +        }
>          if (is_write) {
>              if (!memory_region_is_ram(section->mr)) {
>                  target_phys_addr_t addr1;
> @@ -3521,17 +3526,23 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>                  if (l >= 4 && ((addr1 & 3) == 0)) {
>                      /* 32 bit write access */
>                      val = ldl_p(buf);
> -                    io_mem_write(section->mr, addr1, val, 4);
> +                    if (!nested_dma) {
> +                        io_mem_write(section->mr, addr1, val, 4);
> +                    }
>                      l = 4;
>                  } else if (l >= 2 && ((addr1 & 1) == 0)) {
>                      /* 16 bit write access */
>                      val = lduw_p(buf);
> -                    io_mem_write(section->mr, addr1, val, 2);
> +                    if (!nested_dma) {
> +                        io_mem_write(section->mr, addr1, val, 2);
> +                    }
>                      l = 2;
>                  } else {
>                      /* 8 bit write access */
>                      val = ldub_p(buf);
> -                    io_mem_write(section->mr, addr1, val, 1);
> +                    if (!nested_dma) {
> +                        io_mem_write(section->mr, addr1, val, 1);
> +                    }
>                      l = 1;
>                  }
>              } else if (!section->readonly) {
> @@ -3552,24 +3563,31 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>              }
>          } else {
>              if (!(memory_region_is_ram(section->mr) ||
> -                  memory_region_is_romd(section->mr))) {
> +                  memory_region_is_romd(section->mr)) &&
> +                    !nested_dma) {
>                  target_phys_addr_t addr1;
>                  /* I/O case */
>                  addr1 = memory_region_section_addr(section, addr);
>                  if (l >= 4 && ((addr1 & 3) == 0)) {
>                      /* 32 bit read access */
> -                    val = io_mem_read(section->mr, addr1, 4);
> -                    stl_p(buf, val);
> +                    if (!nested_dma) {
> +                        val = io_mem_read(section->mr, addr1, 4);
> +                        stl_p(buf, val);
> +                    }
>                      l = 4;
>                  } else if (l >= 2 && ((addr1 & 1) == 0)) {
>                      /* 16 bit read access */
> -                    val = io_mem_read(section->mr, addr1, 2);
> -                    stw_p(buf, val);
> +                    if (!nested_dma) {
> +                        val = io_mem_read(section->mr, addr1, 2);
> +                        stw_p(buf, val);
> +                    }
>                      l = 2;
>                  } else {
>                      /* 8 bit read access */
> -                    val = io_mem_read(section->mr, addr1, 1);
> -                    stb_p(buf, val);
> +                    if (!nested_dma) {
> +                        val = io_mem_read(section->mr, addr1, 1);
> +                        stb_p(buf, val);
> +                    }
>                      l = 1;
>                  }
>              } else {
> @@ -3586,7 +3604,11 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>          len -= l;
>          buf += l;
>          addr += l;
> -        if (safe_ref == 0) {
> +
> +        if (context == 1) {
> +            thread->mmio_request_pending--;
> +        }
> +        if (safe_ref == 0 && context == 1) {
>              qemu_mutex_unlock_iothread();
>          }
>      }
> diff --git a/hw/hw.h b/hw/hw.h
> index e5cb9bf..935b045 100644
> --- a/hw/hw.h
> +++ b/hw/hw.h
> @@ -12,6 +12,7 @@
>  #include "irq.h"
>  #include "qemu-file.h"
>  #include "vmstate.h"
> +#include "qemu-thread.h"
>  
>  #ifdef NEED_CPU_H
>  #if TARGET_LONG_BITS == 64
> diff --git a/kvm-all.c b/kvm-all.c
> index 34b02c1..b3fa597 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -1562,10 +1562,12 @@ int kvm_cpu_exec(CPUArchState *env)
>              break;
>          case KVM_EXIT_MMIO:
>              DPRINTF("handle_mmio\n");
> +            set_context_type(1);
>              cpu_physical_memory_rw(run->mmio.phys_addr,
>                                     run->mmio.data,
>                                     run->mmio.len,
>                                     run->mmio.is_write);
> +            set_context_type(0);
>              ret = 0;
>              break;
>          case KVM_EXIT_IRQ_WINDOW_OPEN:
> diff --git a/qemu-thread-posix.h b/qemu-thread-posix.h
> index 2607b1c..9fcc6f8 100644
> --- a/qemu-thread-posix.h
> +++ b/qemu-thread-posix.h
> @@ -12,6 +12,9 @@ struct QemuCond {
>  
>  struct QemuThread {
>      pthread_t thread;
> +    /* 0 clean; 1 mmio; 2 io */
> +    int context_type;
> +    int mmio_request_pending;
>  };
>  
>  extern pthread_key_t qemu_thread_key;
> diff --git a/qemu-thread.h b/qemu-thread.h
> index 4a6427d..88eaf94 100644
> --- a/qemu-thread.h
> +++ b/qemu-thread.h
> @@ -45,6 +45,8 @@ void *qemu_thread_join(QemuThread *thread);
>  void qemu_thread_get_self(QemuThread *thread);
>  bool qemu_thread_is_self(QemuThread *thread);
>  void qemu_thread_exit(void *retval);
> +int get_context_type(void);
> +void set_context_type(int type);
>  
>  void qemu_thread_key_create(void);
>  #endif
> -- 
> 1.7.4.4
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
  2012-10-23 12:28               ` Avi Kivity
@ 2012-10-23 12:40                 ` Jan Kiszka
  2012-10-23 14:37                   ` Avi Kivity
  0 siblings, 1 reply; 102+ messages in thread
From: Jan Kiszka @ 2012-10-23 12:40 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Paolo Bonzini

On 2012-10-23 14:28, Avi Kivity wrote:
> On 10/23/2012 02:16 PM, Jan Kiszka wrote:
>> On 2012-10-23 14:12, Paolo Bonzini wrote:
>>> Il 23/10/2012 14:04, Jan Kiszka ha scritto:
>>>>>>>>
>>>>>>>> So the stop_machine idea is thrown away?  
>>>>>>
>>>>>> IIRC I convinced myself that it's just as bad.
>>>> One tricky part with stop machine is that legacy code may trigger it
>>>> while holding the BQL, does not expect to lose that lock even for a
>>>> brief while, but synchronizing on other threads does require dropping
>>>> the lock right now. Maybe an implementation detail, but at least a nasty
>>>> one.
>>>
>>> But it would only be triggered by hot-unplug, no?
>>
>> Once all code that adds/removes memory regions from within access
>> handlers is converted. 
> 
> add/del is fine.  memory_region_destroy() is the problem.  I have
> patches queued that fix those problems and add an assert() to make sure
> we don't add more.
> 
> It's not just memory regions, it's practically anything that can be
> removed and that has callbacks.  The two proposals are:
> 
> - qomify
> - split unplug into isolate+destroy
> - let the issuer of the callbacks manage the reference counts

What do you mean with the last one?

> 
> vs
> 
> - split unplug into isolate+destroy
> - let unplug defer destruction to a bottom half, and stop_machine there
> - if we depend on the results [1], add a continuation
> 
> [1] Say a monitor command wants to return only after the block device
> has been detached from qemu

The monitor is likely harmless (as it's pretty confined). But is that
all? Hunting down all (corner) cases will make switching to this model
tricky.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
  2012-10-23 12:40                 ` Jan Kiszka
@ 2012-10-23 14:37                   ` Avi Kivity
  0 siblings, 0 replies; 102+ messages in thread
From: Avi Kivity @ 2012-10-23 14:37 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Paolo Bonzini

On 10/23/2012 02:40 PM, Jan Kiszka wrote:
> On 2012-10-23 14:28, Avi Kivity wrote:
>> On 10/23/2012 02:16 PM, Jan Kiszka wrote:
>>> On 2012-10-23 14:12, Paolo Bonzini wrote:
>>>> Il 23/10/2012 14:04, Jan Kiszka ha scritto:
>>>>>>>>>
>>>>>>>>> So the stop_machine idea is thrown away?  
>>>>>>>
>>>>>>> IIRC I convinced myself that it's just as bad.
>>>>> One tricky part with stop machine is that legacy code may trigger it
>>>>> while holding the BQL, does not expect to lose that lock even for a
>>>>> brief while, but synchronizing on other threads does require dropping
>>>>> the lock right now. Maybe an implementation detail, but at least a nasty
>>>>> one.
>>>>
>>>> But it would only be triggered by hot-unplug, no?
>>>
>>> Once all code that adds/removes memory regions from within access
>>> handlers is converted. 
>> 
>> add/del is fine.  memory_region_destroy() is the problem.  I have
>> patches queued that fix those problems and add an assert() to make sure
>> we don't add more.
>> 
>> It's not just memory regions, it's practically anything that can be
>> removed and that has callbacks.  The two proposals are:
>> 
>> - qomify
>> - split unplug into isolate+destroy
>> - let the issuer of the callbacks manage the reference counts
> 
> What do you mean with the last one?

Call ref/unref as needed (this patchset).

Here, "the issuer of the callbacks" is the memory core.  For timer
callbacks, it is the timer subsystem.

> 
>> 
>> vs
>> 
>> - split unplug into isolate+destroy
>> - let unplug defer destruction to a bottom half, and stop_machine there
>> - if we depend on the results [1], add a continuation
>> 
>> [1] Say a monitor command wants to return only after the block device
>> has been detached from qemu
> 
> The monitor is likely harmless (as it's pretty confined). But is that
> all? Hunting down all (corner) cases will make switching to this model
> tricky.

That is my feeling as well.  The first model requires more work, but is
complete.  The second model is easier, but we may run into a wall if we
find a case it doesn't cover.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
  2012-10-23 12:32                 ` Paolo Bonzini
@ 2012-10-23 14:49                   ` Avi Kivity
  2012-10-23 15:26                     ` Paolo Bonzini
  0 siblings, 1 reply; 102+ messages in thread
From: Avi Kivity @ 2012-10-23 14:49 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka

On 10/23/2012 02:32 PM, Paolo Bonzini wrote:
> Il 23/10/2012 14:15, Avi Kivity ha scritto:
>> On 10/23/2012 02:06 PM, Paolo Bonzini wrote:
>>> Il 23/10/2012 14:02, Avi Kivity ha scritto:
>>>> On 10/23/2012 01:57 PM, Paolo Bonzini wrote:
>>>>> Il 23/10/2012 13:55, Avi Kivity ha scritto:
>>>>>>>> So the stop_machine idea is thrown away?  
>>>>>> IIRC I convinced myself that it's just as bad.
>>>>>
>>>>> It may be just as bad, but it is less code (and less pervasive), which
>>>>> makes it less painful.
>>>>
>>>> It saves you the ->ref() and ->unref() calls, which are boilerplate, but
>>>> not too onerous. All of the device model and subsystem threading work
>>>> still needs to be done.
>>>
>>> I'm not worried about saving the ->ref() and ->unref() calls in the
>>> devices.  I'm worried about saving it in timers, bottom halves and
>>> whatnot.  And also I'm not sure whether all callbacks would have
>>> something to ref/unref as they are implemented now.
>> 
>> Hard to say without examples.
>> 
>> Something that bothers me with stop_machine is the reliance on
>> cancellation.  With timers it's easy, stop_machine, remove the timer,
>> resume.  But if you have an aio operation in progress that is not
>> cancellable, you have to wait for that operation to complete.  Refcounts
>> handle that well, the object stays until completion, then disappears.
> 
> Yes, that's the point of doing things asynchronously---you do not need
> to do everything within stop_machine, you can start canceling AIO as
> soon as the OS sends the hot-unplug request.  Then you only proceed with
>> stop_machine and freeing device memory when the first part is done.

You cannot always cancel I/O (for example threaded I/O already in progress).

> In other words, isolate can complete asynchronously.

Can it?  I don't think so.

Here's how I see it:

 1. non-malicious guest stops driving device
 2. isolate()
 3. a malicious guest cannot drive the device at this point
 4. some kind of barrier to let the device, or drive activity from a
malicious guest, wind down
 5. destroy()

If you need to report the completion of step 2, it cannot be done
asynchronously.  One example is if the guest drivers the process from an
mmio callback.

We may also want notification after step 4 (or 5); if the device holds
some host resource someone may want to know that it is ready for reuse.
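
Or, spelled out as pseudo-C -- isolate(), wind_down_barrier() and
destroy() are just invented names for the numbered steps above, not a
proposed API:

/* Sketch of the unplug split. */
static void device_unplug_sketch(DeviceState *dev)
{
    /* Step 2: after this, no new guest access can reach the device.
     * Must be synchronous if the guest drives unplug from an mmio
     * handler and expects the write to mean "isolated". */
    isolate(dev);

    /* Step 4: wait for accesses and I/O already in flight (including a
     * malicious guest hammering the device up to the isolate point). */
    wind_down_barrier(dev);

    /* Step 5: only now is it safe to free the device and report
     * completion to management. */
    destroy(dev);
}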

> The good thing is that this is an improvement that can be applied on top
> of the current code, which avoids doing too many things at once...



-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
  2012-10-23 14:49                   ` Avi Kivity
@ 2012-10-23 15:26                     ` Paolo Bonzini
  2012-10-23 16:09                       ` Avi Kivity
  0 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-23 15:26 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka

Il 23/10/2012 16:49, Avi Kivity ha scritto:
> On 10/23/2012 02:32 PM, Paolo Bonzini wrote:
>> Il 23/10/2012 14:15, Avi Kivity ha scritto:
>>> On 10/23/2012 02:06 PM, Paolo Bonzini wrote:
>>>> Il 23/10/2012 14:02, Avi Kivity ha scritto:
>>>>> On 10/23/2012 01:57 PM, Paolo Bonzini wrote:
>>>>>> Il 23/10/2012 13:55, Avi Kivity ha scritto:
>>>>>>>>> So the stop_machine idea is thrown away?  
>>>>>>> IIRC I convinced myself that it's just as bad.
>>>>>>
>>>>>> It may be just as bad, but it is less code (and less pervasive), which
>>>>>> makes it less painful.
>>>>>
>>>>> It saves you the ->ref() and ->unref() calls, which are boilerplate, but
>>>>> not too onerous. All of the device model and subsystem threading work
>>>>> still needs to be done.
>>>>
>>>> I'm not worried about saving the ->ref() and ->unref() calls in the
>>>> devices.  I'm worried about saving it in timers, bottom halves and
>>>> whatnot.  And also I'm not sure whether all callbacks would have
>>>> something to ref/unref as they are implemented now.
>>>
>>> Hard to say without examples.
>>>
>>> Something that bothers me with stop_machine is the reliance on
>>> cancellation.  With timers it's easy, stop_machine, remove the timer,
>>> resume.  But if you have an aio operation in progress that is not
>>> cancellable, you have to wait for that operation to complete.  Refcounts
>>> handle that well, the object stays until completion, then disappears.
>>
>> Yes, that's the point of doing things asynchronously---you do not need
>> to do everything within stop_machine, you can start canceling AIO as
>> soon as the OS sends the hot-unplug request.  Then you only proceed with
>> stop_machine and freeing device memory when the first part is done.
> 
> You cannot always cancel I/O (for example threaded I/O already in progress).

Yep, but we try to do this anyway today and nothing changes really.  The
difference is between hotplug never completing and blocking
synchronously, vs. hotplug never completing and not invoking the
asynchronous callback.  I.e. really no difference at all.

>> In other words, isolate can complete asynchronously.
> 
> Can it?  I don't think so.
> 
> Here's how I see it:
> 
>  1. non-malicious guest stops driving device
>  2. isolate()
>  3. a malicious guest cannot drive the device at this point
>  4. some kind of barrier to let the device, or drive activity from a
> malicious guest, wind down
>  5. destroy()
> 
> If you need to report the completion of step 2, it cannot be done
> asynchronously.

In hardware everything is asynchronous anyway.  It will *look*
synchronous, because if CPU#0 is stuck in a synchronous isolate(), and
CPU#1 polls for the outcome, CPU#1 will block on the BQL held by CPU#0.

But our interfaces had better support asynchronicity, and indeed they
do: after you write to the "eject" register, the "up" will show the
device as present until after destroy is done.  This can be changed to
show the device as present only until after step 4 is done.

> We may also want notification after step 4 (or 5); if the device holds
> some host resource someone may want to know that it is ready for reuse.

I think guest notification should be after (4), while management
notification should be after (5).

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
  2012-10-23 15:26                     ` Paolo Bonzini
@ 2012-10-23 16:09                       ` Avi Kivity
  2012-10-24  7:29                         ` Paolo Bonzini
  0 siblings, 1 reply; 102+ messages in thread
From: Avi Kivity @ 2012-10-23 16:09 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka

On 10/23/2012 05:26 PM, Paolo Bonzini wrote:
>>> Yes, that's the point of doing things asynchronously---you do not need
>>> to do everything within stop_machine, you can start canceling AIO as
>>> soon as the OS sends the hot-unplug request.  Then you only proceed with
>>> stop_machine and freeing device memory when the first part is done.
>> 
>> You cannot always cancel I/O (for example threaded I/O already in progress).
> 
> Yep, but we try to do this anyway today and nothing changes really.  The
> difference is between hotplug never completing and blocking
> synchronously, vs. hotplug never completing and not invoking the
> asynchronous callback.  I.e. really no difference at all.

Not cancelling is not the same as not completing; the request will
complete on its own eventually.  The question is whether the programming
model is synchronous or callback based.

> 
>>> In other words, isolate can complete asynchronously.
>> 
>> Can it?  I don't think so.
>> 
>> Here's how I see it:
>> 
>>  1. non-malicious guest stops driving device
>>  2. isolate()
>>  3. a malicious guest cannot drive the device at this point
>>  4. some kind of barrier to let the device, or drive activity from a
>> malicious guest, wind down
>>  5. destroy()
>> 
>> If you need to report the completion of step 2, it cannot be done
>> asynchronously.
> 
> In hardware everything is asynchronous anyway.  It will *look*
> synchronous, because if CPU#0 is stuck in a synchronous isolate(), and
> CPU#1 polls for the outcome, CPU#1 will block on the BQL held by CPU#0.

That is fine.  isolate() is expensive but it is cpu bound, it does not
involve any I/O (unlike the barrier afterwards, which has to wait on any
I/O which we were not able to cancel).

> But our interfaces had better support asynchronicity, and indeed they
> do: after you write to the "eject" register, the "up" will show the
> device as present until after destroy is done.  This can be changed to
> show the device as present only until after step 4 is done.

Let's say we want to eject the hotplug hardware itself (just as an
example).  With refcounts, the callback that updates "up" will hold on
to it via refcounts.  With stop_machine(), you need to cancel that
callback, or wait for it somehow, or it can arrive after the
stop_machine() and bite you.

> 
>> We may also want notification after step 4 (or 5); if the device holds
>> some host resource someone may want to know that it is ready for reuse.
> 
> I think guest notification should be after (4), while management
> notification should be after (5).

Yes. After (2) we can return from the eject mmio.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 09/16] memory: introduce mmio request pending to anti nested DMA
  2012-10-23 12:38   ` Gleb Natapov
@ 2012-10-24  6:31     ` liu ping fan
  0 siblings, 0 replies; 102+ messages in thread
From: liu ping fan @ 2012-10-24  6:31 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Jan Kiszka, Paolo Bonzini

On Tue, Oct 23, 2012 at 8:38 PM, Gleb Natapov <gleb@redhat.com> wrote:
> On Mon, Oct 22, 2012 at 05:23:52PM +0800, Liu Ping Fan wrote:
>> Reject nested mmio requests which do not target RAM, so that we
>> can avoid the potential deadlock caused by taking two devices'
>> local locks in random order.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  cpus.c              |   14 ++++++++++++++
>>  exec.c              |   50 ++++++++++++++++++++++++++++++++++++--------------
>>  hw/hw.h             |    1 +
>>  kvm-all.c           |    2 ++
>>  qemu-thread-posix.h |    3 +++
>>  qemu-thread.h       |    2 ++
>>  6 files changed, 58 insertions(+), 14 deletions(-)
>>
>> diff --git a/cpus.c b/cpus.c
>> index 4cd7f85..365a512 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -729,6 +729,18 @@ static void qemu_kvm_wait_io_event(CPUArchState *env)
>>      qemu_wait_io_event_common(env);
>>  }
>>
>> +int get_context_type(void)
>> +{
>> +    QemuThread *t = pthread_getspecific(qemu_thread_key);
>> +    return t->context_type;
>> +}
>> +
> You defined the function but do not use it.
>
> What do the 0/1 context_type values mean?
>
Will s/t->context/get_context_type/.  context_type is just for a
device's handler to tell whether it is called under its local lock or
under the big lock.  0 is the initial value, 1 means mmio dispatch, 2
means pio dispatch.  Will use an enum to define them.
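
Something like this (sketch only, names not final):

/* Sketch of the enum that will replace the bare 0/1/2 values. */
typedef enum DispatchContext {
    DISPATCH_CONTEXT_NONE = 0,  /* initial value, not dispatching */
    DISPATCH_CONTEXT_MMIO = 1,  /* dispatching an mmio request */
    DISPATCH_CONTEXT_IO   = 2,  /* dispatching a pio request */
} DispatchContext;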

Regards,
pingfan
>> +void set_context_type(int type)
>> +{
>> +    QemuThread *t = pthread_getspecific(qemu_thread_key);
>> +    t->context_type = type;
>> +}
>> +
>>  static void *qemu_kvm_cpu_thread_fn(void *arg)
>>  {
>>      CPUArchState *env = arg;
>> @@ -736,6 +748,8 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
>>      int r;
>>
>>      pthread_setspecific(qemu_thread_key, cpu->thread);
>> +    set_context_type(0);
>> +
>>      qemu_mutex_lock(&qemu_global_mutex);
>>      qemu_thread_get_self(cpu->thread);
>>      env->thread_id = qemu_get_thread_id();
>> diff --git a/exec.c b/exec.c
>> index 91b859b..a0327a1 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -3490,7 +3490,9 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>      uint32_t val;
>>      target_phys_addr_t page;
>>      MemoryRegionSection *section, obj_mrs;
>> -    int safe_ref;
>> +    int safe_ref, nested_dma = 0;
>> +    QemuThread *thread = pthread_getspecific(qemu_thread_key);
>> +    int context = thread->context_type;
>>
>>      while (len > 0) {
>>          page = addr & TARGET_PAGE_MASK;
>> @@ -3500,7 +3502,8 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>          qemu_mutex_lock(&mem_map_lock);
>>          safe_ref = phys_page_lookup(page, &obj_mrs);
>>          qemu_mutex_unlock(&mem_map_lock);
>> -        if (safe_ref == 0) {
>> +
>> +        if (safe_ref == 0 && context == 1) {
>>              qemu_mutex_lock_iothread();
>>              qemu_mutex_lock(&mem_map_lock);
>>              /* At the 2nd try, mem map can change, so need to judge it again */
>> @@ -3511,7 +3514,9 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>              }
>>          }
>>          section = &obj_mrs;
>> -
>> +        if (context == 1) {
>> +            nested_dma = thread->mmio_request_pending++ > 1 ? 1 : 0;
>> +        }
>>          if (is_write) {
>>              if (!memory_region_is_ram(section->mr)) {
>>                  target_phys_addr_t addr1;
>> @@ -3521,17 +3526,23 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>                  if (l >= 4 && ((addr1 & 3) == 0)) {
>>                      /* 32 bit write access */
>>                      val = ldl_p(buf);
>> -                    io_mem_write(section->mr, addr1, val, 4);
>> +                    if (!nested_dma) {
>> +                        io_mem_write(section->mr, addr1, val, 4);
>> +                    }
>>                      l = 4;
>>                  } else if (l >= 2 && ((addr1 & 1) == 0)) {
>>                      /* 16 bit write access */
>>                      val = lduw_p(buf);
>> -                    io_mem_write(section->mr, addr1, val, 2);
>> +                    if (!nested_dma) {
>> +                        io_mem_write(section->mr, addr1, val, 2);
>> +                    }
>>                      l = 2;
>>                  } else {
>>                      /* 8 bit write access */
>>                      val = ldub_p(buf);
>> -                    io_mem_write(section->mr, addr1, val, 1);
>> +                    if (!nested_dma) {
>> +                        io_mem_write(section->mr, addr1, val, 1);
>> +                    }
>>                      l = 1;
>>                  }
>>              } else if (!section->readonly) {
>> @@ -3552,24 +3563,31 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>              }
>>          } else {
>>              if (!(memory_region_is_ram(section->mr) ||
>> -                  memory_region_is_romd(section->mr))) {
>> +                  memory_region_is_romd(section->mr)) &&
>> +                    !nested_dma) {
>>                  target_phys_addr_t addr1;
>>                  /* I/O case */
>>                  addr1 = memory_region_section_addr(section, addr);
>>                  if (l >= 4 && ((addr1 & 3) == 0)) {
>>                      /* 32 bit read access */
>> -                    val = io_mem_read(section->mr, addr1, 4);
>> -                    stl_p(buf, val);
>> +                    if (!nested_dma) {
>> +                        val = io_mem_read(section->mr, addr1, 4);
>> +                        stl_p(buf, val);
>> +                    }
>>                      l = 4;
>>                  } else if (l >= 2 && ((addr1 & 1) == 0)) {
>>                      /* 16 bit read access */
>> -                    val = io_mem_read(section->mr, addr1, 2);
>> -                    stw_p(buf, val);
>> +                    if (!nested_dma) {
>> +                        val = io_mem_read(section->mr, addr1, 2);
>> +                        stw_p(buf, val);
>> +                    }
>>                      l = 2;
>>                  } else {
>>                      /* 8 bit read access */
>> -                    val = io_mem_read(section->mr, addr1, 1);
>> -                    stb_p(buf, val);
>> +                    if (!nested_dma) {
>> +                        val = io_mem_read(section->mr, addr1, 1);
>> +                        stb_p(buf, val);
>> +                    }
>>                      l = 1;
>>                  }
>>              } else {
>> @@ -3586,7 +3604,11 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>          len -= l;
>>          buf += l;
>>          addr += l;
>> -        if (safe_ref == 0) {
>> +
>> +        if (context == 1) {
>> +            thread->mmio_request_pending--;
>> +        }
>> +        if (safe_ref == 0 && context == 1) {
>>              qemu_mutex_unlock_iothread();
>>          }
>>      }
>> diff --git a/hw/hw.h b/hw/hw.h
>> index e5cb9bf..935b045 100644
>> --- a/hw/hw.h
>> +++ b/hw/hw.h
>> @@ -12,6 +12,7 @@
>>  #include "irq.h"
>>  #include "qemu-file.h"
>>  #include "vmstate.h"
>> +#include "qemu-thread.h"
>>
>>  #ifdef NEED_CPU_H
>>  #if TARGET_LONG_BITS == 64
>> diff --git a/kvm-all.c b/kvm-all.c
>> index 34b02c1..b3fa597 100644
>> --- a/kvm-all.c
>> +++ b/kvm-all.c
>> @@ -1562,10 +1562,12 @@ int kvm_cpu_exec(CPUArchState *env)
>>              break;
>>          case KVM_EXIT_MMIO:
>>              DPRINTF("handle_mmio\n");
>> +            set_context_type(1);
>>              cpu_physical_memory_rw(run->mmio.phys_addr,
>>                                     run->mmio.data,
>>                                     run->mmio.len,
>>                                     run->mmio.is_write);
>> +            set_context_type(0);
>>              ret = 0;
>>              break;
>>          case KVM_EXIT_IRQ_WINDOW_OPEN:
>> diff --git a/qemu-thread-posix.h b/qemu-thread-posix.h
>> index 2607b1c..9fcc6f8 100644
>> --- a/qemu-thread-posix.h
>> +++ b/qemu-thread-posix.h
>> @@ -12,6 +12,9 @@ struct QemuCond {
>>
>>  struct QemuThread {
>>      pthread_t thread;
>> +    /* 0 clean; 1 mmio; 2 io */
>> +    int context_type;
>> +    int mmio_request_pending;
>>  };
>>
>>  extern pthread_key_t qemu_thread_key;
>> diff --git a/qemu-thread.h b/qemu-thread.h
>> index 4a6427d..88eaf94 100644
>> --- a/qemu-thread.h
>> +++ b/qemu-thread.h
>> @@ -45,6 +45,8 @@ void *qemu_thread_join(QemuThread *thread);
>>  void qemu_thread_get_self(QemuThread *thread);
>>  bool qemu_thread_is_self(QemuThread *thread);
>>  void qemu_thread_exit(void *retval);
>> +int get_context_type(void);
>> +void set_context_type(int type);
>>
>>  void qemu_thread_key_create(void);
>>  #endif
>> --
>> 1.7.4.4
>>
>
> --
>                         Gleb.
>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 07/16] memory: make mmio dispatch able to be out of biglock
  2012-10-23 12:36     ` Avi Kivity
@ 2012-10-24  6:31       ` liu ping fan
  2012-10-24  6:56         ` liu ping fan
  0 siblings, 1 reply; 102+ messages in thread
From: liu ping fan @ 2012-10-24  6:31 UTC (permalink / raw)
  To: Avi Kivity, Jan Kiszka
  Cc: Stefan Hajnoczi, Marcelo Tosatti, qemu-devel, Anthony Liguori,
	Paolo Bonzini

On Tue, Oct 23, 2012 at 8:36 PM, Avi Kivity <avi@redhat.com> wrote:
> On 10/23/2012 02:12 PM, Jan Kiszka wrote:
>> On 2012-10-22 11:23, Liu Ping Fan wrote:
>>> Without the big lock, we try to protect the mr by increasing its refcnt.
>>> If we cannot take such a reference, fall back to the big lock.
>>>
>>> Another point is that the memory radix tree can be flushed by another
>>> thread, so we take a copy of the terminal mr to survive such a flush.
>>>
>>> +
>>>  void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>>                              int len, int is_write)
>>>  {
>>> @@ -3413,14 +3489,28 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>>      uint8_t *ptr;
>>>      uint32_t val;
>>>      target_phys_addr_t page;
>>> -    MemoryRegionSection *section;
>>> +    MemoryRegionSection *section, obj_mrs;
>>> +    int safe_ref;
>>>
>>>      while (len > 0) {
>>>          page = addr & TARGET_PAGE_MASK;
>>>          l = (page + TARGET_PAGE_SIZE) - addr;
>>>          if (l > len)
>>>              l = len;
>>> -        section = phys_page_find(page >> TARGET_PAGE_BITS);
>>> +        qemu_mutex_lock(&mem_map_lock);
>>> +        safe_ref = phys_page_lookup(page, &obj_mrs);
>>> +        qemu_mutex_unlock(&mem_map_lock);
>>> +        if (safe_ref == 0) {
>>> +            qemu_mutex_lock_iothread();
>>> +            qemu_mutex_lock(&mem_map_lock);
>>> +            /* At the 2nd try, mem map can change, so need to judge it again */
>>> +            safe_ref = phys_page_lookup(page, &obj_mrs);
>>> +            qemu_mutex_unlock(&mem_map_lock);
>>> +            if (safe_ref > 0) {
>>> +                qemu_mutex_unlock_iothread();
>>> +            }
>>> +        }
>>> +        section = &obj_mrs;
>>>
>>>          if (is_write) {
>>>              if (!memory_region_is_ram(section->mr)) {
>>> @@ -3491,10 +3581,16 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>>                  qemu_put_ram_ptr(ptr);
>>>              }
>>>          }
>>> +
>>> +        memory_region_section_unref(&obj_mrs);
>>
>> The mapping cannot change from not-referenced to reference-counted while
>> we were dispatching? I mean the case where we found not ref callback on
>> entry and took the big lock, but now there is an unref callback.
>
> We drop the big lock in that case, so we end up in the same situation.
>
>>
>>>          len -= l;
>>>          buf += l;
>>>          addr += l;
>>> +        if (safe_ref == 0) {
>>> +            qemu_mutex_unlock_iothread();
>>> +        }
>>>      }
>>> +
>>>  }
>>>
>>>  /* used for ROM loading : can write in RAM and ROM */
>>> @@ -3504,14 +3600,18 @@ void cpu_physical_memory_write_rom(target_phys_addr_t addr,
>>>      int l;
>>>      uint8_t *ptr;
>>>      target_phys_addr_t page;
>>> -    MemoryRegionSection *section;
>>> +    MemoryRegionSection *section, mr_obj;
>>>
>>>      while (len > 0) {
>>>          page = addr & TARGET_PAGE_MASK;
>>>          l = (page + TARGET_PAGE_SIZE) - addr;
>>>          if (l > len)
>>>              l = len;
>>> -        section = phys_page_find(page >> TARGET_PAGE_BITS);
>>> +
>>> +        qemu_mutex_lock(&mem_map_lock);
>>> +        phys_page_lookup(page, &mr_obj);
>>> +        qemu_mutex_unlock(&mem_map_lock);
>>> +        section = &mr_obj;
>>
>> But here we don't care about the return code of phys_page_lookup and all
>> related topics? Because we assume the BQL is held? Reminds me that we
>> will need some support for assert(qemu_mutex_is_locked(&lock)).
>
> I guess it's better to drop that assumption than to have asymmetric APIs.
>
Yes, the physmap updater is now based on mem_map_lock, and the same will
be true for the readers.
>>>
>>> @@ -4239,9 +4345,12 @@ bool virtio_is_big_endian(void)
>>>  #ifndef CONFIG_USER_ONLY
>>>  bool cpu_physical_memory_is_io(target_phys_addr_t phys_addr)
>>>  {
>>> -    MemoryRegionSection *section;
>>> +    MemoryRegionSection *section, mr_obj;
>>>
>>> -    section = phys_page_find(phys_addr >> TARGET_PAGE_BITS);
>>> +    qemu_mutex_lock(&mem_map_lock);
>>> +    phys_page_lookup(phys_addr, &mr_obj);
>>> +    qemu_mutex_unlock(&mem_map_lock);
>>> +    section = &mr_obj;
>>
>> Err, no unref needed here?
>
> Need _ref in the name to remind reviewers that it leaves the refcount
> unbalanced.
>
Oh, here is a bug, we need an unref.  As for the unbalanced refcount, it will
be adopted for the virtio-blk listener (not implemented in this patchset)

Regards,
pingfan
> --
> error compiling committee.c: too many arguments to function
>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000
  2012-10-23  9:04   ` Jan Kiszka
@ 2012-10-24  6:31     ` liu ping fan
  2012-10-24  7:17       ` Jan Kiszka
  2012-10-24  7:29     ` liu ping fan
  1 sibling, 1 reply; 102+ messages in thread
From: liu ping fan @ 2012-10-24  6:31 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Paolo Bonzini

On Tue, Oct 23, 2012 at 5:04 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2012-10-22 11:23, Liu Ping Fan wrote:
>> Use local lock to protect e1000. When calling the system function,
>> dropping the fine lock before acquiring the big lock. This will
>> introduce broken device state, which need extra effort to fix.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  hw/e1000.c |   24 +++++++++++++++++++++++-
>>  1 files changed, 23 insertions(+), 1 deletions(-)
>>
>> diff --git a/hw/e1000.c b/hw/e1000.c
>> index ae8a6c5..5eddab5 100644
>> --- a/hw/e1000.c
>> +++ b/hw/e1000.c
>> @@ -85,6 +85,7 @@ typedef struct E1000State_st {
>>      NICConf conf;
>>      MemoryRegion mmio;
>>      MemoryRegion io;
>> +    QemuMutex e1000_lock;
>>
>>      uint32_t mac_reg[0x8000];
>>      uint16_t phy_reg[0x20];
>> @@ -223,13 +224,27 @@ static const uint32_t mac_reg_init[] = {
>>  static void
>>  set_interrupt_cause(E1000State *s, int index, uint32_t val)
>>  {
>> +    QemuThread *t;
>> +
>>      if (val && (E1000_DEVID >= E1000_DEV_ID_82547EI_MOBILE)) {
>>          /* Only for 8257x */
>>          val |= E1000_ICR_INT_ASSERTED;
>>      }
>>      s->mac_reg[ICR] = val;
>>      s->mac_reg[ICS] = val;
>> -    qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
>> +
>> +    t = pthread_getspecific(qemu_thread_key);
>> +    if (t->context_type == 1) {
>> +        qemu_mutex_unlock(&s->e1000_lock);
>> +        qemu_mutex_lock_iothread();
>> +    }
>> +    if (DEVICE(s)->state < DEV_STATE_STOPPING) {
>> +        qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
>> +    }
>> +    if (t->context_type == 1) {
>> +        qemu_mutex_unlock_iothread();
>> +        qemu_mutex_lock(&s->e1000_lock);
>> +    }
>
> This is ugly for many reasons. First of all, it is racy as the register
> content may change while dropping the device lock, no? Then you would
> raise or clear an IRQ spuriously.
>
The device state's integrity is protected by the busy flag, so it will not
be broken.

> Second, it clearly shows that we need to address lock-less IRQ delivery.
> Almost nothing is won if we have to take the global lock again to push
> an IRQ event to the guest. I'm repeating myself, but the problem to be
> solved here is almost identical to fast IRQ delivery for assigned
> devices (which we only address pretty ad-hoc for PCI so far).
>
Yes, agreed. But this is the first step to show how a device can run
outside big lock protection.  It also helps us set up a target for each
subsystem.  I think at the next step, we will consider each subsystem.

> And third: too much boilerplate code... :-/
>
Yeah, without the recursive big lock, we need to tell whether the code
runs with or without the big lock held.  I would like to make the big
lock recursive, but maybe it has more drawbacks.

Thanks and regards,
pingfan
>>  }
>>
>>  static void
>> @@ -268,6 +283,7 @@ static void e1000_reset(void *opaque)
>>      E1000State *d = opaque;
>>
>>      qemu_del_timer(d->autoneg_timer);
>> +
>>      memset(d->phy_reg, 0, sizeof d->phy_reg);
>>      memmove(d->phy_reg, phy_reg_init, sizeof phy_reg_init);
>>      memset(d->mac_reg, 0, sizeof d->mac_reg);
>> @@ -448,7 +464,11 @@ e1000_send_packet(E1000State *s, const uint8_t *buf, int size)
>>      if (s->phy_reg[PHY_CTRL] & MII_CR_LOOPBACK) {
>>          s->nic->nc.info->receive(&s->nic->nc, buf, size);
>>      } else {
>> +        qemu_mutex_unlock(&s->e1000_lock);
>> +        qemu_mutex_lock_iothread();
>>          qemu_send_packet(&s->nic->nc, buf, size);
>> +        qemu_mutex_unlock_iothread();
>> +        qemu_mutex_lock(&s->e1000_lock);
>
> And that is also a problem to be discussed next: How to handle
> locking of backends? Do we want separate locks for backend and frontend?
> Although they are typically in a 1:1 relationship? Oh, I'm revealing the
> content of my talk... ;)
>
>>      }
>>  }
>>
>> @@ -1221,6 +1241,8 @@ static int pci_e1000_init(PCIDevice *pci_dev)
>>      int i;
>>      uint8_t *macaddr;
>>
>> +    qemu_mutex_init(&d->e1000_lock);
>> +
>>      pci_conf = d->dev.config;
>>
>>      /* TODO: RST# value should be 0, PCI spec 6.2.4 */
>>
>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
> Corporate Competence Center Embedded Linux
>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 13/16] e1000: add busy flag to anti broken device state
  2012-10-23  9:37           ` Avi Kivity
@ 2012-10-24  6:36             ` liu ping fan
  2012-10-25  8:55               ` Avi Kivity
  2012-10-25  9:00             ` Peter Maydell
  1 sibling, 1 reply; 102+ messages in thread
From: liu ping fan @ 2012-10-24  6:36 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Liu Ping Fan, Jan Kiszka, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini

On Tue, Oct 23, 2012 at 5:37 PM, Avi Kivity <avi@redhat.com> wrote:
> On 10/23/2012 11:32 AM, liu ping fan wrote:
>> On Tue, Oct 23, 2012 at 5:07 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> On 2012-10-23 07:52, liu ping fan wrote:
>>>> On Mon, Oct 22, 2012 at 6:40 PM, Avi Kivity <avi@redhat.com> wrote:
>>>>> On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
>>>>>> The broken device state is caused by releasing local lock before acquiring
>>>>>> big lock. To fix this issue, we have two choice:
>>>>>>   1.use busy flag to protect the state
>>>>>>     The drawback is that we will introduce independent busy flag for each
>>>>>>     independent device's logic unit.
>>>>>>   2.reload the device's state
>>>>>>     The drawback is if the call chain is too deep, the action to reload will
>>>>>>     touch each layer. Also the reloading means to recaculate the intermediate
>>>>>>     result based on device's regs.
>>>>>>
>>>>>> This patch adopt the solution 1 to fix the issue.
>>>>>
>>>>> Doesn't the nested mmio patch detect this?
>>>>>
>>>> It will only record and fix the issue on one thread. But guest can
>>>> touch the emulated device on multi-threads.
>>>
>>> Sorry, what does that mean? A second VCPU accessing the device will
>>> simply be ignored when it races with another VCPU? Specifically
>>>
>> Yes, just ignored.  For device which support many logic in parallel,
>> it should use independent busy flag for each logic
>
> We don't actually know that e1000 doesn't.  Why won't writing into
> different registers in parallel work?
>
I think e1000 has only one transfer logic unit, so one busy flag is enough.
And a normal guest driver will access the registers one by one.
But anyway, it may have parallel modules.  So what about modelling it like
this:
if busy:
  wait

clear busy:
   wakeup
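
As a rough sketch (reusing the e1000_lock from this series; the busy and
busy_cond fields and the helper names below are hypothetical, just to
illustrate the wait/wakeup idea):

    static void e1000_logic_enter(E1000State *s)
    {
        qemu_mutex_lock(&s->e1000_lock);
        while (s->busy) {
            /* another vcpu currently owns this logic unit */
            qemu_cond_wait(&s->busy_cond, &s->e1000_lock);
        }
        s->busy = true;
        qemu_mutex_unlock(&s->e1000_lock);
    }

    static void e1000_logic_exit(E1000State *s)
    {
        qemu_mutex_lock(&s->e1000_lock);
        s->busy = false;
        /* wake one waiter instead of dropping its access */
        qemu_cond_signal(&s->busy_cond);
        qemu_mutex_unlock(&s->e1000_lock);
    }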

Regards,
pingfan
>
> --
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 07/16] memory: make mmio dispatch able to be out of biglock
  2012-10-24  6:31       ` liu ping fan
@ 2012-10-24  6:56         ` liu ping fan
  2012-10-25  8:57           ` Avi Kivity
  0 siblings, 1 reply; 102+ messages in thread
From: liu ping fan @ 2012-10-24  6:56 UTC (permalink / raw)
  To: Avi Kivity, Jan Kiszka
  Cc: Stefan Hajnoczi, Marcelo Tosatti, qemu-devel, Anthony Liguori,
	Paolo Bonzini

On Wed, Oct 24, 2012 at 2:31 PM, liu ping fan <qemulist@gmail.com> wrote:
> On Tue, Oct 23, 2012 at 8:36 PM, Avi Kivity <avi@redhat.com> wrote:
>> On 10/23/2012 02:12 PM, Jan Kiszka wrote:
>>> On 2012-10-22 11:23, Liu Ping Fan wrote:
>>>> Without biglock, we try to protect the mr by increase refcnt.
>>>> If we can inc refcnt, go backward and resort to biglock.
>>>>
>>>> Another point is memory radix-tree can be flushed by another
>>>> thread, so we should get the copy of terminal mr to survive
>>>> from such issue.
>>>>
>>>> +
>>>>  void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>>>                              int len, int is_write)
>>>>  {
>>>> @@ -3413,14 +3489,28 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>>>      uint8_t *ptr;
>>>>      uint32_t val;
>>>>      target_phys_addr_t page;
>>>> -    MemoryRegionSection *section;
>>>> +    MemoryRegionSection *section, obj_mrs;
>>>> +    int safe_ref;
>>>>
>>>>      while (len > 0) {
>>>>          page = addr & TARGET_PAGE_MASK;
>>>>          l = (page + TARGET_PAGE_SIZE) - addr;
>>>>          if (l > len)
>>>>              l = len;
>>>> -        section = phys_page_find(page >> TARGET_PAGE_BITS);
>>>> +        qemu_mutex_lock(&mem_map_lock);
>>>> +        safe_ref = phys_page_lookup(page, &obj_mrs);
>>>> +        qemu_mutex_unlock(&mem_map_lock);
>>>> +        if (safe_ref == 0) {
>>>> +            qemu_mutex_lock_iothread();
>>>> +            qemu_mutex_lock(&mem_map_lock);
>>>> +            /* At the 2nd try, mem map can change, so need to judge it again */
>>>> +            safe_ref = phys_page_lookup(page, &obj_mrs);
>>>> +            qemu_mutex_unlock(&mem_map_lock);
>>>> +            if (safe_ref > 0) {
>>>> +                qemu_mutex_unlock_iothread();
>>>> +            }
>>>> +        }
>>>> +        section = &obj_mrs;
>>>>
>>>>          if (is_write) {
>>>>              if (!memory_region_is_ram(section->mr)) {
>>>> @@ -3491,10 +3581,16 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>>>>                  qemu_put_ram_ptr(ptr);
>>>>              }
>>>>          }
>>>> +
>>>> +        memory_region_section_unref(&obj_mrs);
>>>
>>> The mapping cannot change from not-referenced to reference-counted while
>>> we were dispatching? I mean the case where we found not ref callback on
>>> entry and took the big lock, but now there is an unref callback.
>>
>> We drop the big lock in that case, so we end up in the same situation.
>>
>>>
>>>>          len -= l;
>>>>          buf += l;
>>>>          addr += l;
>>>> +        if (safe_ref == 0) {
>>>> +            qemu_mutex_unlock_iothread();
>>>> +        }
>>>>      }
>>>> +
>>>>  }
>>>>
>>>>  /* used for ROM loading : can write in RAM and ROM */
>>>> @@ -3504,14 +3600,18 @@ void cpu_physical_memory_write_rom(target_phys_addr_t addr,
>>>>      int l;
>>>>      uint8_t *ptr;
>>>>      target_phys_addr_t page;
>>>> -    MemoryRegionSection *section;
>>>> +    MemoryRegionSection *section, mr_obj;
>>>>
>>>>      while (len > 0) {
>>>>          page = addr & TARGET_PAGE_MASK;
>>>>          l = (page + TARGET_PAGE_SIZE) - addr;
>>>>          if (l > len)
>>>>              l = len;
>>>> -        section = phys_page_find(page >> TARGET_PAGE_BITS);
>>>> +
>>>> +        qemu_mutex_lock(&mem_map_lock);
>>>> +        phys_page_lookup(page, &mr_obj);
>>>> +        qemu_mutex_unlock(&mem_map_lock);
>>>> +        section = &mr_obj;
>>>
>>> But here we don't care about the return code of phys_page_lookup and all
>>> related topics? Because we assume the BQL is held? Reminds me that we
>>> will need some support for assert(qemu_mutex_is_locked(&lock)).
>>
>> I guess it's better to drop that assumption than to have asymmetric APIs.
>>
> Yes, now the updater of physmap based on mem_map_lock, and the same it
> will be for readers.
>>>>
>>>> @@ -4239,9 +4345,12 @@ bool virtio_is_big_endian(void)
>>>>  #ifndef CONFIG_USER_ONLY
>>>>  bool cpu_physical_memory_is_io(target_phys_addr_t phys_addr)
>>>>  {
>>>> -    MemoryRegionSection *section;
>>>> +    MemoryRegionSection *section, mr_obj;
>>>>
>>>> -    section = phys_page_find(phys_addr >> TARGET_PAGE_BITS);
>>>> +    qemu_mutex_lock(&mem_map_lock);
>>>> +    phys_page_lookup(phys_addr, &mr_obj);
>>>> +    qemu_mutex_unlock(&mem_map_lock);
>>>> +    section = &mr_obj;
>>>
>>> Err, no unref needed here?
>>
>> Need _ref in the name to remind reviewers that it leaves the refcount
>> unbalanced.
>>
> Oh, here is a bug, we need an unref.  As for the unbalanced refcount, it will
> be adopted for the virtio-blk listener (not implemented in this patchset)
>
It is like cpu_physical_memory_map/unmap: map will hold the
unbalanced ref, and unmap releases it.

> Regards,
> pingfan
>> --
>> error compiling committee.c: too many arguments to function
>>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000
  2012-10-24  6:31     ` liu ping fan
@ 2012-10-24  7:17       ` Jan Kiszka
  2012-10-25  9:01         ` Avi Kivity
  0 siblings, 1 reply; 102+ messages in thread
From: Jan Kiszka @ 2012-10-24  7:17 UTC (permalink / raw)
  To: liu ping fan
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Paolo Bonzini

On 2012-10-24 08:31, liu ping fan wrote:
> On Tue, Oct 23, 2012 at 5:04 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2012-10-22 11:23, Liu Ping Fan wrote:
>>> Use local lock to protect e1000. When calling the system function,
>>> dropping the fine lock before acquiring the big lock. This will
>>> introduce broken device state, which need extra effort to fix.
>>>
>>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>> ---
>>>  hw/e1000.c |   24 +++++++++++++++++++++++-
>>>  1 files changed, 23 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/hw/e1000.c b/hw/e1000.c
>>> index ae8a6c5..5eddab5 100644
>>> --- a/hw/e1000.c
>>> +++ b/hw/e1000.c
>>> @@ -85,6 +85,7 @@ typedef struct E1000State_st {
>>>      NICConf conf;
>>>      MemoryRegion mmio;
>>>      MemoryRegion io;
>>> +    QemuMutex e1000_lock;
>>>
>>>      uint32_t mac_reg[0x8000];
>>>      uint16_t phy_reg[0x20];
>>> @@ -223,13 +224,27 @@ static const uint32_t mac_reg_init[] = {
>>>  static void
>>>  set_interrupt_cause(E1000State *s, int index, uint32_t val)
>>>  {
>>> +    QemuThread *t;
>>> +
>>>      if (val && (E1000_DEVID >= E1000_DEV_ID_82547EI_MOBILE)) {
>>>          /* Only for 8257x */
>>>          val |= E1000_ICR_INT_ASSERTED;
>>>      }
>>>      s->mac_reg[ICR] = val;
>>>      s->mac_reg[ICS] = val;
>>> -    qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
>>> +
>>> +    t = pthread_getspecific(qemu_thread_key);
>>> +    if (t->context_type == 1) {
>>> +        qemu_mutex_unlock(&s->e1000_lock);
>>> +        qemu_mutex_lock_iothread();
>>> +    }
>>> +    if (DEVICE(s)->state < DEV_STATE_STOPPING) {
>>> +        qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
>>> +    }
>>> +    if (t->context_type == 1) {
>>> +        qemu_mutex_unlock_iothread();
>>> +        qemu_mutex_lock(&s->e1000_lock);
>>> +    }
>>
>> This is ugly for many reasons. First of all, it is racy as the register
>> content may change while dropping the device lock, no? Then you would
>> raise or clear an IRQ spuriously.
>>
> The device state's integrity is protected by the busy flag, so it will not
> be broken.

Except that the busy flag concept is broken in itself.

I see that we have an all-or-nothing problem here: to address this
properly, we need to convert the IRQ path to lock-less (or at least
compatible with holding per-device locks) as well.

Jan


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
  2012-10-23 16:09                       ` Avi Kivity
@ 2012-10-24  7:29                         ` Paolo Bonzini
  2012-10-25 16:28                           ` Avi Kivity
  0 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-24  7:29 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka

Il 23/10/2012 18:09, Avi Kivity ha scritto:
>> But our interfaces had better support asynchronicity, and indeed they
>> do: after you write to the "eject" register, the "up" will show the
>> device as present until after destroy is done.  This can be changed to
>> show the device as present only until after step 4 is done.
> 
> Let's say we want to eject the hotplug hardware itself (just as an
> example).  With refcounts, the callback that updates "up" will hold on
> to to it via refcounts.  With stop_machine(), you need to cancel that
> callback, or wait for it somehow, or it can arrive after the
> stop_machine() and bite you.

The callback that updates "up" is for the parent of the hotplug
hardware.  There is nothing that has to be updated in the hotplug
hardware itself.

Updating the "up" register is the final part of isolate(), and runs
before the stop_machine().  The steps above can be further refined like
this:

4a. close all backends (also cancel or complete all pending I/O)
4b. notify parent that we're done
    4ba. parent removes device from its bus
    4bb. parent notifies guest
    4bc. parent schedules stop_machine(qdev_free(child))
5. a bottom half calls stop_machine(qdev_free(child))

If unplugging a whole sub-tree, the parent can notify its own parent at
the end of 4b.  Because the only purpose of stop_machine is to quiesce
subsystems not affected by step 4 (timer+memory, typically),
destructions can be done in any order and even intermixed with
executions of 4b for the parent.

In the beginning the only asynchronous step would be 5.  If the need
arises we can use continuation-passing to make all the preceding steps
asynchronous too.

>>> We may also want notification after step 4 (or 5); if the device holds
>>> some host resource someone may want to know that it is ready for reuse.
>>
>> I think guest notification should be after (4), while management
>> notification should be after (5).
> Yes. After (2) we can return from the eject mmio.

Agreed.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000
  2012-10-23  9:04   ` Jan Kiszka
  2012-10-24  6:31     ` liu ping fan
@ 2012-10-24  7:29     ` liu ping fan
  2012-10-25 13:34       ` Jan Kiszka
  1 sibling, 1 reply; 102+ messages in thread
From: liu ping fan @ 2012-10-24  7:29 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Paolo Bonzini

On Tue, Oct 23, 2012 at 5:04 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2012-10-22 11:23, Liu Ping Fan wrote:
>> Use local lock to protect e1000. When calling the system function,
>> dropping the fine lock before acquiring the big lock. This will
>> introduce broken device state, which need extra effort to fix.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  hw/e1000.c |   24 +++++++++++++++++++++++-
>>  1 files changed, 23 insertions(+), 1 deletions(-)
>>
>> diff --git a/hw/e1000.c b/hw/e1000.c
>> index ae8a6c5..5eddab5 100644
>> --- a/hw/e1000.c
>> +++ b/hw/e1000.c
>> @@ -85,6 +85,7 @@ typedef struct E1000State_st {
>>      NICConf conf;
>>      MemoryRegion mmio;
>>      MemoryRegion io;
>> +    QemuMutex e1000_lock;
>>
>>      uint32_t mac_reg[0x8000];
>>      uint16_t phy_reg[0x20];
>> @@ -223,13 +224,27 @@ static const uint32_t mac_reg_init[] = {
>>  static void
>>  set_interrupt_cause(E1000State *s, int index, uint32_t val)
>>  {
>> +    QemuThread *t;
>> +
>>      if (val && (E1000_DEVID >= E1000_DEV_ID_82547EI_MOBILE)) {
>>          /* Only for 8257x */
>>          val |= E1000_ICR_INT_ASSERTED;
>>      }
>>      s->mac_reg[ICR] = val;
>>      s->mac_reg[ICS] = val;
>> -    qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
>> +
>> +    t = pthread_getspecific(qemu_thread_key);
>> +    if (t->context_type == 1) {
>> +        qemu_mutex_unlock(&s->e1000_lock);
>> +        qemu_mutex_lock_iothread();
>> +    }
>> +    if (DEVICE(s)->state < DEV_STATE_STOPPING) {
>> +        qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
>> +    }
>> +    if (t->context_type == 1) {
>> +        qemu_mutex_unlock_iothread();
>> +        qemu_mutex_lock(&s->e1000_lock);
>> +    }
>
> This is ugly for many reasons. First of all, it is racy as the register
> content may change while dropping the device lock, no? Then you would
> raise or clear an IRQ spuriously.
>
> Second, it clearly shows that we need to address lock-less IRQ delivery.
> Almost nothing is won if we have to take the global lock again to push
> an IRQ event to the guest. I'm repeating myself, but the problem to be
> solved here is almost identical to fast IRQ delivery for assigned
> devices (which we only address pretty ad-hoc for PCI so far).
>
Interesting, could you show me more details about it, so I can google...

Thanks,
pingfan
> And third: too much boilerplate code... :-/
>
>>  }
>>
>>  static void
>> @@ -268,6 +283,7 @@ static void e1000_reset(void *opaque)
>>      E1000State *d = opaque;
>>
>>      qemu_del_timer(d->autoneg_timer);
>> +
>>      memset(d->phy_reg, 0, sizeof d->phy_reg);
>>      memmove(d->phy_reg, phy_reg_init, sizeof phy_reg_init);
>>      memset(d->mac_reg, 0, sizeof d->mac_reg);
>> @@ -448,7 +464,11 @@ e1000_send_packet(E1000State *s, const uint8_t *buf, int size)
>>      if (s->phy_reg[PHY_CTRL] & MII_CR_LOOPBACK) {
>>          s->nic->nc.info->receive(&s->nic->nc, buf, size);
>>      } else {
>> +        qemu_mutex_unlock(&s->e1000_lock);
>> +        qemu_mutex_lock_iothread();
>>          qemu_send_packet(&s->nic->nc, buf, size);
>> +        qemu_mutex_unlock_iothread();
>> +        qemu_mutex_lock(&s->e1000_lock);
>
> And that is the also a problem to be discussed next: How to handle
> locking of backends? Do we want separate locks for backend and frontend?
> Although they are typically in a 1:1 relationship? Oh, I'm revealing the
> content of my talk... ;)
>
>>      }
>>  }
>>
>> @@ -1221,6 +1241,8 @@ static int pci_e1000_init(PCIDevice *pci_dev)
>>      int i;
>>      uint8_t *macaddr;
>>
>> +    qemu_mutex_init(&d->e1000_lock);
>> +
>>      pci_conf = d->dev.config;
>>
>>      /* TODO: RST# value should be 0, PCI spec 6.2.4 */
>>
>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
> Corporate Competence Center Embedded Linux
>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 13/16] e1000: add busy flag to anti broken device state
  2012-10-24  6:36             ` liu ping fan
@ 2012-10-25  8:55               ` Avi Kivity
  0 siblings, 0 replies; 102+ messages in thread
From: Avi Kivity @ 2012-10-25  8:55 UTC (permalink / raw)
  To: liu ping fan
  Cc: Liu Ping Fan, Jan Kiszka, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini

On 10/24/2012 08:36 AM, liu ping fan wrote:
> On Tue, Oct 23, 2012 at 5:37 PM, Avi Kivity <avi@redhat.com> wrote:
>> On 10/23/2012 11:32 AM, liu ping fan wrote:
>>> On Tue, Oct 23, 2012 at 5:07 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>> On 2012-10-23 07:52, liu ping fan wrote:
>>>>> On Mon, Oct 22, 2012 at 6:40 PM, Avi Kivity <avi@redhat.com> wrote:
>>>>>> On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
>>>>>>> The broken device state is caused by releasing local lock before acquiring
>>>>>>> big lock. To fix this issue, we have two choice:
>>>>>>>   1.use busy flag to protect the state
>>>>>>>     The drawback is that we will introduce independent busy flag for each
>>>>>>>     independent device's logic unit.
>>>>>>>   2.reload the device's state
>>>>>>>     The drawback is if the call chain is too deep, the action to reload will
>>>>>>>     touch each layer. Also the reloading means to recaculate the intermediate
>>>>>>>     result based on device's regs.
>>>>>>>
>>>>>>> This patch adopt the solution 1 to fix the issue.
>>>>>>
>>>>>> Doesn't the nested mmio patch detect this?
>>>>>>
>>>>> It will only record and fix the issue on one thread. But guest can
>>>>> touch the emulated device on muti-threads.
>>>>
>>>> Sorry, what does that mean? A second VCPU accessing the device will
>>>> simply be ignored when it races with another VCPU? Specifically
>>>>
>>> Yes, just ignored.  For device which support many logic in parallel,
>>> it should use independent busy flag for each logic
>>
>> We don't actually know that e1000 doesn't.  Why won't writing into
>> different registers in parallel work?
>>
> I think e1000 has only one transfer logic, so one busy flag is enough.
> And the normal guest's driver will access the registers one by one.
> But anyway, it may have parallel modules.  So what about model it like
> this
> if busy:
>   wait
> 
> clear busy:
>    wakeup
> 

You mean lock()/unlock()?

Again I suggest ignoring this issue for now.  We need to make progress
and we can't get everything perfect (or even agree on everything).  When
we have converted a few devices, we will have more information and can
think of a good solution.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 07/16] memory: make mmio dispatch able to be out of biglock
  2012-10-24  6:56         ` liu ping fan
@ 2012-10-25  8:57           ` Avi Kivity
  0 siblings, 0 replies; 102+ messages in thread
From: Avi Kivity @ 2012-10-25  8:57 UTC (permalink / raw)
  To: liu ping fan
  Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel, Anthony Liguori,
	Stefan Hajnoczi, Paolo Bonzini

On 10/24/2012 08:56 AM, liu ping fan wrote:
>>>
>> Oh, here is a bug, need unref.  As to unbalanced refcount, it will be
>> adopted for virtio-blk listener (not implement in this patchset)
>>
> It is like cpu_physical_memory_map/unmap, the map will hold the
> unbalanced ref, and unmap release it.

Those APIs have symmetric names.  map/unmap, ref/unref, lock/unlock.
But here we have lookup and no unlookup.
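
As a sketch (thin wrappers over the functions from this series, names
hypothetical) of what a symmetric pair could look like:

    static int phys_page_lookup_ref(target_phys_addr_t page,
                                    MemoryRegionSection *mrs)
    {
        int safe_ref;

        qemu_mutex_lock(&mem_map_lock);
        safe_ref = phys_page_lookup(page, mrs);   /* takes the reference */
        qemu_mutex_unlock(&mem_map_lock);
        return safe_ref;
    }

    static void phys_page_lookup_unref(MemoryRegionSection *mrs)
    {
        memory_region_section_unref(mrs);         /* releases it */
    }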


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 13/16] e1000: add busy flag to anti broken device state
  2012-10-23  9:37           ` Avi Kivity
  2012-10-24  6:36             ` liu ping fan
@ 2012-10-25  9:00             ` Peter Maydell
  2012-10-25  9:04               ` Avi Kivity
  1 sibling, 1 reply; 102+ messages in thread
From: Peter Maydell @ 2012-10-25  9:00 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Liu Ping Fan, Jan Kiszka, Marcelo Tosatti, qemu-devel,
	liu ping fan, Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini

On 23 October 2012 10:37, Avi Kivity <avi@redhat.com> wrote:
> On 10/23/2012 11:32 AM, liu ping fan wrote:
>> On Tue, Oct 23, 2012 at 5:07 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> On 2012-10-23 07:52, liu ping fan wrote:
>>>> On Mon, Oct 22, 2012 at 6:40 PM, Avi Kivity <avi@redhat.com> wrote:
>>>>> On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
>>>> It will only record and fix the issue on one thread. But guest can
>>>> touch the emulated device on muti-threads.
>>>
>>> Sorry, what does that mean? A second VCPU accessing the device will
>>> simply be ignored when it races with another VCPU? Specifically
>>>
>> Yes, just ignored.  For device which support many logic in parallel,
>> it should use independent busy flag for each logic
>
> We don't actually know that e1000 doesn't.  Why won't writing into
> different registers in parallel work?

Unless the device we're emulating supports multiple parallel
accesses (and I bet 99.9% of the devices we model
don't) then the memory framework needs to serialise the
loads/stores. Otherwise it's just going to be excessively
hard to write a reliable device model.
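
Roughly what I would expect the core to do (a sketch only; whether the
lock/unlock hooks end up looking exactly like the ones in this series is
an assumption on my part):

    static uint64_t serialised_mmio_read(MemoryRegion *mr,
                                         target_phys_addr_t addr,
                                         unsigned size)
    {
        uint64_t val;

        if (mr->ops->lock) {
            mr->ops->lock(mr->opaque);     /* per-device serialisation */
        }
        val = mr->ops->read(mr->opaque, addr, size);
        if (mr->ops->unlock) {
            mr->ops->unlock(mr->opaque);
        }
        return val;
    }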

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000
  2012-10-24  7:17       ` Jan Kiszka
@ 2012-10-25  9:01         ` Avi Kivity
  2012-10-25  9:31           ` Jan Kiszka
  0 siblings, 1 reply; 102+ messages in thread
From: Avi Kivity @ 2012-10-25  9:01 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	liu ping fan, Anthony Liguori, Paolo Bonzini

On 10/24/2012 09:17 AM, Jan Kiszka wrote:
>>>
>>> This is ugly for many reasons. First of all, it is racy as the register
>>> content may change while dropping the device lock, no? Then you would
>>> raise or clear an IRQ spuriously.
>>>
>> Device state's intact is protected by busy flag, and will not broken
> 
> Except that the busy flag concept is broken in itself.

How do we fix an mmio that ends up mmio'ing back to itself, perhaps
indirectly?  Note this is broken in mainline too, but in a different way.

Do we introduce clever locks that can detect deadlocks?

> I see that we have a all-or-nothing problem here: to address this
> properly, we need to convert the IRQ path to lock-less (or at least
> compatible with holding per-device locks) as well.

There is a transitional path where writing to a register that can cause
IRQ changes takes both the big lock and the local lock.

Eventually, though, of course all inner subsystems must be threaded for
this work to have value.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 13/16] e1000: add busy flag to anti broken device state
  2012-10-25  9:00             ` Peter Maydell
@ 2012-10-25  9:04               ` Avi Kivity
  2012-10-26  3:05                 ` liu ping fan
  0 siblings, 1 reply; 102+ messages in thread
From: Avi Kivity @ 2012-10-25  9:04 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Liu Ping Fan, Jan Kiszka, Marcelo Tosatti, qemu-devel,
	liu ping fan, Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini

On 10/25/2012 11:00 AM, Peter Maydell wrote:
> On 23 October 2012 10:37, Avi Kivity <avi@redhat.com> wrote:
>> On 10/23/2012 11:32 AM, liu ping fan wrote:
>>> On Tue, Oct 23, 2012 at 5:07 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>> On 2012-10-23 07:52, liu ping fan wrote:
>>>>> On Mon, Oct 22, 2012 at 6:40 PM, Avi Kivity <avi@redhat.com> wrote:
>>>>>> On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
>>>>> It will only record and fix the issue on one thread. But guest can
>>>>> touch the emulated device on muti-threads.
>>>>
>>>> Sorry, what does that mean? A second VCPU accessing the device will
>>>> simply be ignored when it races with another VCPU? Specifically
>>>>
>>> Yes, just ignored.  For device which support many logic in parallel,
>>> it should use independent busy flag for each logic
>>
>> We don't actually know that e1000 doesn't.  Why won't writing into
>> different registers in parallel work?
> 
> Unless the device we're emulating supports multiple in
> parallel accesses (and I bet 99.9% of the devices we model
> don't) then the memory framework needs to serialise the
> loads/stores. Otherwise it's just going to be excessively
> hard to write a reliable device model.

That's why we have a per-device lock.  The busy flag breaks that model
by discarding accesses that occur in parallel.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000
  2012-10-25  9:01         ` Avi Kivity
@ 2012-10-25  9:31           ` Jan Kiszka
  2012-10-25 16:21             ` Avi Kivity
  0 siblings, 1 reply; 102+ messages in thread
From: Jan Kiszka @ 2012-10-25  9:31 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	liu ping fan, Anthony Liguori, Paolo Bonzini

On 2012-10-25 11:01, Avi Kivity wrote:
> On 10/24/2012 09:17 AM, Jan Kiszka wrote:
>>>>
>>>> This is ugly for many reasons. First of all, it is racy as the register
>>>> content may change while dropping the device lock, no? Then you would
>>>> raise or clear an IRQ spuriously.
>>>>
>>> Device state's intact is protected by busy flag, and will not broken
>>
>> Except that the busy flag concept is broken in itself.
> 
> How do we fix an mmio that ends up mmio'ing back to itself, perhaps
> indirectly?  Note this is broken in mainline too, but in a different way.
> 
> Do we introduce clever locks that can detect deadlocks?

That problem is already addressed (to my understanding) by blocking
nested MMIO in general. The brokenness of the busy flag is that it
prevents concurrent MMIO by dropping requests.

> 
>> I see that we have a all-or-nothing problem here: to address this
>> properly, we need to convert the IRQ path to lock-less (or at least
>> compatible with holding per-device locks) as well.
> 
> There is a transitional path where writing to a register that can cause
> IRQ changes takes both the big lock and the local lock.
> 
> Eventually, though, of course all inner subsystems must be threaded for
> this work to have value.
> 

But that transitional path must not introduce regressions. Opening a
race window between IRQ cause update and event injection is such a
thing, just like dropping concurrent requests on the floor.

Jan


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000
  2012-10-24  7:29     ` liu ping fan
@ 2012-10-25 13:34       ` Jan Kiszka
  2012-10-25 16:23         ` Avi Kivity
  2012-10-29  5:24         ` liu ping fan
  0 siblings, 2 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-25 13:34 UTC (permalink / raw)
  To: liu ping fan
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Paolo Bonzini

On 2012-10-24 09:29, liu ping fan wrote:
> On Tue, Oct 23, 2012 at 5:04 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2012-10-22 11:23, Liu Ping Fan wrote:
>>> Use local lock to protect e1000. When calling the system function,
>>> dropping the fine lock before acquiring the big lock. This will
>>> introduce broken device state, which need extra effort to fix.
>>>
>>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>> ---
>>>  hw/e1000.c |   24 +++++++++++++++++++++++-
>>>  1 files changed, 23 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/hw/e1000.c b/hw/e1000.c
>>> index ae8a6c5..5eddab5 100644
>>> --- a/hw/e1000.c
>>> +++ b/hw/e1000.c
>>> @@ -85,6 +85,7 @@ typedef struct E1000State_st {
>>>      NICConf conf;
>>>      MemoryRegion mmio;
>>>      MemoryRegion io;
>>> +    QemuMutex e1000_lock;
>>>
>>>      uint32_t mac_reg[0x8000];
>>>      uint16_t phy_reg[0x20];
>>> @@ -223,13 +224,27 @@ static const uint32_t mac_reg_init[] = {
>>>  static void
>>>  set_interrupt_cause(E1000State *s, int index, uint32_t val)
>>>  {
>>> +    QemuThread *t;
>>> +
>>>      if (val && (E1000_DEVID >= E1000_DEV_ID_82547EI_MOBILE)) {
>>>          /* Only for 8257x */
>>>          val |= E1000_ICR_INT_ASSERTED;
>>>      }
>>>      s->mac_reg[ICR] = val;
>>>      s->mac_reg[ICS] = val;
>>> -    qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
>>> +
>>> +    t = pthread_getspecific(qemu_thread_key);
>>> +    if (t->context_type == 1) {
>>> +        qemu_mutex_unlock(&s->e1000_lock);
>>> +        qemu_mutex_lock_iothread();
>>> +    }
>>> +    if (DEVICE(s)->state < DEV_STATE_STOPPING) {
>>> +        qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
>>> +    }
>>> +    if (t->context_type == 1) {
>>> +        qemu_mutex_unlock_iothread();
>>> +        qemu_mutex_lock(&s->e1000_lock);
>>> +    }
>>
>> This is ugly for many reasons. First of all, it is racy as the register
>> content may change while dropping the device lock, no? Then you would
>> raise or clear an IRQ spuriously.
>>
>> Second, it clearly shows that we need to address lock-less IRQ delivery.
>> Almost nothing is won if we have to take the global lock again to push
>> an IRQ event to the guest. I'm repeating myself, but the problem to be
>> solved here is almost identical to fast IRQ delivery for assigned
>> devices (which we only address pretty ad-hoc for PCI so far).
>>
> Interesting, could you show me more detail about it, so I can google...

No need to look that far, just grep for pci_device_route_intx_to_irq,
pci_device_set_intx_routing_notifier and related functions in the code.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock
  2012-10-22  9:23 [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Liu Ping Fan
                   ` (15 preceding siblings ...)
  2012-10-22  9:23 ` [Qemu-devel] [patch v4 16/16] e1000: implement MemoryRegionOps's ref&lock interface Liu Ping Fan
@ 2012-10-25 14:04 ` Peter Maydell
  2012-10-25 16:44   ` Jan Kiszka
  2012-10-25 17:07   ` Avi Kivity
  16 siblings, 2 replies; 102+ messages in thread
From: Peter Maydell @ 2012-10-25 14:04 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Stefan Hajnoczi, Marcelo Tosatti, qemu-devel, Avi Kivity,
	Anthony Liguori, Jan Kiszka, Paolo Bonzini

On 22 October 2012 10:23, Liu Ping Fan <pingfank@linux.vnet.ibm.com> wrote:
> v1:
> https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg03312.html
>
> v2:
> http://lists.gnu.org/archive/html/qemu-devel/2012-08/msg01275.html
>
> v3:
> http://lists.nongnu.org/archive/html/qemu-devel/2012-09/msg01474.html

Is there a clear, up-to-date description somewhere of the design and
locking strategy here? I'd rather not have to try to
reconstitute it by reading the whole patchset...

thanks
-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000
  2012-10-25  9:31           ` Jan Kiszka
@ 2012-10-25 16:21             ` Avi Kivity
  2012-10-25 16:39               ` Jan Kiszka
  0 siblings, 1 reply; 102+ messages in thread
From: Avi Kivity @ 2012-10-25 16:21 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	liu ping fan, Anthony Liguori, Paolo Bonzini

On 10/25/2012 11:31 AM, Jan Kiszka wrote:
> On 2012-10-25 11:01, Avi Kivity wrote:
>> On 10/24/2012 09:17 AM, Jan Kiszka wrote:
>>>>>
>>>>> This is ugly for many reasons. First of all, it is racy as the register
>>>>> content may change while dropping the device lock, no? Then you would
>>>>> raise or clear an IRQ spuriously.
>>>>>
>>>> Device state's intact is protected by busy flag, and will not broken
>>>
>>> Except that the busy flag concept is broken in itself.
>> 
>> How do we fix an mmio that ends up mmio'ing back to itself, perhaps
>> indirectly?  Note this is broken in mainline too, but in a different way.
>> 
>> Do we introduce clever locks that can detect deadlocks?
> 
> That problem is already addressed (to my understanding) by blocking
> nested MMIO in general. 

That doesn't work cross-thread.

vcpu A: write to device X, dma-ing to device Y
vcpu B: write to device Y, dma-ing to device X

My suggestion was to drop the locks around DMA, then re-acquire the lock
and re-validate data.

> The brokenness of the busy flag is that it
> prevents concurrent MMIO by dropping requests.

Right.

> 
>> 
>>> I see that we have a all-or-nothing problem here: to address this
>>> properly, we need to convert the IRQ path to lock-less (or at least
>>> compatible with holding per-device locks) as well.
>> 
>> There is a transitional path where writing to a register that can cause
>> IRQ changes takes both the big lock and the local lock.
>> 
>> Eventually, though, of course all inner subsystems must be threaded for
>> this work to have value.
>> 
> 
> But that transitional path must not introduce regressions. Opening a
> race window between IRQ cause update and event injection is such a
> thing, just like dropping concurrent requests on the floor.

Can you explain the race?


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000
  2012-10-25 13:34       ` Jan Kiszka
@ 2012-10-25 16:23         ` Avi Kivity
  2012-10-25 16:41           ` Jan Kiszka
  2012-10-29  5:24         ` liu ping fan
  1 sibling, 1 reply; 102+ messages in thread
From: Avi Kivity @ 2012-10-25 16:23 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	liu ping fan, Anthony Liguori, Paolo Bonzini

On 10/25/2012 03:34 PM, Jan Kiszka wrote:

>>> Second, it clearly shows that we need to address lock-less IRQ delivery.
>>> Almost nothing is won if we have to take the global lock again to push
>>> an IRQ event to the guest. I'm repeating myself, but the problem to be
>>> solved here is almost identical to fast IRQ delivery for assigned
>>> devices (which we only address pretty ad-hoc for PCI so far).
>>>
>> Interesting, could you show me more detail about it, so I can google...
> 
> No need to look that far, just grep for pci_device_route_intx_to_irq,
> pci_device_set_intx_routing_notifier and related functions in the code.

We can address it in the same way the memory core supports concurrency,
by copying dispatch information into RCU- or lock-protected data structures.

But I really hope we can avoid doing it now.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
  2012-10-24  7:29                         ` Paolo Bonzini
@ 2012-10-25 16:28                           ` Avi Kivity
  2012-10-26 15:05                             ` Paolo Bonzini
  0 siblings, 1 reply; 102+ messages in thread
From: Avi Kivity @ 2012-10-25 16:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka

On 10/24/2012 09:29 AM, Paolo Bonzini wrote:
> Il 23/10/2012 18:09, Avi Kivity ha scritto:
>>> But our interfaces had better support asynchronicity, and indeed they
>>> do: after you write to the "eject" register, the "up" will show the
>>> device as present until after destroy is done.  This can be changed to
>>> show the device as present only until after step 4 is done.
>> 
>> Let's say we want to eject the hotplug hardware itself (just as an
>> example).  With refcounts, the callback that updates "up" will hold on
>> to to it via refcounts.  With stop_machine(), you need to cancel that
>> callback, or wait for it somehow, or it can arrive after the
>> stop_machine() and bite you.
> 
> The callback that updates "up" is for the parent of the hotplug
> hardware.  There is nothing that has to be updated in the hotplug
> hardware itself.

I meant, as an unrealistic example, hot-unplugging the bridge itself.
So we have a callback that updates information in the bridge (up
register state) being called asynchronously.

A more realistic example would be hot-unplug of an HBA, then the block
layer callback comes back to update the device.  So stop_machine() would
need to cancel all I/O and wait for I/O that cannot be cancelled.

> 
> Updating the "up" register is the final part of isolate(), and runs
> before the stop_machine().  The steps above can be further refined like
> this:
> 
> 4a. close all backends (also cancel or complete all pending I/O)

^ long latency

> 4b. notify parent that we're done
>     4ba. parent removes device from its bus
>     4bb. parent notifies guest
>     4bc. parent schedules stop_machine(qdev_free(child))
> 5. a bottom half calls stop_machine(qdev_free(child))
> 
> If unplugging a whole sub-tree, the parent can notify its own parent at
> the end of 4b.  Because the only purpose of stop_machine is to quiesce
> subsystems not affected by step 4 (timer+memory, typically),
> destructions can be done in any order and even intermixed with
> executions of 4b for the parent.
> 
> In the beginning the only asynchronous step would be 5.  If the need
> arises we can use continuation-passing to make all the preceding steps
> asynchronous too.
> 

Maybe my worry about long stop_machine latencies is premature.  Everyone
in the kernel hates it, but the kernel scales a lot more than qemu and
is in a much better place wrt threading.



-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000
  2012-10-25 16:21             ` Avi Kivity
@ 2012-10-25 16:39               ` Jan Kiszka
  2012-10-25 17:02                 ` Avi Kivity
  0 siblings, 1 reply; 102+ messages in thread
From: Jan Kiszka @ 2012-10-25 16:39 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	liu ping fan, Anthony Liguori, Paolo Bonzini

On 2012-10-25 18:21, Avi Kivity wrote:
> On 10/25/2012 11:31 AM, Jan Kiszka wrote:
>> On 2012-10-25 11:01, Avi Kivity wrote:
>>> On 10/24/2012 09:17 AM, Jan Kiszka wrote:
>>>>>>
>>>>>> This is ugly for many reasons. First of all, it is racy as the register
>>>>>> content may change while dropping the device lock, no? Then you would
>>>>>> raise or clear an IRQ spuriously.
>>>>>>
>>>>> Device state's intact is protected by busy flag, and will not broken
>>>>
>>>> Except that the busy flag concept is broken in itself.
>>>
>>> How do we fix an mmio that ends up mmio'ing back to itself, perhaps
>>> indirectly?  Note this is broken in mainline too, but in a different way.
>>>
>>> Do we introduce clever locks that can detect deadlocks?
>>
>> That problem is already addressed (to my understanding) by blocking
>> nested MMIO in general. 
> 
> That doesn't work cross-thread.
> 
> vcpu A: write to device X, dma-ing to device Y
> vcpu B: write to device Y, dma-ing to device X

We will deny DMA from device X to device Y when it is issued on behalf of
a VCPU, i.e. in dispatch context.

What we do not deny, though, is DMA-ing from an I/O thread that
processes an event for device X. If the invoked callback of device X
holds the device lock across some DMA request to Y, then we risk running
into the same ABBA issue. Hmm...

> 
> My suggestion was to drop the locks around DMA, then re-acquire the lock
> and re-validate data.

Maybe possible, but hairy depending on the device model.

> 
>> The brokenness of the busy flag is that it
>> prevents concurrent MMIO by dropping requests.
> 
> Right.
> 
>>
>>>
>>>> I see that we have a all-or-nothing problem here: to address this
>>>> properly, we need to convert the IRQ path to lock-less (or at least
>>>> compatible with holding per-device locks) as well.
>>>
>>> There is a transitional path where writing to a register that can cause
>>> IRQ changes takes both the big lock and the local lock.
>>>
>>> Eventually, though, of course all inner subsystems must be threaded for
>>> this work to have value.
>>>
>>
>> But that transitional path must not introduce regressions. Opening a
>> race window between IRQ cause update and event injection is such a
>> thing, just like dropping concurrent requests on the floor.
> 
> Can you explain the race?

Context A				Context B

device.lock
...
device.set interrupt_cause = 0
lower_irq = true
...
device.unlock
					device.lock
					...
					device.interrupt_cause = 42
					raise_irq = true
					...
					device.unlock
					if (raise_irq)
						bql.lock
						set_irq(device.irqno)
						bql.unlock
if (lower_irq)
	bql.lock
	clear_irq(device.irqno)
	bql.unlock


And there it goes, our interrupt event.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000
  2012-10-25 16:23         ` Avi Kivity
@ 2012-10-25 16:41           ` Jan Kiszka
  2012-10-25 17:03             ` Avi Kivity
  0 siblings, 1 reply; 102+ messages in thread
From: Jan Kiszka @ 2012-10-25 16:41 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	liu ping fan, Anthony Liguori, Paolo Bonzini

On 2012-10-25 18:23, Avi Kivity wrote:
> On 10/25/2012 03:34 PM, Jan Kiszka wrote:
> 
>>>> Second, it clearly shows that we need to address lock-less IRQ delivery.
>>>> Almost nothing is won if we have to take the global lock again to push
>>>> an IRQ event to the guest. I'm repeating myself, but the problem to be
>>>> solved here is almost identical to fast IRQ delivery for assigned
>>>> devices (which we only address pretty ad-hoc for PCI so far).
>>>>
>>> Interesting, could you show me more detail about it, so I can google...
>>
>> No need to look that far, just grep for pci_device_route_intx_to_irq,
>> pci_device_set_intx_routing_notifier and related functions in the code.
> 
> We can address it in the same way the memory core supports concurrency,
> by copying dispatch information into rcu or lock protected data structures.
> 
> But I really hope we can avoid doing it now.

I doubt it, as the alternative is taking the BQL while (still) holding
the device lock. But that creates ABBA risks.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock
  2012-10-25 14:04 ` [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Peter Maydell
@ 2012-10-25 16:44   ` Jan Kiszka
  2012-10-25 17:07   ` Avi Kivity
  1 sibling, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-25 16:44 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Paolo Bonzini

On 2012-10-25 16:04, Peter Maydell wrote:
> On 22 October 2012 10:23, Liu Ping Fan <pingfank@linux.vnet.ibm.com> wrote:
>> v1:
>> https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg03312.html
>>
>> v2:
>> http://lists.gnu.org/archive/html/qemu-devel/2012-08/msg01275.html
>>
>> v3:
>> http://lists.nongnu.org/archive/html/qemu-devel/2012-09/msg01474.html
> 
> Is there a clear up to date description somewhere of the design and
> locking strategy here somewhere? I'd rather not have to try to
> reconstitute it by reading the whole patchset...

Not yet (someone may correct me if I'm missing a bit). But I'm collecting
the latest ideas and open issues for Barcelona ATM. Anyone who would like
to see something not recently brought up (again) in that overview, or has
problematic code in QEMU in mind, may send me pointers.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000
  2012-10-25 16:39               ` Jan Kiszka
@ 2012-10-25 17:02                 ` Avi Kivity
  2012-10-25 18:48                   ` Jan Kiszka
  0 siblings, 1 reply; 102+ messages in thread
From: Avi Kivity @ 2012-10-25 17:02 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	liu ping fan, Anthony Liguori, Paolo Bonzini

On 10/25/2012 06:39 PM, Jan Kiszka wrote:
>> 
>> That doesn't work cross-thread.
>> 
>> vcpu A: write to device X, dma-ing to device Y
>> vcpu B: write to device Y, dma-ing to device X
> 
> We will deny DMA-ing from device X on behalf of a VCPU, ie. in dispatch
> context, to Y.
> 
> What we do not deny, though, is DMA-ing from an I/O thread that
> processes an event for device X. 

I would really like to avoid depending on the context.  In real hardware, there is no such thing.

> If the invoked callback of device X
> holds the device lock across some DMA request to Y, then we risk to run
> into the same ABBA issue. Hmm...

Yup.

> 
>> 
>> My suggestion was to drop the locks around DMA, then re-acquire the lock
>> and re-validate data.
> 
> Maybe possible, but hairy depending on the device model.

It's unpleasant, yes.

Note that depending on the device, we may not need to re-validate data; it may be sufficient to load it into local variables so we know it is consistent at some point.  But all those solutions suffer from requiring device model authors to understand all those issues, rather than just adding a simple lock around access to their data structures.
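
As a sketch of the "load into local variables" variant (reusing the
e1000_lock from this series; the names below come from hw/e1000.c, but the
flow is only an illustration, not a proposed patch):

    static void xmit_one_desc(E1000State *s)
    {
        struct e1000_tx_desc desc;
        uint64_t base;
        uint32_t tdh;

        qemu_mutex_lock(&s->e1000_lock);
        tdh = s->mac_reg[TDH];
        base = tx_desc_base(s) + sizeof(desc) * tdh;   /* consistent snapshot */
        qemu_mutex_unlock(&s->e1000_lock);

        /* DMA with no device lock held */
        pci_dma_read(&s->dev, base, &desc, sizeof(desc));

        qemu_mutex_lock(&s->e1000_lock);
        if (s->mac_reg[TDH] == tdh) {      /* revalidate before committing */
            process_tx_desc(s, &desc);
        }
        qemu_mutex_unlock(&s->e1000_lock);
    }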

>>>>> I see that we have a all-or-nothing problem here: to address this
>>>>> properly, we need to convert the IRQ path to lock-less (or at least
>>>>> compatible with holding per-device locks) as well.
>>>>
>>>> There is a transitional path where writing to a register that can cause
>>>> IRQ changes takes both the big lock and the local lock.
>>>>
>>>> Eventually, though, of course all inner subsystems must be threaded for
>>>> this work to have value.
>>>>
>>>
>>> But that transitional path must not introduce regressions. Opening a
>>> race window between IRQ cause update and event injection is such a
>>> thing, just like dropping concurrent requests on the floor.
>> 
>> Can you explain the race?
> 
> Context A				Context B
> 
> device.lock
> ...
> device.set interrupt_cause = 0
> lower_irq = true
> ...
> device.unlock
> 					device.lock
> 					...
> 					device.interrupt_cause = 42
> 					raise_irq = true
> 					...
> 					device.unlock
> 					if (raise_irq)
> 						bql.lock
> 						set_irq(device.irqno)
> 						bql.unlock
> if (lower_irq)
> 	bql.lock
> 	clear_irq(device.irqno)
> 	bql.unlock
> 
> 
> And there it goes, our interrupt event.

Obviously you'll need to reacquire the device lock after taking bql and revalidate stuff.  But that is not what I am suggesting.  Instead, any path that can lead to an irq update (or timer update etc) will take both the bql and the device lock.  This will leave after the first pass only side effect free register reads and writes, which is silly if we keep it that way, but we will follow with a threaded timer and irq subsystem and we'll peel away those big locks.

  device_mmio_write:
    if register is involved in irq or timers or block layer or really anything that matters:
      bql.acquire
    device.lock.acquire
    do stuff
    device.lock.release
    if that big condition from above was true:
      bql.release
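
Roughly, in QEMU terms, this transitional pattern would look like the
sketch below; MyDevState, reg_needs_bql() and do_register_write() are
made-up helpers, while qemu_mutex_lock_iothread() is the existing way to
take the BQL:

static void my_dev_mmio_write(void *opaque, target_phys_addr_t addr,
                              uint64_t val, unsigned size)
{
    MyDevState *s = opaque;
    bool take_bql = reg_needs_bql(addr);   /* irq, timers, block layer, ... */

    if (take_bql) {
        qemu_mutex_lock_iothread();        /* BQL first: one fixed lock order */
    }
    qemu_mutex_lock(&s->lock);

    do_register_write(s, addr, val);       /* may touch irq/timers, BQL held then */

    qemu_mutex_unlock(&s->lock);
    if (take_bql) {
        qemu_mutex_unlock_iothread();
    }
}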

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000
  2012-10-25 16:41           ` Jan Kiszka
@ 2012-10-25 17:03             ` Avi Kivity
  0 siblings, 0 replies; 102+ messages in thread
From: Avi Kivity @ 2012-10-25 17:03 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	liu ping fan, Anthony Liguori, Paolo Bonzini

On 10/25/2012 06:41 PM, Jan Kiszka wrote:
> On 2012-10-25 18:23, Avi Kivity wrote:
>> On 10/25/2012 03:34 PM, Jan Kiszka wrote:
>> 
>>>>> Second, it clearly shows that we need to address lock-less IRQ delivery.
>>>>> Almost nothing is won if we have to take the global lock again to push
>>>>> an IRQ event to the guest. I'm repeating myself, but the problem to be
>>>>> solved here is almost identical to fast IRQ delivery for assigned
>>>>> devices (which we only address pretty ad-hoc for PCI so far).
>>>>>
>>>> Interesting, could you show me more detail about it, so I can google...
>>>
>>> No need to look that far, just grep for pci_device_route_intx_to_irq,
>>> pci_device_set_intx_routing_notifier and related functions in the code.
>> 
>> We can address it in the same way the memory core supports concurrency,
>> by copying dispatch information into rcu or lock protected data structures.
>> 
>> But I really hope we can avoid doing it now.
> 
> I doubt so as the alternative is taking the BQL while (still) holding
> the device lock. 

Sorry, doesn't parse.

> But that creates ABBA risks.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock
  2012-10-25 14:04 ` [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Peter Maydell
  2012-10-25 16:44   ` Jan Kiszka
@ 2012-10-25 17:07   ` Avi Kivity
  2012-10-25 17:13     ` Peter Maydell
  1 sibling, 1 reply; 102+ messages in thread
From: Avi Kivity @ 2012-10-25 17:07 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka, Paolo Bonzini

On 10/25/2012 04:04 PM, Peter Maydell wrote:
> On 22 October 2012 10:23, Liu Ping Fan <pingfank@linux.vnet.ibm.com> wrote:
>> v1:
>> https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg03312.html
>>
>> v2:
>> http://lists.gnu.org/archive/html/qemu-devel/2012-08/msg01275.html
>>
>> v3:
>> http://lists.nongnu.org/archive/html/qemu-devel/2012-09/msg01474.html
> 
> Is there a clear up to date description somewhere of the design and
> locking strategy here somewhere? I'd rather not have to try to
> reconstitute it by reading the whole patchset...

It was described somewhere in a document by Marcelo and myself.
Basically the goal is to arrive at

address_space_write():
  rcu_read_lock()
  mr = lookup()
  mr->ref()
  rcu_read_unlock()

  mr->dispatch()

  mr->unref()

This is the same strategy used in many places in the kernel.
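
Spelled out a little more (every helper name below is schematic; the
ref/unref pair stands for the hooks this series adds to MemoryRegionOps):

static void dispatch_write(target_phys_addr_t addr, uint64_t val, unsigned size)
{
    MemoryRegion *mr;

    rcu_read_lock();            /* lookup tables freed only after a grace period */
    mr = lookup(addr);          /* walk the phys map without the big lock */
    memory_region_ref(mr);      /* pin the device so unplug cannot free it under us */
    rcu_read_unlock();

    mr->ops->write(mr->opaque, addr, val, size);   /* device takes its own lock */

    memory_region_unref(mr);    /* last reference may trigger deferred destruction */
}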

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock
  2012-10-25 17:07   ` Avi Kivity
@ 2012-10-25 17:13     ` Peter Maydell
  2012-10-25 18:13       ` Marcelo Tosatti
  2012-10-29 15:24       ` Avi Kivity
  0 siblings, 2 replies; 102+ messages in thread
From: Peter Maydell @ 2012-10-25 17:13 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka, Paolo Bonzini

On 25 October 2012 18:07, Avi Kivity <avi@redhat.com> wrote:
> On 10/25/2012 04:04 PM, Peter Maydell wrote:
>> Is there a clear up to date description somewhere of the design and
>> locking strategy here somewhere? I'd rather not have to try to
>> reconstitute it by reading the whole patchset...
>
> It was described somewhere in a document by Marcelo and myself.
> Basically the goal is to arrive at
>
> address_space_write():
>   rcu_read_lock()
>   mr = lookup()
>   mr->ref()
>   rcu_read_unlock()
>
>   mr->dispatch()
>
>   mr->unref()
>
> This is the same strategy used in many places in the kernel.

Yes, but this is rather short on the details (eg, does every
device have its own lock, what are we doing with irqs, how about
dma from devices, etc etc). It's the details of the design I'd
like to see described...

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock
  2012-10-25 17:13     ` Peter Maydell
@ 2012-10-25 18:13       ` Marcelo Tosatti
  2012-10-25 19:00         ` Jan Kiszka
  2012-10-29 15:24       ` Avi Kivity
  1 sibling, 1 reply; 102+ messages in thread
From: Marcelo Tosatti @ 2012-10-25 18:13 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Liu Ping Fan, Stefan Hajnoczi, qemu-devel, Avi Kivity,
	Anthony Liguori, Jan Kiszka, Paolo Bonzini

On Thu, Oct 25, 2012 at 06:13:51PM +0100, Peter Maydell wrote:
> On 25 October 2012 18:07, Avi Kivity <avi@redhat.com> wrote:
> > On 10/25/2012 04:04 PM, Peter Maydell wrote:
> >> Is there a clear up to date description somewhere of the design and
> >> locking strategy here somewhere? I'd rather not have to try to
> >> reconstitute it by reading the whole patchset...
> >
> > It was described somewhere in a document by Marcelo and myself.
> > Basically the goal is to arrive at
> >
> > address_space_write():
> >   rcu_read_lock()
> >   mr = lookup()
> >   mr->ref()
> >   rcu_read_unlock()
> >
> >   mr->dispatch()
> >
> >   mr->unref()
> >
> > This is the same strategy used in many places in the kernel.
> 
> Yes, but this is rather short on the details (eg, does every
> device have its own lock, what are we doing with irqs, how about
> dma from devices, etc etc). It's the details of the design I'd
> like to see described...
> 
> -- PMM

A document should be maintained and updated to reflect ongoing 
agreement of problems and solutions... Jan/Liu, someone steps up
to do that?

The original is:
http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg04315.html

For one thing, inter-device DMA issue discussed on the list is not
covered and probably large parts of it are obsolete by now (and 
should be deleted).

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000
  2012-10-25 17:02                 ` Avi Kivity
@ 2012-10-25 18:48                   ` Jan Kiszka
  2012-10-29  5:24                     ` liu ping fan
  0 siblings, 1 reply; 102+ messages in thread
From: Jan Kiszka @ 2012-10-25 18:48 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	liu ping fan, Anthony Liguori, Paolo Bonzini

On 2012-10-25 19:02, Avi Kivity wrote:
> On 10/25/2012 06:39 PM, Jan Kiszka wrote:
>>>
>>> That doesn't work cross-thread.
>>>
>>> vcpu A: write to device X, dma-ing to device Y
>>> vcpu B: write to device Y, dma-ing to device X
>>
>> We will deny DMA-ing from device X on behalf of a VCPU, ie. in dispatch
>> context, to Y.
>>
>> What we do not deny, though, is DMA-ing from an I/O thread that
>> processes an event for device X. 
> 
> I would really like to avoid depending on the context.  In real hardware, there is no such thing.

The point is how we deal with any kind of access to a device that
requires taking that device's lock while holding another lock, provided
that scenario can also take place in reverse order at the same time.
Known scenarios are:

 - vcpu 1 -> access device A -> access device B
 - vcpu 2 -> access device B -> access device A

 - event 1 -> device A event processing -> access device B
 - event 2 -> device B event processing -> access device A

and combinations of those pairs.

> 
>> If the invoked callback of device X
>> holds the device lock across some DMA request to Y, then we risk to run
>> into the same ABBA issue. Hmm...
> 
> Yup.
> 
>>
>>>
>>> My suggestion was to drop the locks around DMA, then re-acquire the lock
>>> and re-validate data.
>>
>> Maybe possible, but hairy depending on the device model.
> 
> It's unpleasant, yes.
> 
> Note depending on the device, we may not need to re-validate data, it may be sufficient to load it into local variables to we know it is consistent at some point.  But all those solutions suffer from requiring device model authors to understand all those issues, rather than just add a simple lock around access to their data structures.

Right. And therefore it is a suboptimal way to start (patching).

> 
>>>>>> I see that we have a all-or-nothing problem here: to address this
>>>>>> properly, we need to convert the IRQ path to lock-less (or at least
>>>>>> compatible with holding per-device locks) as well.
>>>>>
>>>>> There is a transitional path where writing to a register that can cause
>>>>> IRQ changes takes both the big lock and the local lock.
>>>>>
>>>>> Eventually, though, of course all inner subsystems must be threaded for
>>>>> this work to have value.
>>>>>
>>>>
>>>> But that transitional path must not introduce regressions. Opening a
>>>> race window between IRQ cause update and event injection is such a
>>>> thing, just like dropping concurrent requests on the floor.
>>>
>>> Can you explain the race?
>>
>> Context A				Context B
>>
>> device.lock
>> ...
>> device.set interrupt_cause = 0
>> lower_irq = true
>> ...
>> device.unlock
>> 					device.lock
>> 					...
>> 					device.interrupt_cause = 42
>> 					raise_irq = true
>> 					...
>> 					device.unlock
>> 					if (raise_irq)
>> 						bql.lock
>> 						set_irq(device.irqno)
>> 						bql.unlock
>> if (lower_irq)
>> 	bql.lock
>> 	clear_irq(device.irqno)
>> 	bql.unlock
>>
>>
>> And there it goes, our interrupt event.
> 
> Obviously you'll need to reacquire the device lock after taking bql and revalidate stuff.  But that is not what I am suggesting.  Instead, any path that can lead to an irq update (or timer update etc) will take both the bql and the device lock.  This will leave after the first pass only side effect free register reads and writes, which is silly if we keep it that way, but we will follow with a threaded timer and irq subsystem and we'll peel away those big locks.
> 
>   device_mmio_write:
>     if register is involved in irq or timers or block layer or really anything that matters:
>       bql.acquire
>     device.lock.acquire
>     do stuff
>     device.lock.release
>     if that big condition from above was true:
>       bql.release

Looks simpler than it is as you cannot wrap complete handlers with that
pattern. An example where it would fail (until we solved the locking
issues above):

mmio_write:
  bql.conditional_lock
  device.lock
  device.check_state
  issue_dma
  device.update_state
  update_irq, play_with_timers, etc.
  device.unlock
  bql.conditional_unlock

If that DMA request hits an unconverted MMIO region or one that takes
BQL conditionally as above, we will lock up (or bail out as our mutexes
detect the error). E1000's start_xmit looks like this so far, and that's
a pretty important service.

Moreover, I prefer having a representative cut-through over the joy of
merging a first step that excludes some 80% of the problems. For that
reason I would even be inclined to start with addressing the IRQ
injection topic first (patch-wise), then the other necessary backend
services for the e1000 or whatever, and convert some device(s) last.

IOW: cut out anything from this series that touches e1000 until the
building blocks for converting it reasonably are finished. Carrying
experimental, partially broken conversion on top is fine; trying to merge
pieces of it is not, IMHO.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock
  2012-10-25 18:13       ` Marcelo Tosatti
@ 2012-10-25 19:00         ` Jan Kiszka
  2012-10-25 19:06           ` Peter Maydell
  0 siblings, 1 reply; 102+ messages in thread
From: Jan Kiszka @ 2012-10-25 19:00 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Peter Maydell, Liu Ping Fan, Stefan Hajnoczi, qemu-devel,
	Avi Kivity, Anthony Liguori, Paolo Bonzini

On 2012-10-25 20:13, Marcelo Tosatti wrote:
> On Thu, Oct 25, 2012 at 06:13:51PM +0100, Peter Maydell wrote:
>> On 25 October 2012 18:07, Avi Kivity <avi@redhat.com> wrote:
>>> On 10/25/2012 04:04 PM, Peter Maydell wrote:
>>>> Is there a clear up to date description somewhere of the design and
>>>> locking strategy here somewhere? I'd rather not have to try to
>>>> reconstitute it by reading the whole patchset...
>>>
>>> It was described somewhere in a document by Marcelo and myself.
>>> Basically the goal is to arrive at
>>>
>>> address_space_write():
>>>   rcu_read_lock()
>>>   mr = lookup()
>>>   mr->ref()
>>>   rcu_read_unlock()
>>>
>>>   mr->dispatch()
>>>
>>>   mr->unref()
>>>
>>> This is the same strategy used in many places in the kernel.
>>
>> Yes, but this is rather short on the details (eg, does every
>> device have its own lock, what are we doing with irqs, how about
>> dma from devices, etc etc). It's the details of the design I'd
>> like to see described...
>>
>> -- PMM
> 
> A document should be maintained and updated to reflect ongoing 
> agreement of problems and solutions... Jan/Liu, someone steps up
> to do that?

I can pick this up as I have to anyway. First results will be pushed to
the wiki, around or after the KVM forum.

Jan

> 
> The original is:
> http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg04315.html
> 
> For one thing, inter-device DMA issue discussed on the list is not
> covered and probably large parts of it are obsolete by now (and 
> should be deleted).

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock
  2012-10-25 19:00         ` Jan Kiszka
@ 2012-10-25 19:06           ` Peter Maydell
  0 siblings, 0 replies; 102+ messages in thread
From: Peter Maydell @ 2012-10-25 19:06 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Paolo Bonzini

On 25 October 2012 20:00, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2012-10-25 20:13, Marcelo Tosatti wrote:
>> A document should be maintained and updated to reflect ongoing
>> agreement of problems and solutions... Jan/Liu, someone steps up
>> to do that?
>
> I can pick this up as I have to anyway. First results will be pushed to
> the wiki, around or after the KVM forum.

Thanks -- I appreciate it.

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 13/16] e1000: add busy flag to anti broken device state
  2012-10-25  9:04               ` Avi Kivity
@ 2012-10-26  3:05                 ` liu ping fan
  2012-10-26  3:08                   ` liu ping fan
  0 siblings, 1 reply; 102+ messages in thread
From: liu ping fan @ 2012-10-26  3:05 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Peter Maydell, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka, Paolo Bonzini

On Thu, Oct 25, 2012 at 5:04 PM, Avi Kivity <avi@redhat.com> wrote:
> On 10/25/2012 11:00 AM, Peter Maydell wrote:
>> On 23 October 2012 10:37, Avi Kivity <avi@redhat.com> wrote:
>>> On 10/23/2012 11:32 AM, liu ping fan wrote:
>>>> On Tue, Oct 23, 2012 at 5:07 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>>> On 2012-10-23 07:52, liu ping fan wrote:
>>>>>> On Mon, Oct 22, 2012 at 6:40 PM, Avi Kivity <avi@redhat.com> wrote:
>>>>>>> On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
>>>>>> It will only record and fix the issue on one thread. But guest can
>>>>>> touch the emulated device on muti-threads.
>>>>>
>>>>> Sorry, what does that mean? A second VCPU accessing the device will
>>>>> simply be ignored when it races with another VCPU? Specifically
>>>>>
>>>> Yes, just ignored.  For device which support many logic in parallel,
>>>> it should use independent busy flag for each logic
>>>
>>> We don't actually know that e1000 doesn't.  Why won't writing into
>>> different registers in parallel work?
>>
>> Unless the device we're emulating supports multiple in
>> parallel accesses (and I bet 99.9% of the devices we model
>> don't) then the memory framework needs to serialise the
>> loads/stores. Otherwise it's just going to be excessively
>> hard to write a reliable device model.
>
> That's why we have a per-device lock.  The busy flag breaks that model
> by discarding accesses that occur in parallel.
>
I think by adopting the model, we can avoid this.

struct device_logic {
  bool busy;
  qemu_mutex lock;
  QemuCond wait;
};

LOCK(logic->lock)
while (logic->busy) {
qemu_cond_wait(&logic->wait, &logic->lock);
}
....
do hardware emulation
...
logic->busy = false;
qemu_cond_signal(&logic->wait);

This is identical to the big lock's behavior for parallel access to a
device today. The remaining question then is what level of parallelism
we want. If we expect more parallelism, we need to split the device into
more logic units.
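
Filled in, the pattern would look roughly like this (illustrative names;
note the busy flag being set, the lock dropped around the emulation, and
the unlock before the signal that the follow-up message adds):

typedef struct DeviceLogic {
    bool busy;
    QemuMutex lock;
    QemuCond wait;
} DeviceLogic;

static void logic_run(DeviceLogic *logic)
{
    qemu_mutex_lock(&logic->lock);
    while (logic->busy) {
        qemu_cond_wait(&logic->wait, &logic->lock);
    }
    logic->busy = true;                 /* claim this logic unit */
    qemu_mutex_unlock(&logic->lock);

    /* ... do hardware emulation, possibly calling into other subsystems ... */

    qemu_mutex_lock(&logic->lock);
    logic->busy = false;
    qemu_mutex_unlock(&logic->lock);
    qemu_cond_signal(&logic->wait);
}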


Regards,
pingfan
>
> --
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 13/16] e1000: add busy flag to anti broken device state
  2012-10-26  3:05                 ` liu ping fan
@ 2012-10-26  3:08                   ` liu ping fan
  2012-10-26 10:25                     ` Jan Kiszka
  0 siblings, 1 reply; 102+ messages in thread
From: liu ping fan @ 2012-10-26  3:08 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Peter Maydell, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka, Paolo Bonzini

On Fri, Oct 26, 2012 at 11:05 AM, liu ping fan <qemulist@gmail.com> wrote:
> On Thu, Oct 25, 2012 at 5:04 PM, Avi Kivity <avi@redhat.com> wrote:
>> On 10/25/2012 11:00 AM, Peter Maydell wrote:
>>> On 23 October 2012 10:37, Avi Kivity <avi@redhat.com> wrote:
>>>> On 10/23/2012 11:32 AM, liu ping fan wrote:
>>>>> On Tue, Oct 23, 2012 at 5:07 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>>>> On 2012-10-23 07:52, liu ping fan wrote:
>>>>>>> On Mon, Oct 22, 2012 at 6:40 PM, Avi Kivity <avi@redhat.com> wrote:
>>>>>>>> On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
>>>>>>> It will only record and fix the issue on one thread. But guest can
>>>>>>> touch the emulated device on muti-threads.
>>>>>>
>>>>>> Sorry, what does that mean? A second VCPU accessing the device will
>>>>>> simply be ignored when it races with another VCPU? Specifically
>>>>>>
>>>>> Yes, just ignored.  For device which support many logic in parallel,
>>>>> it should use independent busy flag for each logic
>>>>
>>>> We don't actually know that e1000 doesn't.  Why won't writing into
>>>> different registers in parallel work?
>>>
>>> Unless the device we're emulating supports multiple in
>>> parallel accesses (and I bet 99.9% of the devices we model
>>> don't) then the memory framework needs to serialise the
>>> loads/stores. Otherwise it's just going to be excessively
>>> hard to write a reliable device model.
>>
>> That's why we have a per-device lock.  The busy flag breaks that model
>> by discarding accesses that occur in parallel.
>>
> I think by adopting the model, we can avoid this.
>
> struct device_logic {
>   bool busy;
>   qemu_mutex lock;
>   QemuCond wait;
> };
>
> LOCK(logic->lock)
> while (logic->busy) {
> qemu_cond_wait(&logic->wait, &logic->lock);
> }
> ....
> do hardware emulation
> ...
> logic->busy = false;
UNLOCK(lock); <-------------------------------------- forget
> qemu_cond_signal(&logic->wait);
>
> This is identical to the biglock's behavior for parallel access to
> device for nowadays. And then, the problem left is what level for
> parallel we want. If we expect more parallel, we need to degrade the
> device into more logic unit.
>
>
> Regards,
> pingfan
>>
>> --
>> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 13/16] e1000: add busy flag to anti broken device state
  2012-10-26  3:08                   ` liu ping fan
@ 2012-10-26 10:25                     ` Jan Kiszka
  2012-10-29  5:24                       ` liu ping fan
  0 siblings, 1 reply; 102+ messages in thread
From: Jan Kiszka @ 2012-10-26 10:25 UTC (permalink / raw)
  To: liu ping fan
  Cc: Peter Maydell, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Paolo Bonzini

On 2012-10-26 05:08, liu ping fan wrote:
> On Fri, Oct 26, 2012 at 11:05 AM, liu ping fan <qemulist@gmail.com> wrote:
>> On Thu, Oct 25, 2012 at 5:04 PM, Avi Kivity <avi@redhat.com> wrote:
>>> On 10/25/2012 11:00 AM, Peter Maydell wrote:
>>>> On 23 October 2012 10:37, Avi Kivity <avi@redhat.com> wrote:
>>>>> On 10/23/2012 11:32 AM, liu ping fan wrote:
>>>>>> On Tue, Oct 23, 2012 at 5:07 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>>>>> On 2012-10-23 07:52, liu ping fan wrote:
>>>>>>>> On Mon, Oct 22, 2012 at 6:40 PM, Avi Kivity <avi@redhat.com> wrote:
>>>>>>>>> On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
>>>>>>>> It will only record and fix the issue on one thread. But guest can
>>>>>>>> touch the emulated device on muti-threads.
>>>>>>>
>>>>>>> Sorry, what does that mean? A second VCPU accessing the device will
>>>>>>> simply be ignored when it races with another VCPU? Specifically
>>>>>>>
>>>>>> Yes, just ignored.  For device which support many logic in parallel,
>>>>>> it should use independent busy flag for each logic
>>>>>
>>>>> We don't actually know that e1000 doesn't.  Why won't writing into
>>>>> different registers in parallel work?
>>>>
>>>> Unless the device we're emulating supports multiple in
>>>> parallel accesses (and I bet 99.9% of the devices we model
>>>> don't) then the memory framework needs to serialise the
>>>> loads/stores. Otherwise it's just going to be excessively
>>>> hard to write a reliable device model.
>>>
>>> That's why we have a per-device lock.  The busy flag breaks that model
>>> by discarding accesses that occur in parallel.
>>>
>> I think by adopting the model, we can avoid this.
>>
>> struct device_logic {
>>   bool busy;
>>   qemu_mutex lock;
>>   QemuCond wait;
>> };
>>
>> LOCK(logic->lock)
>> while (logic->busy) {
>> qemu_cond_wait(&logic->wait, &logic->lock);
>> }
>> ....
>> do hardware emulation
>> ...
>> logic->busy = false;
> UNLOCK(lock); <-------------------------------------- forget
>> qemu_cond_signal(&logic->wait);
>>
>> This is identical to the biglock's behavior for parallel access to
>> device for nowadays. And then, the problem left is what level for
>> parallel we want. If we expect more parallel, we need to degrade the
>> device into more logic unit.

But where is the remaining added value of the busy flag then? Everyone
could just as well be serialized by the lock itself. And even when
dropping the lock while running the hw emulation, that doesn't change
anything about the semantics - nor about the ABBA problems I sketched yesterday.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
  2012-10-25 16:28                           ` Avi Kivity
@ 2012-10-26 15:05                             ` Paolo Bonzini
  0 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-26 15:05 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka



----- Original Message -----
> From: "Avi Kivity" <avi@redhat.com>
> To: "Paolo Bonzini" <pbonzini@redhat.com>
> Cc: "Liu Ping Fan" <pingfank@linux.vnet.ibm.com>, qemu-devel@nongnu.org, "Anthony Liguori" <anthony@codemonkey.ws>,
> "Marcelo Tosatti" <mtosatti@redhat.com>, "Jan Kiszka" <jan.kiszka@siemens.com>, "Stefan Hajnoczi"
> <stefanha@gmail.com>
> Sent: Thursday, 25 October 2012 18:28:27
> Subject: Re: [patch v4 05/16] memory: introduce ref,unref interface for MemoryRegionOps
> 
> On 10/24/2012 09:29 AM, Paolo Bonzini wrote:
> > Il 23/10/2012 18:09, Avi Kivity ha scritto:
> >>> But our interfaces had better support asynchronicity, and indeed
> >>> they
> >>> do: after you write to the "eject" register, the "up" will show
> >>> the
> >>> device as present until after destroy is done.  This can be
> >>> changed to
> >>> show the device as present only until after step 4 is done.
> >> 
> >> Let's say we want to eject the hotplug hardware itself (just as an
> >> example).  With refcounts, the callback that updates "up" will hold
> >> on to to it via refcounts.  With stop_machine(), you need to cancel
> >> that callback, or wait for it somehow, or it can arrive after the
> >> stop_machine() and bite you.
> > 
> > The callback that updates "up" is for the parent of the hotplug
> > hardware.  There is nothing that has to be updated in the hotplug
> > hardware itself.
> 
> I meant, as an unrealistic example, hot-unplugging the bridge itself.
> So we have a callback that updates information in the bridge (up
> register state) being called asynchronously.
> 
> A more realistic example would be hot-unplug of an HBA, then the block
> layer callback comes back to update the device.  So stop_machine()
> would need to cancel all I/O and wait for I/O that cannot be cancelled.

Cancellation+wait would be triggered by isolate (4a) and it would run
outside stop_machine().  We know that stop_machine() will eventually
run because the guest cannot place more requests for the devices to
process.

At this point we're here:

> > 4a. close all backends (also cancel or complete all pending I/O)
> 
> ^ long latency
> 

but none of this is done in stop_machine().  Once cancellation/wait
finishes, the HBA gives a green-light to the parent, which proceeds
as follows:

> > 4b. notify parent that we're done
> >     4ba. parent removes device from its bus
> >     4bb. parent notifies guest
> >     4bc. parent schedules stop_machine(qdev_free(child))
> > 5. a bottom half calls stop_machine(qdev_free(child))

All we're doing in stop_machine() is really calling the destructor,
which---in an isolate-enabled device---only includes calls to
qemu_del_timer, drive_put_ref, memory_region_destroy and the like.
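
As a rough sketch (MyHBAState and its fields are made up; the cleanup
calls are the existing QEMU ones named above), the part left for
stop_machine() could shrink to:

static void my_hba_cleanup(MyHBAState *s)
{
    /* backends already closed and I/O cancelled/completed in step 4a */
    qemu_del_timer(s->poll_timer);       /* no callback can fire afterwards */
    qemu_free_timer(s->poll_timer);
    memory_region_destroy(&s->mmio);     /* already removed from the views */
    drive_put_ref(s->dinfo);             /* drop the block backend reference */
}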

> Maybe my worry about long stop_machine latencies is premature.
> Everyone in the kernel hates it, but the kernel scales a lot more
> than qemu and is in a much better place wrt threading.

stop_machine may indeed require (or at least warmly suggest) converting
storage devices to isolate, in order to reduce the latency of the
destructor.  We do not have that many, though (the IDE and SCSI buses,
and virtio-blk).

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000
  2012-10-25 18:48                   ` Jan Kiszka
@ 2012-10-29  5:24                     ` liu ping fan
  0 siblings, 0 replies; 102+ messages in thread
From: liu ping fan @ 2012-10-29  5:24 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Paolo Bonzini

On Fri, Oct 26, 2012 at 2:48 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2012-10-25 19:02, Avi Kivity wrote:
>> On 10/25/2012 06:39 PM, Jan Kiszka wrote:
>>>>
>>>> That doesn't work cross-thread.
>>>>
>>>> vcpu A: write to device X, dma-ing to device Y
>>>> vcpu B: write to device Y, dma-ing to device X
>>>
>>> We will deny DMA-ing from device X on behalf of a VCPU, ie. in dispatch
>>> context, to Y.
>>>
>>> What we do not deny, though, is DMA-ing from an I/O thread that
>>> processes an event for device X.
>>
>> I would really like to avoid depending on the context.  In real hardware, there is no such thing.
>
> The point is how we deal with any kind of access to a device that
> requires taking that device's lock while holding another lock, provided
> that scenario can also take place in reverse order at the same time.
> Known scenarios are:
>
>  - vcpu 1 -> access device A -> access device B
>  - vcpu 2 -> access device B -> access device A
>
>  - event 1 -> device A event processing -> access device B
>  - event 2 -> device B event processing -> access device A
>
> and combinations of those pairs.
>
>>
>>> If the invoked callback of device X
>>> holds the device lock across some DMA request to Y, then we risk to run
>>> into the same ABBA issue. Hmm...
>>
>> Yup.
>>
>>>
>>>>
>>>> My suggestion was to drop the locks around DMA, then re-acquire the lock
>>>> and re-validate data.
>>>
>>> Maybe possible, but hairy depending on the device model.
>>
>> It's unpleasant, yes.
>>
>> Note depending on the device, we may not need to re-validate data, it may be sufficient to load it into local variables to we know it is consistent at some point.  But all those solutions suffer from requiring device model authors to understand all those issues, rather than just add a simple lock around access to their data structures.
>
> Right. And therefor it is a suboptimal way to start (patching).
>
>>
>>>>>>> I see that we have a all-or-nothing problem here: to address this
>>>>>>> properly, we need to convert the IRQ path to lock-less (or at least
>>>>>>> compatible with holding per-device locks) as well.
>>>>>>
>>>>>> There is a transitional path where writing to a register that can cause
>>>>>> IRQ changes takes both the big lock and the local lock.
>>>>>>
>>>>>> Eventually, though, of course all inner subsystems must be threaded for
>>>>>> this work to have value.
>>>>>>
>>>>>
>>>>> But that transitional path must not introduce regressions. Opening a
>>>>> race window between IRQ cause update and event injection is such a
>>>>> thing, just like dropping concurrent requests on the floor.
>>>>
>>>> Can you explain the race?
>>>
>>> Context A                            Context B
>>>
>>> device.lock
>>> ...
>>> device.set interrupt_cause = 0
>>> lower_irq = true
>>> ...
>>> device.unlock
>>>                                      device.lock
>>>                                      ...
>>>                                      device.interrupt_cause = 42
>>>                                      raise_irq = true
>>>                                      ...
>>>                                      device.unlock
>>>                                      if (raise_irq)
>>>                                              bql.lock
>>>                                              set_irq(device.irqno)
>>>                                              bql.unlock
>>> if (lower_irq)
>>>      bql.lock
>>>      clear_irq(device.irqno)
>>>      bql.unlock
>>>
>>>
>>> And there it goes, our interrupt event.
>>
>> Obviously you'll need to reacquire the device lock after taking bql and revalidate stuff.  But that is not what I am suggesting.  Instead, any path that can lead to an irq update (or timer update etc) will take both the bql and the device lock.  This will leave after the first pass only side effect free register reads and writes, which is silly if we keep it that way, but we will follow with a threaded timer and irq subsystem and we'll peel away those big locks.
>>
>>   device_mmio_write:
>>     if register is involved in irq or timers or block layer or really anything that matters:
>>       bql.acquire
>>     device.lock.acquire
>>     do stuff
>>     device.lock.release
>>     if that big condition from above was true:
>>       bql.release
>
> Looks simpler than it is as you cannot wrap complete handlers with that
> pattern. An example where it would fail (until we solved the locking
> issues above):
>
> mmio_write:
>   bql.conditional_lock
>   device.lock
>   device.check_state
>   issue_dma
>   device.update_state
>   update_irq, play_with_timers, etc.
>   device.unlock
>   bql.conditional_unlock
>
> If that DMA request hits an unconverted MMIO region or one that takes
> BQL conditionally as above, we will lock up (or bail out as our mutexes
> detect the error). E1000's start_xmit looks like this so far, and that's
> a pretty import service.
>
> Moreover, I prefer having a representative cut-through over enjoying to
> merge a first step that excludes some 80% of the problems. For that
> reason I would be even be inclined to start with addressing the IRQ
> injection topic first (patch-wise), then the other necessary backend
> services for the e1000 or whatever and convert some device(s) last.
>
> IOW: cut out anything from this series that touches e1000 until the
> building blocks for converting it reasonably are finished. Carrying
> experimental, partially broken conversion on top is fine, try to merge
> pieces of that not, IMHO.
>
Agreed.  Just want to take this opportunity to discuss what comes next
and what does not.

Regards,
pingfan

> Jan
>
> --
> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000
  2012-10-25 13:34       ` Jan Kiszka
  2012-10-25 16:23         ` Avi Kivity
@ 2012-10-29  5:24         ` liu ping fan
  2012-10-31  7:03           ` Jan Kiszka
  1 sibling, 1 reply; 102+ messages in thread
From: liu ping fan @ 2012-10-29  5:24 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Paolo Bonzini

On Thu, Oct 25, 2012 at 9:34 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2012-10-24 09:29, liu ping fan wrote:
>> On Tue, Oct 23, 2012 at 5:04 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> On 2012-10-22 11:23, Liu Ping Fan wrote:
>>>> Use local lock to protect e1000. When calling the system function,
>>>> dropping the fine lock before acquiring the big lock. This will
>>>> introduce broken device state, which need extra effort to fix.
>>>>
>>>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>> ---
>>>>  hw/e1000.c |   24 +++++++++++++++++++++++-
>>>>  1 files changed, 23 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/hw/e1000.c b/hw/e1000.c
>>>> index ae8a6c5..5eddab5 100644
>>>> --- a/hw/e1000.c
>>>> +++ b/hw/e1000.c
>>>> @@ -85,6 +85,7 @@ typedef struct E1000State_st {
>>>>      NICConf conf;
>>>>      MemoryRegion mmio;
>>>>      MemoryRegion io;
>>>> +    QemuMutex e1000_lock;
>>>>
>>>>      uint32_t mac_reg[0x8000];
>>>>      uint16_t phy_reg[0x20];
>>>> @@ -223,13 +224,27 @@ static const uint32_t mac_reg_init[] = {
>>>>  static void
>>>>  set_interrupt_cause(E1000State *s, int index, uint32_t val)
>>>>  {
>>>> +    QemuThread *t;
>>>> +
>>>>      if (val && (E1000_DEVID >= E1000_DEV_ID_82547EI_MOBILE)) {
>>>>          /* Only for 8257x */
>>>>          val |= E1000_ICR_INT_ASSERTED;
>>>>      }
>>>>      s->mac_reg[ICR] = val;
>>>>      s->mac_reg[ICS] = val;
>>>> -    qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
>>>> +
>>>> +    t = pthread_getspecific(qemu_thread_key);
>>>> +    if (t->context_type == 1) {
>>>> +        qemu_mutex_unlock(&s->e1000_lock);
>>>> +        qemu_mutex_lock_iothread();
>>>> +    }
>>>> +    if (DEVICE(s)->state < DEV_STATE_STOPPING) {
>>>> +        qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
>>>> +    }
>>>> +    if (t->context_type == 1) {
>>>> +        qemu_mutex_unlock_iothread();
>>>> +        qemu_mutex_lock(&s->e1000_lock);
>>>> +    }
>>>
>>> This is ugly for many reasons. First of all, it is racy as the register
>>> content may change while dropping the device lock, no? Then you would
>>> raise or clear an IRQ spuriously.
>>>
>>> Second, it clearly shows that we need to address lock-less IRQ delivery.
>>> Almost nothing is won if we have to take the global lock again to push
>>> an IRQ event to the guest. I'm repeating myself, but the problem to be
>>> solved here is almost identical to fast IRQ delivery for assigned
>>> devices (which we only address pretty ad-hoc for PCI so far).
>>>
>> Interesting, could you show me more detail about it, so I can google...
>
> No need to look that far, just grep for pci_device_route_intx_to_irq,
> pci_device_set_intx_routing_notifier and related functions in the code.
>
I think the major point here is to bypass the delivery path through the
IRQ chipset models at runtime. Right?

Thanks and regards,

pingfan
> Jan
>
> --
> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 13/16] e1000: add busy flag to anti broken device state
  2012-10-26 10:25                     ` Jan Kiszka
@ 2012-10-29  5:24                       ` liu ping fan
  2012-10-29  7:50                         ` Peter Maydell
  0 siblings, 1 reply; 102+ messages in thread
From: liu ping fan @ 2012-10-29  5:24 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Peter Maydell, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Paolo Bonzini

On Fri, Oct 26, 2012 at 6:25 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2012-10-26 05:08, liu ping fan wrote:
>> On Fri, Oct 26, 2012 at 11:05 AM, liu ping fan <qemulist@gmail.com> wrote:
>>> On Thu, Oct 25, 2012 at 5:04 PM, Avi Kivity <avi@redhat.com> wrote:
>>>> On 10/25/2012 11:00 AM, Peter Maydell wrote:
>>>>> On 23 October 2012 10:37, Avi Kivity <avi@redhat.com> wrote:
>>>>>> On 10/23/2012 11:32 AM, liu ping fan wrote:
>>>>>>> On Tue, Oct 23, 2012 at 5:07 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>>>>>> On 2012-10-23 07:52, liu ping fan wrote:
>>>>>>>>> On Mon, Oct 22, 2012 at 6:40 PM, Avi Kivity <avi@redhat.com> wrote:
>>>>>>>>>> On 10/22/2012 11:23 AM, Liu Ping Fan wrote:
>>>>>>>>> It will only record and fix the issue on one thread. But guest can
>>>>>>>>> touch the emulated device on muti-threads.
>>>>>>>>
>>>>>>>> Sorry, what does that mean? A second VCPU accessing the device will
>>>>>>>> simply be ignored when it races with another VCPU? Specifically
>>>>>>>>
>>>>>>> Yes, just ignored.  For device which support many logic in parallel,
>>>>>>> it should use independent busy flag for each logic
>>>>>>
>>>>>> We don't actually know that e1000 doesn't.  Why won't writing into
>>>>>> different registers in parallel work?
>>>>>
>>>>> Unless the device we're emulating supports multiple in
>>>>> parallel accesses (and I bet 99.9% of the devices we model
>>>>> don't) then the memory framework needs to serialise the
>>>>> loads/stores. Otherwise it's just going to be excessively
>>>>> hard to write a reliable device model.
>>>>
>>>> That's why we have a per-device lock.  The busy flag breaks that model
>>>> by discarding accesses that occur in parallel.
>>>>
>>> I think by adopting the model, we can avoid this.
>>>
>>> struct device_logic {
>>>   bool busy;
>>>   qemu_mutex lock;
>>>   QemuCond wait;
>>> };
>>>
>>> LOCK(logic->lock)
>>> while (logic->busy) {
>>> qemu_cond_wait(&logic->wait, &logic->lock);
>>> }
>>> ....
>>> do hardware emulation
>>> ...
>>> logic->busy = false;
>> UNLOCK(lock); <-------------------------------------- forget
>>> qemu_cond_signal(&logic->wait);
>>>
>>> This is identical to the biglock's behavior for parallel access to
>>> device for nowadays. And then, the problem left is what level for
>>> parallel we want. If we expect more parallel, we need to degrade the
>>> device into more logic unit.
>
> But where is the remaining added-value of the busy flag then? Everyone
> could just as well be serialized by the lock itself. And even when
> dropping the lock while running the hw emulation, that doesn't change

The key is that the local lock has to be dropped (broken) around calls
into other subsystems, so we rely on a higher-level guard -- the busy
flag.  Once each subsystem is threaded, we will no longer face the
broken-local-lock issue and can throw away this design.

> anything to the semantic - and our ABBA problems I sketched yesterday.
>
Oh, the ABBA problem cannot be solved; I think we need a clever deadlock
detector.

Regards,
pingfan

> Jan
>
> --
> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 13/16] e1000: add busy flag to anti broken device state
  2012-10-29  5:24                       ` liu ping fan
@ 2012-10-29  7:50                         ` Peter Maydell
  0 siblings, 0 replies; 102+ messages in thread
From: Peter Maydell @ 2012-10-29  7:50 UTC (permalink / raw)
  To: liu ping fan
  Cc: Stefan Hajnoczi, Marcelo Tosatti, qemu-devel, Avi Kivity,
	Anthony Liguori, Jan Kiszka, Paolo Bonzini

On 29 October 2012 05:24, liu ping fan <qemulist@gmail.com> wrote:
> Oh, ABBA problem can not be solved, I think we need clever deadlock detector.

If you cannot solve the problem then you must remain single threaded.

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock
  2012-10-25 17:13     ` Peter Maydell
  2012-10-25 18:13       ` Marcelo Tosatti
@ 2012-10-29 15:24       ` Avi Kivity
  1 sibling, 0 replies; 102+ messages in thread
From: Avi Kivity @ 2012-10-29 15:24 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Jan Kiszka, Paolo Bonzini

On 10/25/2012 07:13 PM, Peter Maydell wrote:
> On 25 October 2012 18:07, Avi Kivity <avi@redhat.com> wrote:
>> On 10/25/2012 04:04 PM, Peter Maydell wrote:
>>> Is there a clear up to date description somewhere of the design and
>>> locking strategy here somewhere? I'd rather not have to try to
>>> reconstitute it by reading the whole patchset...
>>
>> It was described somewhere in a document by Marcelo and myself.
>> Basically the goal is to arrive at
>>
>> address_space_write():
>>   rcu_read_lock()
>>   mr = lookup()
>>   mr->ref()
>>   rcu_read_unlock()
>>
>>   mr->dispatch()
>>
>>   mr->unref()
>>
>> This is the same strategy used in many places in the kernel.
> 
> Yes, but this is rather short on the details

Until Jan fleshes this out:

> (eg, does every
> device have its own lock, 

No, devices which are not modified will continue to use the BQL.

> what are we doing with irqs, 

Eventually they will gain fine-grained threading too.  Until then, they
will be protected by the big lock (and any device which calls any irq
APIs must hold it).

> how about
> dma from devices, etc etc).

DMA will be unlocked, if done to a device which has its own lock (same
as mmio).

> It's the details of the design I'd
> like to see described...


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000
  2012-10-29  5:24         ` liu ping fan
@ 2012-10-31  7:03           ` Jan Kiszka
  0 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-31  7:03 UTC (permalink / raw)
  To: liu ping fan
  Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Anthony Liguori, Paolo Bonzini

On 2012-10-29 06:24, liu ping fan wrote:
> On Thu, Oct 25, 2012 at 9:34 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2012-10-24 09:29, liu ping fan wrote:
>>> On Tue, Oct 23, 2012 at 5:04 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>> On 2012-10-22 11:23, Liu Ping Fan wrote:
>>>>> Use local lock to protect e1000. When calling the system function,
>>>>> dropping the fine lock before acquiring the big lock. This will
>>>>> introduce broken device state, which need extra effort to fix.
>>>>>
>>>>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>>> ---
>>>>>  hw/e1000.c |   24 +++++++++++++++++++++++-
>>>>>  1 files changed, 23 insertions(+), 1 deletions(-)
>>>>>
>>>>> diff --git a/hw/e1000.c b/hw/e1000.c
>>>>> index ae8a6c5..5eddab5 100644
>>>>> --- a/hw/e1000.c
>>>>> +++ b/hw/e1000.c
>>>>> @@ -85,6 +85,7 @@ typedef struct E1000State_st {
>>>>>      NICConf conf;
>>>>>      MemoryRegion mmio;
>>>>>      MemoryRegion io;
>>>>> +    QemuMutex e1000_lock;
>>>>>
>>>>>      uint32_t mac_reg[0x8000];
>>>>>      uint16_t phy_reg[0x20];
>>>>> @@ -223,13 +224,27 @@ static const uint32_t mac_reg_init[] = {
>>>>>  static void
>>>>>  set_interrupt_cause(E1000State *s, int index, uint32_t val)
>>>>>  {
>>>>> +    QemuThread *t;
>>>>> +
>>>>>      if (val && (E1000_DEVID >= E1000_DEV_ID_82547EI_MOBILE)) {
>>>>>          /* Only for 8257x */
>>>>>          val |= E1000_ICR_INT_ASSERTED;
>>>>>      }
>>>>>      s->mac_reg[ICR] = val;
>>>>>      s->mac_reg[ICS] = val;
>>>>> -    qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
>>>>> +
>>>>> +    t = pthread_getspecific(qemu_thread_key);
>>>>> +    if (t->context_type == 1) {
>>>>> +        qemu_mutex_unlock(&s->e1000_lock);
>>>>> +        qemu_mutex_lock_iothread();
>>>>> +    }
>>>>> +    if (DEVICE(s)->state < DEV_STATE_STOPPING) {
>>>>> +        qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
>>>>> +    }
>>>>> +    if (t->context_type == 1) {
>>>>> +        qemu_mutex_unlock_iothread();
>>>>> +        qemu_mutex_lock(&s->e1000_lock);
>>>>> +    }
>>>>
>>>> This is ugly for many reasons. First of all, it is racy as the register
>>>> content may change while dropping the device lock, no? Then you would
>>>> raise or clear an IRQ spuriously.
>>>>
>>>> Second, it clearly shows that we need to address lock-less IRQ delivery.
>>>> Almost nothing is won if we have to take the global lock again to push
>>>> an IRQ event to the guest. I'm repeating myself, but the problem to be
>>>> solved here is almost identical to fast IRQ delivery for assigned
>>>> devices (which we only address pretty ad-hoc for PCI so far).
>>>>
>>> Interesting, could you show me more detail about it, so I can google...
>>
>> No need to look that far, just grep for pci_device_route_intx_to_irq,
>> pci_device_set_intx_routing_notifier and related functions in the code.
>>
> I think, the major point here is to bypass the delivery process among
> the irq chipset during runtime. Right?

Right.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 08/16] QemuThread: make QemuThread as tls to store extra info
  2012-10-23 12:00           ` Paolo Bonzini
  2012-10-23 12:27             ` Peter Maydell
@ 2012-11-18 10:02             ` Brad Smith
  2012-11-18 16:14               ` Paolo Bonzini
  1 sibling, 1 reply; 102+ messages in thread
From: Brad Smith @ 2012-11-18 10:02 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Liu Ping Fan, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Avi Kivity, Anthony Liguori, Stefan Hajnoczi

On 10/23/12 08:00, Paolo Bonzini wrote:
> Il 23/10/2012 13:50, Peter Maydell ha scritto:
>> On 23 October 2012 12:48, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> Il 22/10/2012 19:13, Peter Maydell ha scritto:
>>>>>> Can't we enhance qemu-tls.h to work via pthread_setspecific in case
>>>>>> __thread is not working and use that abstraction (DECLARE/DEFINE_TLS)
>>>>>> directly?
>>>> Agreed. (There were prototype patches floating around for Win32
>>>> at least). The only reason qemu-tls.h has the dummy not-actually-tls
>>>> code for non-linux is that IIRC we wanted to get the linux bits
>>>> in quickly before a release and we never got round to going back
>>>> and doing it properly for the other targets.
>>>
>>> Which will be "never" for OpenBSD.  It just doesn't have enough support.
>>>
>>> Thread-wise OpenBSD is 100% crap, and we should stop supporting it IMHO
>>> until they finish their "new" thread library that's been in the works
>>> for 10 years or so.  FreeBSD is totally ok.
>>
>> It doesn't support any kind of TLS? Wow.
>
> It does support pthread_get/setspecific, but it didn't support something
> else so the qemu-tls.h variant that used pthread_get/setspecific didn't
> work either.
>
> And it doesn't support sigaltstack in threads, so it's the only platform
> where the gthread-based coroutines are used.  Those are buggy because
> the coroutines tend to get random signal masks.
>
> Paolo

I'd love to know what that something else is.

There is a diff pending to fix sigaltstack in threads which should
be going into -current very soon.


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 08/16] QemuThread: make QemuThread as tls to store extra info
  2012-11-18 10:02             ` Brad Smith
@ 2012-11-18 16:14               ` Paolo Bonzini
  2012-11-18 16:15                 ` Paolo Bonzini
  0 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2012-11-18 16:14 UTC (permalink / raw)
  To: Brad Smith
  Cc: Peter Maydell, Liu Ping Fan, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Avi Kivity, Anthony Liguori, Stefan Hajnoczi

> > It does support pthread_get/setspecific, but it didn't support something
> > else so the qemu-tls.h variant that used pthread_get/setspecific didn't
> > work either.
> >
> > And it doesn't support sigaltstack in threads, so it's the only platform
> > where the gthread-based coroutines are used.  Those are buggy because
> > the coroutines tend to get random signal masks.
> 
> I'd love to know what that something else is.

I think it is constructor priorities.  Probably not needed if I
look at the code again with a fresh mind. :)

But yes
Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [patch v4 08/16] QemuThread: make QemuThread as tls to store extra info
  2012-11-18 16:14               ` Paolo Bonzini
@ 2012-11-18 16:15                 ` Paolo Bonzini
  0 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-11-18 16:15 UTC (permalink / raw)
  To: Brad Smith
  Cc: Peter Maydell, Liu Ping Fan, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Avi Kivity, Anthony Liguori, Stefan Hajnoczi


> > > It does support pthread_get/setspecific, but it didn't support
> > > something
> > > else so the qemu-tls.h variant that used pthread_get/setspecific
> > > didn't
> > > work either.
> > >
> > > And it doesn't support sigaltstack in threads, so it's the only
> > > platform
> > > where the gthread-based coroutines are used.  Those are buggy
> > > because
> > > the coroutines tend to get random signal masks.
> > 
> > I'd love to know what that something else is.
> 
> I think it is constructor priorities.  Probably not needed if I
> look at the code again with a fresh mind. :)
> 
> But yes

... now that real pthreads are supported in OpenBSD it's a wholly
different story, and we should simply (in 1.4) stop supporting older
versions.  (Sincere) congratulations to the OpenBSD devs!

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

end of thread, other threads:[~2012-11-18 16:16 UTC | newest]

Thread overview: 102+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-22  9:23 [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Liu Ping Fan
2012-10-22  9:23 ` [Qemu-devel] [patch v4 01/16] atomic: introduce atomic operations Liu Ping Fan
2012-10-22  9:23 ` [Qemu-devel] [patch v4 02/16] qom: apply atomic on object's refcount Liu Ping Fan
2012-10-22  9:23 ` [Qemu-devel] [patch v4 03/16] hotplug: introduce qdev_unplug_complete() to remove device from views Liu Ping Fan
2012-10-22  9:23 ` [Qemu-devel] [patch v4 04/16] pci: remove pci device from mem view when unplug Liu Ping Fan
2012-10-22  9:23 ` [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps Liu Ping Fan
2012-10-22  9:38   ` Avi Kivity
2012-10-23 11:51     ` Paolo Bonzini
2012-10-23 11:55       ` Avi Kivity
2012-10-23 11:57         ` Paolo Bonzini
2012-10-23 12:02           ` Avi Kivity
2012-10-23 12:06             ` Paolo Bonzini
2012-10-23 12:15               ` Avi Kivity
2012-10-23 12:32                 ` Paolo Bonzini
2012-10-23 14:49                   ` Avi Kivity
2012-10-23 15:26                     ` Paolo Bonzini
2012-10-23 16:09                       ` Avi Kivity
2012-10-24  7:29                         ` Paolo Bonzini
2012-10-25 16:28                           ` Avi Kivity
2012-10-26 15:05                             ` Paolo Bonzini
2012-10-23 12:04         ` Jan Kiszka
2012-10-23 12:12           ` Paolo Bonzini
2012-10-23 12:16             ` Jan Kiszka
2012-10-23 12:28               ` Avi Kivity
2012-10-23 12:40                 ` Jan Kiszka
2012-10-23 14:37                   ` Avi Kivity
2012-10-22  9:23 ` [Qemu-devel] [patch v4 06/16] memory: document ref, unref interface Liu Ping Fan
2012-10-22  9:23 ` [Qemu-devel] [patch v4 07/16] memory: make mmio dispatch able to be out of biglock Liu Ping Fan
2012-10-23 12:12   ` Jan Kiszka
2012-10-23 12:36     ` Avi Kivity
2012-10-24  6:31       ` liu ping fan
2012-10-24  6:56         ` liu ping fan
2012-10-25  8:57           ` Avi Kivity
2012-10-22  9:23 ` [Qemu-devel] [patch v4 08/16] QemuThread: make QemuThread as tls to store extra info Liu Ping Fan
2012-10-22  9:30   ` Jan Kiszka
2012-10-22 17:13     ` Peter Maydell
2012-10-23  5:58       ` liu ping fan
2012-10-23 11:48       ` Paolo Bonzini
2012-10-23 11:50         ` Peter Maydell
2012-10-23 11:51           ` Jan Kiszka
2012-10-23 12:00           ` Paolo Bonzini
2012-10-23 12:27             ` Peter Maydell
2012-11-18 10:02             ` Brad Smith
2012-11-18 16:14               ` Paolo Bonzini
2012-11-18 16:15                 ` Paolo Bonzini
2012-10-22  9:23 ` [Qemu-devel] [patch v4 09/16] memory: introduce mmio request pending to anti nested DMA Liu Ping Fan
2012-10-22 10:28   ` Avi Kivity
2012-10-23 12:38   ` Gleb Natapov
2012-10-24  6:31     ` liu ping fan
2012-10-22  9:23 ` [Qemu-devel] [patch v4 10/16] memory: introduce lock ops for MemoryRegionOps Liu Ping Fan
2012-10-22 10:30   ` Avi Kivity
2012-10-23  5:53     ` liu ping fan
2012-10-23  8:53       ` Jan Kiszka
2012-10-22  9:23 ` [Qemu-devel] [patch v4 11/16] vcpu: push mmio dispatcher out of big lock Liu Ping Fan
2012-10-22 10:31   ` Avi Kivity
2012-10-22 10:36     ` Jan Kiszka
2012-10-22  9:23 ` [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000 Liu Ping Fan
2012-10-22 10:37   ` Avi Kivity
2012-10-23  9:04   ` Jan Kiszka
2012-10-24  6:31     ` liu ping fan
2012-10-24  7:17       ` Jan Kiszka
2012-10-25  9:01         ` Avi Kivity
2012-10-25  9:31           ` Jan Kiszka
2012-10-25 16:21             ` Avi Kivity
2012-10-25 16:39               ` Jan Kiszka
2012-10-25 17:02                 ` Avi Kivity
2012-10-25 18:48                   ` Jan Kiszka
2012-10-29  5:24                     ` liu ping fan
2012-10-24  7:29     ` liu ping fan
2012-10-25 13:34       ` Jan Kiszka
2012-10-25 16:23         ` Avi Kivity
2012-10-25 16:41           ` Jan Kiszka
2012-10-25 17:03             ` Avi Kivity
2012-10-29  5:24         ` liu ping fan
2012-10-31  7:03           ` Jan Kiszka
2012-10-22  9:23 ` [Qemu-devel] [patch v4 13/16] e1000: add busy flag to anti broken device state Liu Ping Fan
2012-10-22 10:40   ` Avi Kivity
2012-10-23  5:52     ` liu ping fan
2012-10-23  9:06       ` Avi Kivity
2012-10-23  9:07       ` Jan Kiszka
2012-10-23  9:32         ` liu ping fan
2012-10-23  9:37           ` Avi Kivity
2012-10-24  6:36             ` liu ping fan
2012-10-25  8:55               ` Avi Kivity
2012-10-25  9:00             ` Peter Maydell
2012-10-25  9:04               ` Avi Kivity
2012-10-26  3:05                 ` liu ping fan
2012-10-26  3:08                   ` liu ping fan
2012-10-26 10:25                     ` Jan Kiszka
2012-10-29  5:24                       ` liu ping fan
2012-10-29  7:50                         ` Peter Maydell
2012-10-22  9:23 ` [Qemu-devel] [patch v4 14/16] qdev: introduce stopping state Liu Ping Fan
2012-10-22  9:23 ` [Qemu-devel] [patch v4 15/16] e1000: introduce unmap() to fix unplug issue Liu Ping Fan
2012-10-22  9:23 ` [Qemu-devel] [patch v4 16/16] e1000: implement MemoryRegionOps's ref&lock interface Liu Ping Fan
2012-10-25 14:04 ` [Qemu-devel] [patch v4 00/16] push mmio dispatch out of big lock Peter Maydell
2012-10-25 16:44   ` Jan Kiszka
2012-10-25 17:07   ` Avi Kivity
2012-10-25 17:13     ` Peter Maydell
2012-10-25 18:13       ` Marcelo Tosatti
2012-10-25 19:00         ` Jan Kiszka
2012-10-25 19:06           ` Peter Maydell
2012-10-29 15:24       ` Avi Kivity
