* [Qemu-devel] [PATCH for-2.4 0/9] KVM: Do I/O outside BQL whenever possible
@ 2015-06-18 16:47 Paolo Bonzini
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 1/9] main-loop: use qemu_mutex_lock_iothread consistently Paolo Bonzini
                   ` (8 more replies)
  0 siblings, 9 replies; 28+ messages in thread
From: Paolo Bonzini @ 2015-06-18 16:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: jan.kiszka

This is the rebased and updated version of the patches I posted a
couple months ago (well before soft freeze :)).

This version introduces a qemu_mutex_iothread_locked() primitive
in order to avoid recursive locking of the BQL.  The previous
attempts, which used functions such as address_space_rw_unlocked,
required the introduction of a multitude of *_unlocked functions
(e.g. address_space_ldl_unlocked or dma_buf_write_unlocked).
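
As an illustration (a sketch, not code from the series verbatim), the
primitive enables a single conditional-locking pattern instead of
duplicated accessors; this is essentially what patch 5 does in
prepare_mmio_access():

    /* Take the BQL only if the caller does not already hold it. */
    bool release_lock = false;
    if (!qemu_mutex_iothread_locked()) {
        qemu_mutex_lock_iothread();
        release_lock = true;
    }
    /* ... dispatch the access ... */
    if (release_lock) {
        qemu_mutex_unlock_iothread();
    }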

Note that adding unlocked access to TCG would require reverting
commit 3b64349 (memory: Replace io_mem_read/write with
memory_region_dispatch_read/write, 2015-04-26).

Paolo

Jan Kiszka (4):
  memory: Add global-locking property to memory regions
  memory: let address_space_rw/ld*/st* run outside the BQL
  kvm: First step to push iothread lock out of inner run loop
  kvm: Switch to unlocked PIO

Paolo Bonzini (5):
  main-loop: use qemu_mutex_lock_iothread consistently and simplify it
  main-loop: introduce qemu_mutex_iothread_locked
  exec: pull qemu_flush_coalesced_mmio_buffer() into
    address_space_rw/ld*/st*
  acpi: mark PMTIMER as unlocked
  kvm: Switch to unlocked MMIO

 cpus.c                   | 22 +++++++++++----
 exec.c                   | 69 ++++++++++++++++++++++++++++++++++++++++++++++++
 hw/acpi/core.c           |  1 +
 include/exec/memory.h    | 26 ++++++++++++++++++
 include/qemu/main-loop.h | 10 +++++++
 kvm-all.c                | 10 +++++--
 memory.c                 | 17 +++++++-----
 stubs/iothread-lock.c    |  5 ++++
 target-i386/kvm.c        | 18 +++++++++++++
 target-mips/kvm.c        |  4 +++
 target-ppc/kvm.c         |  4 +++
 11 files changed, 173 insertions(+), 13 deletions(-)

-- 
1.8.3.1

* [Qemu-devel] [PATCH 1/9] main-loop: use qemu_mutex_lock_iothread consistently
  2015-06-18 16:47 [Qemu-devel] [PATCH for-2.4 0/9] KVM: Do I/O outside BQL whenever possible Paolo Bonzini
@ 2015-06-18 16:47 ` Paolo Bonzini
  2015-06-23 13:49   ` Frederic Konrad
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 2/9] main-loop: introduce qemu_mutex_iothread_locked Paolo Bonzini
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2015-06-18 16:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: jan.kiszka

The next patch will require the BQL to be always taken with
qemu_mutex_lock_iothread(), while right now this isn't the case.

Outside TCG mode this is not a problem.  In TCG mode, we need to be
careful and avoid the "prod out of compiled code" step if already
in a VCPU thread.  This is easily done with a check on current_cpu,
i.e. qemu_in_vcpu_thread().
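
For reference, the check is cheap; qemu_in_vcpu_thread() is simply
(as seen in the cpus.c context of the next patch):

    bool qemu_in_vcpu_thread(void)
    {
        return current_cpu && qemu_cpu_is_self(current_cpu);
    }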

Hopefully, multithreaded TCG will get rid of the whole logic to kick
VCPUs whenever an I/O event occurs!

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cpus.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/cpus.c b/cpus.c
index de6469f..2e807f9 100644
--- a/cpus.c
+++ b/cpus.c
@@ -924,7 +924,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
     CPUState *cpu = arg;
     int r;
 
-    qemu_mutex_lock(&qemu_global_mutex);
+    qemu_mutex_lock_iothread();
     qemu_thread_get_self(cpu->thread);
     cpu->thread_id = qemu_get_thread_id();
     cpu->can_do_io = 1;
@@ -1004,10 +1004,10 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 {
     CPUState *cpu = arg;
 
+    qemu_mutex_lock_iothread();
     qemu_tcg_init_cpu_signals();
     qemu_thread_get_self(cpu->thread);
 
-    qemu_mutex_lock(&qemu_global_mutex);
     CPU_FOREACH(cpu) {
         cpu->thread_id = qemu_get_thread_id();
         cpu->created = true;
@@ -1118,7 +1118,11 @@ bool qemu_in_vcpu_thread(void)
 
 void qemu_mutex_lock_iothread(void)
 {
     atomic_inc(&iothread_requesting_mutex);
-    if (!tcg_enabled() || !first_cpu || !first_cpu->thread) {
+    /* In the simple case there is no need to bump the VCPU thread out of
+     * TCG code execution.
+     */
+    if (!tcg_enabled() || qemu_in_vcpu_thread() ||
+        !first_cpu || !first_cpu->thread) {
         qemu_mutex_lock(&qemu_global_mutex);
         atomic_dec(&iothread_requesting_mutex);
     } else {
-- 
1.8.3.1

* [Qemu-devel] [PATCH 2/9] main-loop: introduce qemu_mutex_iothread_locked
  2015-06-18 16:47 [Qemu-devel] [PATCH for-2.4 0/9] KVM: Do I/O outside BQL whenever possible Paolo Bonzini
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 1/9] main-loop: use qemu_mutex_lock_iothread consistently Paolo Bonzini
@ 2015-06-18 16:47 ` Paolo Bonzini
  2015-06-23  8:48   ` Fam Zheng
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 3/9] memory: Add global-locking property to memory regions Paolo Bonzini
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2015-06-18 16:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: jan.kiszka

This function will be used to avoid recursive locking of the iothread lock
whenever address_space_rw/ld*/st* are called with the BQL held, which is
almost always the case.

Tracking whether the iothread is owned is very cheap (just use a TLS
variable) but requires some care because now the lock must always be
taken with qemu_mutex_lock_iothread().  Previously this wasn't the case.
Outside TCG mode this is not a problem.  In TCG mode, we need to be
careful and avoid the "prod out of compiled code" step if already
in a VCPU thread.  This is easily done with a check on current_cpu,
i.e. qemu_in_vcpu_thread().

Hopefully, multithreaded TCG will get rid of the whole logic to kick
VCPUs whenever an I/O event occurs!
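
As a usage sketch (hypothetical device code, not part of this patch),
BQL-only paths can now assert their locking requirement instead of
silently relying on it:

    /* Hypothetical: foo_update_irq() must run under the BQL. */
    static void foo_update_irq(FooState *s)
    {
        assert(qemu_mutex_iothread_locked());
        qemu_set_irq(s->irq, s->level);
    }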

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cpus.c                   |  9 +++++++++
 include/qemu/main-loop.h | 10 ++++++++++
 stubs/iothread-lock.c    |  5 +++++
 3 files changed, 24 insertions(+)

diff --git a/cpus.c b/cpus.c
index 2e807f9..9531d03 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1116,6 +1116,13 @@ bool qemu_in_vcpu_thread(void)
     return current_cpu && qemu_cpu_is_self(current_cpu);
 }
 
+static __thread bool iothread_locked = false;
+
+bool qemu_mutex_iothread_locked(void)
+{
+    return iothread_locked;
+}
+
 void qemu_mutex_lock_iothread(void)
 {
     atomic_inc(&iothread_requesting_mutex);
@@ -1133,10 +1140,12 @@ void qemu_mutex_lock_iothread(void)
         atomic_dec(&iothread_requesting_mutex);
         qemu_cond_broadcast(&qemu_io_proceeded_cond);
     }
+    iothread_locked = true;
 }
 
 void qemu_mutex_unlock_iothread(void)
 {
+    iothread_locked = false;
     qemu_mutex_unlock(&qemu_global_mutex);
 }
 
diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index 62c68c0..6b74eb9 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -270,6 +270,16 @@ int qemu_add_child_watch(pid_t pid);
 #endif
 
 /**
+ * qemu_mutex_iothread_locked: Return lock status of the main loop mutex.
+ *
+ * The main loop mutex is the coarsest lock in QEMU, and as such it
+ * must always be taken outside other locks.  This function helps
+ * callers take different paths depending on whether the current
+ * thread holds the main loop mutex.
+ */
+bool qemu_mutex_iothread_locked(void);
+
+/**
  * qemu_mutex_lock_iothread: Lock the main loop mutex.
  *
  * This function locks the main loop mutex.  The mutex is taken by
diff --git a/stubs/iothread-lock.c b/stubs/iothread-lock.c
index 5d8aca1..dda6f6b 100644
--- a/stubs/iothread-lock.c
+++ b/stubs/iothread-lock.c
@@ -1,6 +1,11 @@
 #include "qemu-common.h"
 #include "qemu/main-loop.h"
 
+bool qemu_mutex_iothread_locked(void)
+{
+    return true;
+}
+
 void qemu_mutex_lock_iothread(void)
 {
 }
-- 
1.8.3.1

* [Qemu-devel] [PATCH 3/9] memory: Add global-locking property to memory regions
  2015-06-18 16:47 [Qemu-devel] [PATCH for-2.4 0/9] KVM: Do I/O outside BQL whenever possible Paolo Bonzini
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 1/9] main-loop: use qemu_mutex_lock_iothread consistently Paolo Bonzini
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 2/9] main-loop: introduce qemu_mutex_iothread_locked Paolo Bonzini
@ 2015-06-18 16:47 ` Paolo Bonzini
  2015-06-23  8:51   ` Fam Zheng
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 4/9] exec: pull qemu_flush_coalesced_mmio_buffer() into address_space_rw/ld*/st* Paolo Bonzini
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2015-06-18 16:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: jan.kiszka

From: Jan Kiszka <jan.kiszka@siemens.com>

This introduces the memory region property "global_locking". It is true
by default. By setting it to false, a device model can request BQL-free
dispatching of region accesses to its r/w handlers. The actual BQL
break-up will be provided in a separate patch.
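
For instance, a device model would opt out at init time along these
lines (hypothetical device; patch 8 does the same for the ACPI PM
timer):

    memory_region_init_io(&s->iomem, OBJECT(s), &foo_ops, s,
                          "foo-mmio", 0x1000);
    memory_region_clear_global_locking(&s->iomem);
    /* foo_ops handlers are now responsible for their own locking. */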

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/exec/memory.h | 26 ++++++++++++++++++++++++++
 memory.c              | 11 +++++++++++
 2 files changed, 37 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index b61c84f..61791f8 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -180,6 +180,7 @@ struct MemoryRegion {
     bool rom_device;
     bool warning_printed; /* For reservations */
     bool flush_coalesced_mmio;
+    bool global_locking;
     MemoryRegion *alias;
     hwaddr alias_offset;
     int32_t priority;
@@ -812,6 +813,31 @@ void memory_region_set_flush_coalesced(MemoryRegion *mr);
 void memory_region_clear_flush_coalesced(MemoryRegion *mr);
 
 /**
+ * memory_region_set_global_locking: Declares that access processing requires
+ *                                   QEMU's global lock.
+ *
+ * When this is invoked, accesses to this memory region will be processed while
+ * holding the global lock of QEMU. This is the default behavior of memory
+ * regions.
+ *
+ * @mr: the memory region to be updated.
+ */
+void memory_region_set_global_locking(MemoryRegion *mr);
+
+/**
+ * memory_region_clear_global_locking: Declares that access processing does
+ *                                     not depend on the QEMU global lock.
+ *
+ * By clearing this property, accesses to the memory region will be processed
+ * outside of QEMU's global lock (unless the lock is already held when
+ * issuing the access request). In this case, the device model implementing
+ * the access handlers is responsible for synchronizing concurrent accesses.
+ *
+ * @mr: the memory region to be updated.
+ */
+void memory_region_clear_global_locking(MemoryRegion *mr);
+
+/**
  * memory_region_add_eventfd: Request an eventfd to be triggered when a word
  *                            is written to a location.
  *
diff --git a/memory.c b/memory.c
index 03c536b..6b77354 100644
--- a/memory.c
+++ b/memory.c
@@ -1004,6 +1004,7 @@ static void memory_region_initfn(Object *obj)
     mr->ops = &unassigned_mem_ops;
     mr->enabled = true;
     mr->romd_mode = true;
+    mr->global_locking = true;
     mr->destructor = memory_region_destructor_none;
     QTAILQ_INIT(&mr->subregions);
     QTAILQ_INIT(&mr->coalesced);
@@ -1627,6 +1628,16 @@ void memory_region_clear_flush_coalesced(MemoryRegion *mr)
     }
 }
 
+void memory_region_set_global_locking(MemoryRegion *mr)
+{
+    mr->global_locking = true;
+}
+
+void memory_region_clear_global_locking(MemoryRegion *mr)
+{
+    mr->global_locking = false;
+}
+
 void memory_region_add_eventfd(MemoryRegion *mr,
                                hwaddr addr,
                                unsigned size,
-- 
1.8.3.1

* [Qemu-devel] [PATCH 4/9] exec: pull qemu_flush_coalesced_mmio_buffer() into address_space_rw/ld*/st*
  2015-06-18 16:47 [Qemu-devel] [PATCH for-2.4 0/9] KVM: Do I/O outside BQL whenever possible Paolo Bonzini
                   ` (2 preceding siblings ...)
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 3/9] memory: Add global-locking property to memory regions Paolo Bonzini
@ 2015-06-18 16:47 ` Paolo Bonzini
  2015-06-23  9:05   ` Fam Zheng
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 5/9] memory: let address_space_rw/ld*/st* run outside the BQL Paolo Bonzini
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2015-06-18 16:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: jan.kiszka

Since memory_region_read/write_accessor will now also run without the
BQL held, coalesced MMIO flushing needs to move earlier in the dispatch
process.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 exec.c   | 21 +++++++++++++++++++++
 memory.c |  6 ------
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/exec.c b/exec.c
index e19ab22..094f87e 100644
--- a/exec.c
+++ b/exec.c
@@ -2318,6 +2318,13 @@ static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
     return l;
 }
 
+static void prepare_mmio_access(MemoryRegion *mr)
+{
+    if (mr->flush_coalesced_mmio) {
+        qemu_flush_coalesced_mmio_buffer();
+    }
+}
+
 MemTxResult address_space_rw(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
                              uint8_t *buf, int len, bool is_write)
 {
@@ -2335,6 +2342,7 @@ MemTxResult address_space_rw(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
 
         if (is_write) {
             if (!memory_access_is_direct(mr, is_write)) {
+                prepare_mmio_access(mr);
                 l = memory_access_size(mr, l, addr1);
                 /* XXX: could force current_cpu to NULL to avoid
                    potential bugs */
@@ -2376,6 +2384,7 @@ MemTxResult address_space_rw(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
         } else {
             if (!memory_access_is_direct(mr, is_write)) {
                 /* I/O case */
+                prepare_mmio_access(mr);
                 l = memory_access_size(mr, l, addr1);
                 switch (l) {
                 case 8:
@@ -2741,6 +2750,8 @@ static inline uint32_t address_space_ldl_internal(AddressSpace *as, hwaddr addr,
     rcu_read_lock();
     mr = address_space_translate(as, addr, &addr1, &l, false);
     if (l < 4 || !memory_access_is_direct(mr, false)) {
+        prepare_mmio_access(mr);
+
         /* I/O case */
         r = memory_region_dispatch_read(mr, addr1, &val, 4, attrs);
 #if defined(TARGET_WORDS_BIGENDIAN)
@@ -2830,6 +2841,8 @@ static inline uint64_t address_space_ldq_internal(AddressSpace *as, hwaddr addr,
     mr = address_space_translate(as, addr, &addr1, &l,
                                  false);
     if (l < 8 || !memory_access_is_direct(mr, false)) {
+        prepare_mmio_access(mr);
+
         /* I/O case */
         r = memory_region_dispatch_read(mr, addr1, &val, 8, attrs);
 #if defined(TARGET_WORDS_BIGENDIAN)
@@ -2939,6 +2952,8 @@ static inline uint32_t address_space_lduw_internal(AddressSpace *as,
     mr = address_space_translate(as, addr, &addr1, &l,
                                  false);
     if (l < 2 || !memory_access_is_direct(mr, false)) {
+        prepare_mmio_access(mr);
+
         /* I/O case */
         r = memory_region_dispatch_read(mr, addr1, &val, 2, attrs);
 #if defined(TARGET_WORDS_BIGENDIAN)
@@ -3027,6 +3042,8 @@ void address_space_stl_notdirty(AddressSpace *as, hwaddr addr, uint32_t val,
     mr = address_space_translate(as, addr, &addr1, &l,
                                  true);
     if (l < 4 || !memory_access_is_direct(mr, true)) {
+        prepare_mmio_access(mr);
+
         r = memory_region_dispatch_write(mr, addr1, val, 4, attrs);
     } else {
         addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
@@ -3071,6 +3088,8 @@ static inline void address_space_stl_internal(AddressSpace *as,
     mr = address_space_translate(as, addr, &addr1, &l,
                                  true);
     if (l < 4 || !memory_access_is_direct(mr, true)) {
+        prepare_mmio_access(mr);
+
 #if defined(TARGET_WORDS_BIGENDIAN)
         if (endian == DEVICE_LITTLE_ENDIAN) {
             val = bswap32(val);
@@ -3175,6 +3194,8 @@ static inline void address_space_stw_internal(AddressSpace *as,
     rcu_read_lock();
     mr = address_space_translate(as, addr, &addr1, &l, true);
     if (l < 2 || !memory_access_is_direct(mr, true)) {
+        prepare_mmio_access(mr);
+
 #if defined(TARGET_WORDS_BIGENDIAN)
         if (endian == DEVICE_LITTLE_ENDIAN) {
             val = bswap16(val);
diff --git a/memory.c b/memory.c
index 6b77354..be385d4 100644
--- a/memory.c
+++ b/memory.c
@@ -414,9 +414,6 @@ static MemTxResult memory_region_read_with_attrs_accessor(MemoryRegion *mr,
     uint64_t tmp = 0;
     MemTxResult r;
 
-    if (mr->flush_coalesced_mmio) {
-        qemu_flush_coalesced_mmio_buffer();
-    }
     r = mr->ops->read_with_attrs(mr->opaque, addr, &tmp, size, attrs);
     trace_memory_region_ops_read(mr, addr, tmp, size);
     *value |= (tmp & mask) << shift;
@@ -449,9 +446,6 @@ static MemTxResult memory_region_write_accessor(MemoryRegion *mr,
 {
     uint64_t tmp;
 
-    if (mr->flush_coalesced_mmio) {
-        qemu_flush_coalesced_mmio_buffer();
-    }
     tmp = (*value >> shift) & mask;
     trace_memory_region_ops_write(mr, addr, tmp, size);
     mr->ops->write(mr->opaque, addr, tmp, size);
-- 
1.8.3.1

* [Qemu-devel] [PATCH 5/9] memory: let address_space_rw/ld*/st* run outside the BQL
  2015-06-18 16:47 [Qemu-devel] [PATCH for-2.4 0/9] KVM: Do I/O outside BQL whenever possible Paolo Bonzini
                   ` (3 preceding siblings ...)
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 4/9] exec: pull qemu_flush_coalesced_mmio_buffer() into address_space_rw/ld*/st* Paolo Bonzini
@ 2015-06-18 16:47 ` Paolo Bonzini
  2015-06-24 16:56   ` Alex Bennée
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 6/9] kvm: First step to push iothread lock out of inner run loop Paolo Bonzini
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2015-06-18 16:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: jan.kiszka

From: Jan Kiszka <jan.kiszka@siemens.com>

The MMIO case is further broken up into two cases: if the caller does
not hold the BQL on invocation, the unlocked path takes or avoids the
BQL depending on the locking strategy of the target memory region and
its coalesced MMIO handling.  In this case, the caller should not hold
_any_ lock (a friendly suggestion which is disregarded by
virtio-scsi-dataplane).
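
After this patch a BQL-free caller can therefore dispatch directly;
a sketch (KVM switches over to this in the later patches):

    /* Caller does not hold the BQL; address_space_rw takes and drops
     * it around accesses that target global-locking regions. */
    assert(!qemu_mutex_iothread_locked());
    address_space_rw(&address_space_memory, addr, MEMTXATTRS_UNSPECIFIED,
                     buf, len, is_write);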

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 exec.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 57 insertions(+), 9 deletions(-)

diff --git a/exec.c b/exec.c
index 094f87e..78c99f6 100644
--- a/exec.c
+++ b/exec.c
@@ -48,6 +48,7 @@
 #endif
 #include "exec/cpu-all.h"
 #include "qemu/rcu_queue.h"
+#include "qemu/main-loop.h"
 #include "exec/cputlb.h"
 #include "translate-all.h"
 
@@ -2318,11 +2319,27 @@ static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
     return l;
 }
 
-static void prepare_mmio_access(MemoryRegion *mr)
+static bool prepare_mmio_access(MemoryRegion *mr)
 {
+    bool unlocked = !qemu_mutex_iothread_locked();
+    bool release_lock = false;
+
+    if (unlocked && mr->global_locking) {
+        qemu_mutex_lock_iothread();
+        unlocked = false;
+        release_lock = true;
+    }
     if (mr->flush_coalesced_mmio) {
+        if (unlocked) {
+            qemu_mutex_lock_iothread();
+        }
         qemu_flush_coalesced_mmio_buffer();
+        if (unlocked) {
+            qemu_mutex_unlock_iothread();
+        }
     }
+
+    return release_lock;
 }
 
 MemTxResult address_space_rw(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
@@ -2334,6 +2351,7 @@ MemTxResult address_space_rw(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
     hwaddr addr1;
     MemoryRegion *mr;
     MemTxResult result = MEMTX_OK;
+    bool release_lock = false;
 
     rcu_read_lock();
     while (len > 0) {
@@ -2342,7 +2360,7 @@ MemTxResult address_space_rw(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
 
         if (is_write) {
             if (!memory_access_is_direct(mr, is_write)) {
-                prepare_mmio_access(mr);
+                release_lock |= prepare_mmio_access(mr);
                 l = memory_access_size(mr, l, addr1);
                 /* XXX: could force current_cpu to NULL to avoid
                    potential bugs */
@@ -2384,7 +2402,7 @@ MemTxResult address_space_rw(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
         } else {
             if (!memory_access_is_direct(mr, is_write)) {
                 /* I/O case */
-                prepare_mmio_access(mr);
+                release_lock |= prepare_mmio_access(mr);
                 l = memory_access_size(mr, l, addr1);
                 switch (l) {
                 case 8:
@@ -2420,6 +2438,12 @@ MemTxResult address_space_rw(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
                 memcpy(buf, ptr, l);
             }
         }
+
+        if (release_lock) {
+            qemu_mutex_unlock_iothread();
+            release_lock = false;
+        }
+
         len -= l;
         buf += l;
         addr += l;
@@ -2746,11 +2770,12 @@ static inline uint32_t address_space_ldl_internal(AddressSpace *as, hwaddr addr,
     hwaddr l = 4;
     hwaddr addr1;
     MemTxResult r;
+    bool release_lock = false;
 
     rcu_read_lock();
     mr = address_space_translate(as, addr, &addr1, &l, false);
     if (l < 4 || !memory_access_is_direct(mr, false)) {
-        prepare_mmio_access(mr);
+        release_lock |= prepare_mmio_access(mr);
 
         /* I/O case */
         r = memory_region_dispatch_read(mr, addr1, &val, 4, attrs);
@@ -2784,6 +2809,9 @@ static inline uint32_t address_space_ldl_internal(AddressSpace *as, hwaddr addr,
     if (result) {
         *result = r;
     }
+    if (release_lock) {
+        qemu_mutex_unlock_iothread();
+    }
     rcu_read_unlock();
     return val;
 }
@@ -2836,12 +2864,13 @@ static inline uint64_t address_space_ldq_internal(AddressSpace *as, hwaddr addr,
     hwaddr l = 8;
     hwaddr addr1;
     MemTxResult r;
+    bool release_lock = false;
 
     rcu_read_lock();
     mr = address_space_translate(as, addr, &addr1, &l,
                                  false);
     if (l < 8 || !memory_access_is_direct(mr, false)) {
-        prepare_mmio_access(mr);
+        release_lock |= prepare_mmio_access(mr);
 
         /* I/O case */
         r = memory_region_dispatch_read(mr, addr1, &val, 8, attrs);
@@ -2875,6 +2904,9 @@ static inline uint64_t address_space_ldq_internal(AddressSpace *as, hwaddr addr,
     if (result) {
         *result = r;
     }
+    if (release_lock) {
+        qemu_mutex_unlock_iothread();
+    }
     rcu_read_unlock();
     return val;
 }
@@ -2947,12 +2979,13 @@ static inline uint32_t address_space_lduw_internal(AddressSpace *as,
     hwaddr l = 2;
     hwaddr addr1;
     MemTxResult r;
+    bool release_lock = false;
 
     rcu_read_lock();
     mr = address_space_translate(as, addr, &addr1, &l,
                                  false);
     if (l < 2 || !memory_access_is_direct(mr, false)) {
-        prepare_mmio_access(mr);
+        release_lock |= prepare_mmio_access(mr);
 
         /* I/O case */
         r = memory_region_dispatch_read(mr, addr1, &val, 2, attrs);
@@ -2986,6 +3019,9 @@ static inline uint32_t address_space_lduw_internal(AddressSpace *as,
     if (result) {
         *result = r;
     }
+    if (release_lock) {
+        qemu_mutex_unlock_iothread();
+    }
     rcu_read_unlock();
     return val;
 }
@@ -3037,12 +3073,13 @@ void address_space_stl_notdirty(AddressSpace *as, hwaddr addr, uint32_t val,
     hwaddr l = 4;
     hwaddr addr1;
     MemTxResult r;
+    bool release_lock = false;
 
     rcu_read_lock();
     mr = address_space_translate(as, addr, &addr1, &l,
                                  true);
     if (l < 4 || !memory_access_is_direct(mr, true)) {
-        prepare_mmio_access(mr);
+        release_lock |= prepare_mmio_access(mr);
 
         r = memory_region_dispatch_write(mr, addr1, val, 4, attrs);
     } else {
@@ -3063,6 +3100,9 @@ void address_space_stl_notdirty(AddressSpace *as, hwaddr addr, uint32_t val,
     if (result) {
         *result = r;
     }
+    if (release_lock) {
+        qemu_mutex_unlock_iothread();
+    }
     rcu_read_unlock();
 }
 
@@ -3083,12 +3123,13 @@ static inline void address_space_stl_internal(AddressSpace *as,
     hwaddr l = 4;
     hwaddr addr1;
     MemTxResult r;
+    bool release_lock = false;
 
     rcu_read_lock();
     mr = address_space_translate(as, addr, &addr1, &l,
                                  true);
     if (l < 4 || !memory_access_is_direct(mr, true)) {
-        prepare_mmio_access(mr);
+        release_lock |= prepare_mmio_access(mr);
 
 #if defined(TARGET_WORDS_BIGENDIAN)
         if (endian == DEVICE_LITTLE_ENDIAN) {
@@ -3121,6 +3162,9 @@ static inline void address_space_stl_internal(AddressSpace *as,
     if (result) {
         *result = r;
     }
+    if (release_lock) {
+        qemu_mutex_unlock_iothread();
+    }
     rcu_read_unlock();
 }
 
@@ -3190,11 +3234,12 @@ static inline void address_space_stw_internal(AddressSpace *as,
     hwaddr l = 2;
     hwaddr addr1;
     MemTxResult r;
+    bool release_lock = false;
 
     rcu_read_lock();
     mr = address_space_translate(as, addr, &addr1, &l, true);
     if (l < 2 || !memory_access_is_direct(mr, true)) {
-        prepare_mmio_access(mr);
+        release_lock |= prepare_mmio_access(mr);
 
 #if defined(TARGET_WORDS_BIGENDIAN)
         if (endian == DEVICE_LITTLE_ENDIAN) {
@@ -3227,6 +3272,9 @@ static inline void address_space_stw_internal(AddressSpace *as,
     if (result) {
         *result = r;
     }
+    if (release_lock) {
+        qemu_mutex_unlock_iothread();
+    }
     rcu_read_unlock();
 }
 
-- 
1.8.3.1

* [Qemu-devel] [PATCH 6/9] kvm: First step to push iothread lock out of inner run loop
  2015-06-18 16:47 [Qemu-devel] [PATCH for-2.4 0/9] KVM: Do I/O outside BQL whenever possible Paolo Bonzini
                   ` (4 preceding siblings ...)
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 5/9] memory: let address_space_rw/ld*/st* run outside the BQL Paolo Bonzini
@ 2015-06-18 16:47 ` Paolo Bonzini
  2015-06-18 18:19   ` Christian Borntraeger
  2015-06-23  9:26   ` Fam Zheng
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 7/9] kvm: Switch to unlocked PIO Paolo Bonzini
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 28+ messages in thread
From: Paolo Bonzini @ 2015-06-18 16:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: jan.kiszka

From: Jan Kiszka <jan.kiszka@siemens.com>

This opens the path to get rid of the iothread lock on vmexits in KVM
mode. On x86, the in-kernel irqchip has to be used because we otherwise
need to synchronize APIC and other per-cpu state accesses that could be
changed concurrently.

s390x and ARM should be fine without specific locking as their
pre/post-run callbacks are empty. MIPS and POWER require locking for
the pre-run callback.
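
"Empty" here means the hooks are stubs; e.g. the ARM pre-run hook at
this point is, as far as I can tell, just:

    void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
    {
    }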

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 kvm-all.c         | 14 ++++++++++++--
 target-i386/kvm.c | 18 ++++++++++++++++++
 target-mips/kvm.c |  4 ++++
 target-ppc/kvm.c  |  4 ++++
 4 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index b2b1bc3..2bd8e9b 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1795,6 +1795,8 @@ int kvm_cpu_exec(CPUState *cpu)
         return EXCP_HLT;
     }
 
+    qemu_mutex_unlock_iothread();
+
     do {
         MemTxAttrs attrs;
 
@@ -1813,11 +1815,9 @@ int kvm_cpu_exec(CPUState *cpu)
              */
             qemu_cpu_kick_self();
         }
-        qemu_mutex_unlock_iothread();
 
         run_ret = kvm_vcpu_ioctl(cpu, KVM_RUN, 0);
 
-        qemu_mutex_lock_iothread();
         attrs = kvm_arch_post_run(cpu, run);
 
         if (run_ret < 0) {
@@ -1836,20 +1836,24 @@ int kvm_cpu_exec(CPUState *cpu)
         switch (run->exit_reason) {
         case KVM_EXIT_IO:
             DPRINTF("handle_io\n");
+            qemu_mutex_lock_iothread();
             kvm_handle_io(run->io.port, attrs,
                           (uint8_t *)run + run->io.data_offset,
                           run->io.direction,
                           run->io.size,
                           run->io.count);
+            qemu_mutex_unlock_iothread();
             ret = 0;
             break;
         case KVM_EXIT_MMIO:
             DPRINTF("handle_mmio\n");
+            qemu_mutex_lock_iothread();
             address_space_rw(&address_space_memory,
                              run->mmio.phys_addr, attrs,
                              run->mmio.data,
                              run->mmio.len,
                              run->mmio.is_write);
+            qemu_mutex_unlock_iothread();
             ret = 0;
             break;
         case KVM_EXIT_IRQ_WINDOW_OPEN:
@@ -1858,7 +1862,9 @@ int kvm_cpu_exec(CPUState *cpu)
             break;
         case KVM_EXIT_SHUTDOWN:
             DPRINTF("shutdown\n");
+            qemu_mutex_lock_iothread();
             qemu_system_reset_request();
+            qemu_mutex_unlock_iothread();
             ret = EXCP_INTERRUPT;
             break;
         case KVM_EXIT_UNKNOWN:
@@ -1887,11 +1893,15 @@ int kvm_cpu_exec(CPUState *cpu)
             break;
         default:
             DPRINTF("kvm_arch_handle_exit\n");
+            qemu_mutex_lock_iothread();
             ret = kvm_arch_handle_exit(cpu, run);
+            qemu_mutex_unlock_iothread();
             break;
         }
     } while (ret == 0);
 
+    qemu_mutex_lock_iothread();
+
     if (ret < 0) {
         cpu_dump_state(cpu, stderr, fprintf, CPU_DUMP_CODE);
         vm_stop(RUN_STATE_INTERNAL_ERROR);
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index ca2da84..8c2a891 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -2192,7 +2192,10 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
 
     /* Inject NMI */
     if (cpu->interrupt_request & CPU_INTERRUPT_NMI) {
+        qemu_mutex_lock_iothread();
         cpu->interrupt_request &= ~CPU_INTERRUPT_NMI;
+        qemu_mutex_unlock_iothread();
+
         DPRINTF("injected NMI\n");
         ret = kvm_vcpu_ioctl(cpu, KVM_NMI);
         if (ret < 0) {
@@ -2201,6 +2204,10 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
         }
     }
 
+    if (!kvm_irqchip_in_kernel()) {
+        qemu_mutex_lock_iothread();
+    }
+
     /* Force the VCPU out of its inner loop to process any INIT requests
      * or (for userspace APIC, but it is cheap to combine the checks here)
      * pending TPR access reports.
@@ -2244,6 +2251,8 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
 
         DPRINTF("setting tpr\n");
         run->cr8 = cpu_get_apic_tpr(x86_cpu->apic_state);
+
+        qemu_mutex_unlock_iothread();
     }
 }
 
@@ -2257,8 +2266,17 @@ MemTxAttrs kvm_arch_post_run(CPUState *cpu, struct kvm_run *run)
     } else {
         env->eflags &= ~IF_MASK;
     }
+
+    /* We need to protect the apic state against concurrent accesses from
+     * different threads in case the userspace irqchip is used. */
+    if (!kvm_irqchip_in_kernel()) {
+        qemu_mutex_lock_iothread();
+    }
     cpu_set_apic_tpr(x86_cpu->apic_state, run->cr8);
     cpu_set_apic_base(x86_cpu->apic_state, run->apic_base);
+    if (!kvm_irqchip_in_kernel()) {
+        qemu_mutex_unlock_iothread();
+    }
     return MEMTXATTRS_UNSPECIFIED;
 }
 
diff --git a/target-mips/kvm.c b/target-mips/kvm.c
index 948619f..7d2293d 100644
--- a/target-mips/kvm.c
+++ b/target-mips/kvm.c
@@ -99,6 +99,8 @@ void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
     int r;
     struct kvm_mips_interrupt intr;
 
+    qemu_mutex_lock_iothread();
+
     if ((cs->interrupt_request & CPU_INTERRUPT_HARD) &&
             cpu_mips_io_interrupts_pending(cpu)) {
         intr.cpu = -1;
@@ -109,6 +111,8 @@ void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
                          __func__, cs->cpu_index, intr.irq);
         }
     }
+
+    qemu_mutex_unlock_iothread();
 }
 
 MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index afb4696..1fa1529 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -1242,6 +1242,8 @@ void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
     int r;
     unsigned irq;
 
+    qemu_mutex_lock_iothread();
+
     /* PowerPC QEMU tracks the various core input pins (interrupt, critical
      * interrupt, reset, etc) in PPC-specific env->irq_input_state. */
     if (!cap_interrupt_level &&
@@ -1269,6 +1271,8 @@ void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
     /* We don't know if there are more interrupts pending after this. However,
      * the guest will return to userspace in the course of handling this one
      * anyways, so we will get a chance to deliver the rest. */
+
+    qemu_mutex_unlock_iothread();
 }
 
 MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
-- 
1.8.3.1

* [Qemu-devel] [PATCH 7/9] kvm: Switch to unlocked PIO
  2015-06-18 16:47 [Qemu-devel] [PATCH for-2.4 0/9] KVM: Do I/O outside BQL whenever possible Paolo Bonzini
                   ` (5 preceding siblings ...)
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 6/9] kvm: First step to push iothread lock out of inner run loop Paolo Bonzini
@ 2015-06-18 16:47 ` Paolo Bonzini
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 8/9] acpi: mark PMTIMER as unlocked Paolo Bonzini
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 9/9] kvm: Switch to unlocked MMIO Paolo Bonzini
  8 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2015-06-18 16:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: jan.kiszka

From: Jan Kiszka <jan.kiszka@siemens.com>

Do not take the BQL before dispatching PIO requests of KVM VCPUs.
Instead, address_space_rw will do it if necessary. This enables
completely BQL-free PIO handling in KVM mode for upcoming devices with
fine-grained locking.
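
For context, kvm_handle_io is a thin loop over the memory API on the
PIO address space, roughly (simplified from the kvm-all.c of this era):

    static void kvm_handle_io(uint16_t port, MemTxAttrs attrs, void *data,
                              int direction, int size, uint32_t count)
    {
        int i;
        uint8_t *ptr = data;

        for (i = 0; i < count; i++) {
            address_space_rw(&address_space_io, port, attrs, ptr, size,
                             direction == KVM_EXIT_IO_OUT);
            ptr += size;
        }
    }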

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 kvm-all.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 2bd8e9b..d3831c4 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1836,13 +1836,12 @@ int kvm_cpu_exec(CPUState *cpu)
         switch (run->exit_reason) {
         case KVM_EXIT_IO:
             DPRINTF("handle_io\n");
-            qemu_mutex_lock_iothread();
+            /* Called outside BQL */
             kvm_handle_io(run->io.port, attrs,
                           (uint8_t *)run + run->io.data_offset,
                           run->io.direction,
                           run->io.size,
                           run->io.count);
-            qemu_mutex_unlock_iothread();
             ret = 0;
             break;
         case KVM_EXIT_MMIO:
-- 
1.8.3.1

* [Qemu-devel] [PATCH 8/9] acpi: mark PMTIMER as unlocked
  2015-06-18 16:47 [Qemu-devel] [PATCH for-2.4 0/9] KVM: Do I/O outside BQL whenever possible Paolo Bonzini
                   ` (6 preceding siblings ...)
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 7/9] kvm: Switch to unlocked PIO Paolo Bonzini
@ 2015-06-18 16:47 ` Paolo Bonzini
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 9/9] kvm: Switch to unlocked MMIO Paolo Bonzini
  8 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2015-06-18 16:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: jan.kiszka

Accessing QEMU_CLOCK_VIRTUAL is thread-safe.
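
The PM timer's read handler boils down to scaling the virtual clock
and touches no other mutable state; a sketch of the computation, for
illustration only (cf. acpi_pm_tmr_get in hw/acpi/core.c):

    static uint32_t pm_tmr_value(void)
    {
        /* 24-bit timer derived solely from QEMU_CLOCK_VIRTUAL */
        return muldiv64(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
                        PM_TIMER_FREQUENCY, get_ticks_per_sec()) & 0xffffff;
    }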

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/acpi/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/acpi/core.c b/hw/acpi/core.c
index 0f201d8..fe6215a 100644
--- a/hw/acpi/core.c
+++ b/hw/acpi/core.c
@@ -528,6 +528,7 @@ void acpi_pm_tmr_init(ACPIREGS *ar, acpi_update_sci_fn update_sci,
     ar->tmr.timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, acpi_pm_tmr_timer, ar);
     memory_region_init_io(&ar->tmr.io, memory_region_owner(parent),
                           &acpi_pm_tmr_ops, ar, "acpi-tmr", 4);
+    memory_region_clear_global_locking(&ar->tmr.io);
     memory_region_add_subregion(parent, 8, &ar->tmr.io);
 }
 
-- 
1.8.3.1

* [Qemu-devel] [PATCH 9/9] kvm: Switch to unlocked MMIO
  2015-06-18 16:47 [Qemu-devel] [PATCH for-2.4 0/9] KVM: Do I/O outside BQL whenever possible Paolo Bonzini
                   ` (7 preceding siblings ...)
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 8/9] acpi: mark PMTIMER as unlocked Paolo Bonzini
@ 2015-06-18 16:47 ` Paolo Bonzini
  8 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2015-06-18 16:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: jan.kiszka

Do not take the BQL before dispatching MMIO requests of KVM VCPUs.
Instead, address_space_rw will do it if necessary. This enables completely
BQL-free MMIO handling in KVM mode for upcoming devices with fine-grained
locking.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 kvm-all.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index d3831c4..87b00b8 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1845,13 +1845,12 @@ int kvm_cpu_exec(CPUState *cpu)
             break;
         case KVM_EXIT_MMIO:
             DPRINTF("handle_mmio\n");
-            qemu_mutex_lock_iothread();
+            /* Called outside BQL */
             address_space_rw(&address_space_memory,
                              run->mmio.phys_addr, attrs,
                              run->mmio.data,
                              run->mmio.len,
                              run->mmio.is_write);
-            qemu_mutex_unlock_iothread();
             ret = 0;
             break;
         case KVM_EXIT_IRQ_WINDOW_OPEN:
-- 
1.8.3.1

* Re: [Qemu-devel] [PATCH 6/9] kvm: First step to push iothread lock out of inner run loop
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 6/9] kvm: First step to push iothread lock out of inner run loop Paolo Bonzini
@ 2015-06-18 18:19   ` Christian Borntraeger
  2015-06-23  9:26   ` Fam Zheng
  1 sibling, 0 replies; 28+ messages in thread
From: Christian Borntraeger @ 2015-06-18 18:19 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: jan.kiszka

On 18.06.2015 at 18:47, Paolo Bonzini wrote:

> @@ -1887,11 +1893,15 @@ int kvm_cpu_exec(CPUState *cpu)
>              break;
>          default:
>              DPRINTF("kvm_arch_handle_exit\n");
> +            qemu_mutex_lock_iothread();
>              ret = kvm_arch_handle_exit(cpu, run);
> +            qemu_mutex_unlock_iothread();
>              break;
>          }
>      } while (ret == 0);
> 


The next logical step would be to do a push-down: get rid of
these two new lines and take the lock in every kvm_arch_handle_exit
function instead. This would allow arch maintainers to do their part
of the lock breaking. Can be an add-on patch, though.
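
Sketched, each architecture would take the lock itself, e.g.:

    /* hypothetical push-down into target-foo/kvm.c */
    int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
    {
        int ret;

        qemu_mutex_lock_iothread();
        ret = foo_handle_exit_locked(cs, run); /* arch-specific work */
        qemu_mutex_unlock_iothread();
        return ret;
    }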

Christian

* Re: [Qemu-devel] [PATCH 2/9] main-loop: introduce qemu_mutex_iothread_locked
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 2/9] main-loop: introduce qemu_mutex_iothread_locked Paolo Bonzini
@ 2015-06-23  8:48   ` Fam Zheng
  0 siblings, 0 replies; 28+ messages in thread
From: Fam Zheng @ 2015-06-23  8:48 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jan.kiszka, qemu-devel

On Thu, 06/18 18:47, Paolo Bonzini wrote:
> This function will be used to avoid recursive locking of the iothread lock
> whenever address_space_rw/ld*/st* are called with the BQL held, which is
> almost always the case.
> 
> Tracking whether the iothread is owned is very cheap (just use a TLS
> variable) but requires some care because now the lock must always be
> taken with qemu_mutex_lock_iothread().  Previously this wasn't the case.
> Outside TCG mode this is not a problem.  In TCG mode, we need to be
> careful and avoid the "prod out of compiled code" step if already
> in a VCPU thread.  This is easily done with a check on current_cpu,
> i.e. qemu_in_vcpu_thread().
> 
> Hopefully, multithreaded TCG will get rid of the whole logic to kick
> VCPUs whenever an I/O event occurs!
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  cpus.c                   |  9 +++++++++
>  include/qemu/main-loop.h | 10 ++++++++++
>  stubs/iothread-lock.c    |  5 +++++
>  3 files changed, 24 insertions(+)
> 
> diff --git a/cpus.c b/cpus.c
> index 2e807f9..9531d03 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1116,6 +1116,13 @@ bool qemu_in_vcpu_thread(void)
>      return current_cpu && qemu_cpu_is_self(current_cpu);
>  }
>  
> +static __thread bool iothread_locked = false;
> +
> +bool qemu_mutex_iothread_locked(void)
> +{
> +    return iothread_locked;
> +}
> +
>  void qemu_mutex_lock_iothread(void)
>  {
>      atomic_inc(&iothread_requesting_mutex);
> @@ -1133,10 +1140,12 @@ void qemu_mutex_lock_iothread(void)
>          atomic_dec(&iothread_requesting_mutex);
>          qemu_cond_broadcast(&qemu_io_proceeded_cond);
>      }
> +    iothread_locked = true;
>  }
>  
>  void qemu_mutex_unlock_iothread(void)
>  {
> +    iothread_locked = false;
>      qemu_mutex_unlock(&qemu_global_mutex);
>  }
>  
> diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
> index 62c68c0..6b74eb9 100644
> --- a/include/qemu/main-loop.h
> +++ b/include/qemu/main-loop.h
> @@ -270,6 +270,16 @@ int qemu_add_child_watch(pid_t pid);
>  #endif
>  
>  /**
> + * qemu_mutex_iothread_locked: Return lock status of the main loop mutex.
> + *
> + * The main loop mutex is the coarsest lock in QEMU, and as such it
> + * must always be taken outside other locks.  This function helps
> + * callers take different paths depending on whether the current
> + * thread holds the main loop mutex.
> + */
> +bool qemu_mutex_iothread_locked(void);
> +
> +/**
>   * qemu_mutex_lock_iothread: Lock the main loop mutex.
>   *
>   * This function locks the main loop mutex.  The mutex is taken by
> diff --git a/stubs/iothread-lock.c b/stubs/iothread-lock.c
> index 5d8aca1..dda6f6b 100644
> --- a/stubs/iothread-lock.c
> +++ b/stubs/iothread-lock.c
> @@ -1,6 +1,11 @@
>  #include "qemu-common.h"
>  #include "qemu/main-loop.h"
>  
> +bool qemu_mutex_iothread_locked(void)
> +{
> +    return true;
> +}
> +
>  void qemu_mutex_lock_iothread(void)
>  {
>  }
> -- 
> 1.8.3.1
> 
> 
> 

Reviewed-by: Fam Zheng <famz@redhat.com>

* Re: [Qemu-devel] [PATCH 3/9] memory: Add global-locking property to memory regions
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 3/9] memory: Add global-locking property to memory regions Paolo Bonzini
@ 2015-06-23  8:51   ` Fam Zheng
  0 siblings, 0 replies; 28+ messages in thread
From: Fam Zheng @ 2015-06-23  8:51 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jan.kiszka, qemu-devel

On Thu, 06/18 18:47, Paolo Bonzini wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> This introduces the memory region property "global_locking". It is true
> by default. By setting it to false, a device model can request BQL-free
> dispatching of region accesses to its r/w handlers. The actual BQL
> break-up will be provided in a separate patch.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  include/exec/memory.h | 26 ++++++++++++++++++++++++++
>  memory.c              | 11 +++++++++++
>  2 files changed, 37 insertions(+)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index b61c84f..61791f8 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -180,6 +180,7 @@ struct MemoryRegion {
>      bool rom_device;
>      bool warning_printed; /* For reservations */
>      bool flush_coalesced_mmio;
> +    bool global_locking;
>      MemoryRegion *alias;
>      hwaddr alias_offset;
>      int32_t priority;
> @@ -812,6 +813,31 @@ void memory_region_set_flush_coalesced(MemoryRegion *mr);
>  void memory_region_clear_flush_coalesced(MemoryRegion *mr);
>  
>  /**
> + * memory_region_set_global_locking: Declares that access processing requires
> + *                                   QEMU's global lock.
> + *
> + * When this is invoked, accesses to this memory region will be processed while
> + * holding the global lock of QEMU. This is the default behavior of memory
> + * regions.
> + *
> + * @mr: the memory region to be updated.
> + */
> +void memory_region_set_global_locking(MemoryRegion *mr);
> +
> +/**
> + * memory_region_clear_global_locking: Declares that access processing does
> + *                                     not depend on the QEMU global lock.
> + *
> + * By clearing this property, accesses to the memory region will be processed
> + * outside of QEMU's global lock (unless the lock is already held when
> + * issuing the access request). In this case, the device model implementing
> + * the access handlers is responsible for synchronizing concurrent accesses.
> + *
> + * @mr: the memory region to be updated.
> + */
> +void memory_region_clear_global_locking(MemoryRegion *mr);
> +
> +/**
>   * memory_region_add_eventfd: Request an eventfd to be triggered when a word
>   *                            is written to a location.
>   *
> diff --git a/memory.c b/memory.c
> index 03c536b..6b77354 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1004,6 +1004,7 @@ static void memory_region_initfn(Object *obj)
>      mr->ops = &unassigned_mem_ops;
>      mr->enabled = true;
>      mr->romd_mode = true;
> +    mr->global_locking = true;
>      mr->destructor = memory_region_destructor_none;
>      QTAILQ_INIT(&mr->subregions);
>      QTAILQ_INIT(&mr->coalesced);
> @@ -1627,6 +1628,16 @@ void memory_region_clear_flush_coalesced(MemoryRegion *mr)
>      }
>  }
>  
> +void memory_region_set_global_locking(MemoryRegion *mr)
> +{
> +    mr->global_locking = true;
> +}
> +
> +void memory_region_clear_global_locking(MemoryRegion *mr)
> +{
> +    mr->global_locking = false;
> +}
> +
>  void memory_region_add_eventfd(MemoryRegion *mr,
>                                 hwaddr addr,
>                                 unsigned size,
> -- 
> 1.8.3.1
> 
> 
> 

Reviewed-by: Fam Zheng <famz@redhat.com>

* Re: [Qemu-devel] [PATCH 4/9] exec: pull qemu_flush_coalesced_mmio_buffer() into address_space_rw/ld*/st*
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 4/9] exec: pull qemu_flush_coalesced_mmio_buffer() into address_space_rw/ld*/st* Paolo Bonzini
@ 2015-06-23  9:05   ` Fam Zheng
  2015-06-23  9:12     ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Fam Zheng @ 2015-06-23  9:05 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jan.kiszka, qemu-devel

On Thu, 06/18 18:47, Paolo Bonzini wrote:

[snip]

> diff --git a/memory.c b/memory.c
> index 6b77354..be385d4 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -414,9 +414,6 @@ static MemTxResult memory_region_read_with_attrs_accessor(MemoryRegion *mr,
>      uint64_t tmp = 0;
>      MemTxResult r;
>  
> -    if (mr->flush_coalesced_mmio) {
> -        qemu_flush_coalesced_mmio_buffer();
> -    }
>      r = mr->ops->read_with_attrs(mr->opaque, addr, &tmp, size, attrs);
>      trace_memory_region_ops_read(mr, addr, tmp, size);
>      *value |= (tmp & mask) << shift;
> @@ -449,9 +446,6 @@ static MemTxResult memory_region_write_accessor(MemoryRegion *mr,
>  {
>      uint64_t tmp;
>  
> -    if (mr->flush_coalesced_mmio) {
> -        qemu_flush_coalesced_mmio_buffer();
> -    }
>      tmp = (*value >> shift) & mask;
>      trace_memory_region_ops_write(mr, addr, tmp, size);
>      mr->ops->write(mr->opaque, addr, tmp, size);

Why are memory_region_read_accessor and memory_region_write_with_attrs_accessor
unchanged?

Fam

* Re: [Qemu-devel] [PATCH 4/9] exec: pull qemu_flush_coalesced_mmio_buffer() into address_space_rw/ld*/st*
  2015-06-23  9:05   ` Fam Zheng
@ 2015-06-23  9:12     ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2015-06-23  9:12 UTC (permalink / raw)
  To: Fam Zheng; +Cc: jan.kiszka, qemu-devel



On 23/06/2015 11:05, Fam Zheng wrote:
>> > diff --git a/memory.c b/memory.c
>> > index 6b77354..be385d4 100644
>> > --- a/memory.c
>> > +++ b/memory.c
>> > @@ -414,9 +414,6 @@ static MemTxResult memory_region_read_with_attrs_accessor(MemoryRegion *mr,
>> >      uint64_t tmp = 0;
>> >      MemTxResult r;
>> >  
>> > -    if (mr->flush_coalesced_mmio) {
>> > -        qemu_flush_coalesced_mmio_buffer();
>> > -    }
>> >      r = mr->ops->read_with_attrs(mr->opaque, addr, &tmp, size, attrs);
>> >      trace_memory_region_ops_read(mr, addr, tmp, size);
>> >      *value |= (tmp & mask) << shift;
>> > @@ -449,9 +446,6 @@ static MemTxResult memory_region_write_accessor(MemoryRegion *mr,
>> >  {
>> >      uint64_t tmp;
>> >  
>> > -    if (mr->flush_coalesced_mmio) {
>> > -        qemu_flush_coalesced_mmio_buffer();
>> > -    }
>> >      tmp = (*value >> shift) & mask;
>> >      trace_memory_region_ops_write(mr, addr, tmp, size);
>> >      mr->ops->write(mr->opaque, addr, tmp, size);
> Why are memory_region_read_accessor and memory_region_write_with_attrs_accessor
> unchanged?

Good catch. Botched the rebasing.

Paolo

* Re: [Qemu-devel] [PATCH 6/9] kvm: First step to push iothread lock out of inner run loop
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 6/9] kvm: First step to push iothread lock out of inner run loop Paolo Bonzini
  2015-06-18 18:19   ` Christian Borntraeger
@ 2015-06-23  9:26   ` Fam Zheng
  2015-06-23  9:29     ` Paolo Bonzini
  1 sibling, 1 reply; 28+ messages in thread
From: Fam Zheng @ 2015-06-23  9:26 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jan.kiszka, qemu-devel

On Thu, 06/18 18:47, Paolo Bonzini wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> This opens the path to get rid of the iothread lock on vmexits in KVM
> mode. On x86, the in-kernel irqchip has to be used because we otherwise
> need to synchronize APIC and other per-cpu state accesses that could be
> changed concurrently.
> 
> s390x and ARM should be fine without specific locking as their
> pre/post-run callbacks are empty. MIPS and POWER require locking for
> the pre-run callback.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  kvm-all.c         | 14 ++++++++++++--
>  target-i386/kvm.c | 18 ++++++++++++++++++
>  target-mips/kvm.c |  4 ++++
>  target-ppc/kvm.c  |  4 ++++
>  4 files changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/kvm-all.c b/kvm-all.c
> index b2b1bc3..2bd8e9b 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -1795,6 +1795,8 @@ int kvm_cpu_exec(CPUState *cpu)
>          return EXCP_HLT;
>      }
>  
> +    qemu_mutex_unlock_iothread();
> +
>      do {
>          MemTxAttrs attrs;
>  
> @@ -1813,11 +1815,9 @@ int kvm_cpu_exec(CPUState *cpu)
>               */
>              qemu_cpu_kick_self();
>          }
> -        qemu_mutex_unlock_iothread();
>  
>          run_ret = kvm_vcpu_ioctl(cpu, KVM_RUN, 0);
>  
> -        qemu_mutex_lock_iothread();
>          attrs = kvm_arch_post_run(cpu, run);
>  
>          if (run_ret < 0) {
> @@ -1836,20 +1836,24 @@ int kvm_cpu_exec(CPUState *cpu)
>          switch (run->exit_reason) {
>          case KVM_EXIT_IO:
>              DPRINTF("handle_io\n");
> +            qemu_mutex_lock_iothread();
>              kvm_handle_io(run->io.port, attrs,
>                            (uint8_t *)run + run->io.data_offset,
>                            run->io.direction,
>                            run->io.size,
>                            run->io.count);
> +            qemu_mutex_unlock_iothread();
>              ret = 0;
>              break;
>          case KVM_EXIT_MMIO:
>              DPRINTF("handle_mmio\n");
> +            qemu_mutex_lock_iothread();
>              address_space_rw(&address_space_memory,
>                               run->mmio.phys_addr, attrs,
>                               run->mmio.data,
>                               run->mmio.len,
>                               run->mmio.is_write);
> +            qemu_mutex_unlock_iothread();
>              ret = 0;
>              break;
>          case KVM_EXIT_IRQ_WINDOW_OPEN:
> @@ -1858,7 +1862,9 @@ int kvm_cpu_exec(CPUState *cpu)
>              break;
>          case KVM_EXIT_SHUTDOWN:
>              DPRINTF("shutdown\n");
> +            qemu_mutex_lock_iothread();
>              qemu_system_reset_request();
> +            qemu_mutex_unlock_iothread();
>              ret = EXCP_INTERRUPT;
>              break;
>          case KVM_EXIT_UNKNOWN:

More context:

           fprintf(stderr, "KVM: unknown exit, hardware reason %" PRIx64 "\n",
                   (uint64_t)run->hw.hardware_exit_reason);
           ret = -1;
           break;
       case KVM_EXIT_INTERNAL_ERROR:
*          ret = kvm_handle_internal_error(cpu, run);
           break;
       case KVM_EXIT_SYSTEM_EVENT:
           switch (run->system_event.type) {
           case KVM_SYSTEM_EVENT_SHUTDOWN:
*              qemu_system_shutdown_request();
               ret = EXCP_INTERRUPT;
               break;
           case KVM_SYSTEM_EVENT_RESET:
*              qemu_system_reset_request();
               ret = EXCP_INTERRUPT;
>              break;
>          default:
>              DPRINTF("kvm_arch_handle_exit\n");
> +            qemu_mutex_lock_iothread();
>              ret = kvm_arch_handle_exit(cpu, run);
> +            qemu_mutex_unlock_iothread();
>              break;
>          }
>      } while (ret == 0);
>  
> +    qemu_mutex_lock_iothread();
> +
>      if (ret < 0) {
>          cpu_dump_state(cpu, stderr, fprintf, CPU_DUMP_CODE);
>          vm_stop(RUN_STATE_INTERNAL_ERROR);

Could you explain why the three "*" calls above are safe?

Fam

* Re: [Qemu-devel] [PATCH 6/9] kvm: First step to push iothread lock out of inner run loop
  2015-06-23  9:26   ` Fam Zheng
@ 2015-06-23  9:29     ` Paolo Bonzini
  2015-06-23  9:45       ` Fam Zheng
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2015-06-23  9:29 UTC (permalink / raw)
  To: Fam Zheng; +Cc: jan.kiszka, qemu-devel



On 23/06/2015 11:26, Fam Zheng wrote:
> On Thu, 06/18 18:47, Paolo Bonzini wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> This opens the path to get rid of the iothread lock on vmexits in KVM
>> mode. On x86, the in-kernel irqchip has to be used because we otherwise
>> need to synchronize APIC and other per-cpu state accesses that could be
>> changed concurrently.
>>
>> s390x and ARM should be fine without specific locking as their
>> pre/post-run callbacks are empty. MIPS and POWER require locking for
>> the pre-run callback.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  kvm-all.c         | 14 ++++++++++++--
>>  target-i386/kvm.c | 18 ++++++++++++++++++
>>  target-mips/kvm.c |  4 ++++
>>  target-ppc/kvm.c  |  4 ++++
>>  4 files changed, 38 insertions(+), 2 deletions(-)
>>
>> diff --git a/kvm-all.c b/kvm-all.c
>> index b2b1bc3..2bd8e9b 100644
>> --- a/kvm-all.c
>> +++ b/kvm-all.c
>> @@ -1795,6 +1795,8 @@ int kvm_cpu_exec(CPUState *cpu)
>>          return EXCP_HLT;
>>      }
>>  
>> +    qemu_mutex_unlock_iothread();
>> +
>>      do {
>>          MemTxAttrs attrs;
>>  
>> @@ -1813,11 +1815,9 @@ int kvm_cpu_exec(CPUState *cpu)
>>               */
>>              qemu_cpu_kick_self();
>>          }
>> -        qemu_mutex_unlock_iothread();
>>  
>>          run_ret = kvm_vcpu_ioctl(cpu, KVM_RUN, 0);
>>  
>> -        qemu_mutex_lock_iothread();
>>          attrs = kvm_arch_post_run(cpu, run);
>>  
>>          if (run_ret < 0) {
>> @@ -1836,20 +1836,24 @@ int kvm_cpu_exec(CPUState *cpu)
>>          switch (run->exit_reason) {
>>          case KVM_EXIT_IO:
>>              DPRINTF("handle_io\n");
>> +            qemu_mutex_lock_iothread();
>>              kvm_handle_io(run->io.port, attrs,
>>                            (uint8_t *)run + run->io.data_offset,
>>                            run->io.direction,
>>                            run->io.size,
>>                            run->io.count);
>> +            qemu_mutex_unlock_iothread();
>>              ret = 0;
>>              break;
>>          case KVM_EXIT_MMIO:
>>              DPRINTF("handle_mmio\n");
>> +            qemu_mutex_lock_iothread();
>>              address_space_rw(&address_space_memory,
>>                               run->mmio.phys_addr, attrs,
>>                               run->mmio.data,
>>                               run->mmio.len,
>>                               run->mmio.is_write);
>> +            qemu_mutex_unlock_iothread();
>>              ret = 0;
>>              break;
>>          case KVM_EXIT_IRQ_WINDOW_OPEN:
>> @@ -1858,7 +1862,9 @@ int kvm_cpu_exec(CPUState *cpu)
>>              break;
>>          case KVM_EXIT_SHUTDOWN:
>>              DPRINTF("shutdown\n");
>> +            qemu_mutex_lock_iothread();
>>              qemu_system_reset_request();
>> +            qemu_mutex_unlock_iothread();
>>              ret = EXCP_INTERRUPT;
>>              break;
>>          case KVM_EXIT_UNKNOWN:
> 
> More context:
> 
>            fprintf(stderr, "KVM: unknown exit, hardware reason %" PRIx64 "\n",
>                    (uint64_t)run->hw.hardware_exit_reason);
>            ret = -1;
>            break;
>        case KVM_EXIT_INTERNAL_ERROR:
> *          ret = kvm_handle_internal_error(cpu, run);

This one only accesses data internal to the VCPU thread.

>            break;
>        case KVM_EXIT_SYSTEM_EVENT:
>            switch (run->system_event.type) {
>            case KVM_SYSTEM_EVENT_SHUTDOWN:
> *              qemu_system_shutdown_request();
>                ret = EXCP_INTERRUPT;
>                break;
>            case KVM_SYSTEM_EVENT_RESET:
> *              qemu_system_reset_request();

These two are thread-safe.
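
Roughly speaking, they only set a flag and kick the main loop; a
simplified sketch of the idea (not the exact vl.c code):

void qemu_system_reset_request(void)
{
    reset_requested = 1;    /* plain flag, polled by the main loop */
    qemu_notify_event();    /* wake up the iothread so it notices */
}

No device or per-CPU state is touched, so no BQL is needed.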

Paolo

>                ret = EXCP_INTERRUPT;
>>              break;
>>          default:
>>              DPRINTF("kvm_arch_handle_exit\n");
>> +            qemu_mutex_lock_iothread();
>>              ret = kvm_arch_handle_exit(cpu, run);
>> +            qemu_mutex_unlock_iothread();
>>              break;
>>          }
>>      } while (ret == 0);
>>  
>> +    qemu_mutex_lock_iothread();
>> +
>>      if (ret < 0) {
>>          cpu_dump_state(cpu, stderr, fprintf, CPU_DUMP_CODE);
>>          vm_stop(RUN_STATE_INTERNAL_ERROR);
> 
> Could you explain why the three "*" calls above are safe?
> 
> Fam
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 6/9] kvm: First step to push iothread lock out of inner run loop
  2015-06-23  9:29     ` Paolo Bonzini
@ 2015-06-23  9:45       ` Fam Zheng
  2015-06-23  9:49         ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Fam Zheng @ 2015-06-23  9:45 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jan.kiszka, qemu-devel

On Tue, 06/23 11:29, Paolo Bonzini wrote:
> 
> 
> On 23/06/2015 11:26, Fam Zheng wrote:
> > On Thu, 06/18 18:47, Paolo Bonzini wrote:
> >> From: Jan Kiszka <jan.kiszka@siemens.com>
> >>
> >> This opens the path to get rid of the iothread lock on vmexits in KVM
> >> mode. On x86, the in-kernel irqchip has to be used because we otherwise
> >> need to synchronize APIC and other per-cpu state accesses that could be
> >> changed concurrently.
> >>
> >> s390x and ARM should be fine without specific locking as their
> >> pre/post-run callbacks are empty. MIPS and POWER require locking for
> >> the pre-run callback.
> >>
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >> ---
> >>  kvm-all.c         | 14 ++++++++++++--
> >>  target-i386/kvm.c | 18 ++++++++++++++++++
> >>  target-mips/kvm.c |  4 ++++
> >>  target-ppc/kvm.c  |  4 ++++
> >>  4 files changed, 38 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/kvm-all.c b/kvm-all.c
> >> index b2b1bc3..2bd8e9b 100644
> >> --- a/kvm-all.c
> >> +++ b/kvm-all.c
> >> @@ -1795,6 +1795,8 @@ int kvm_cpu_exec(CPUState *cpu)
> >>          return EXCP_HLT;
> >>      }
> >>  
> >> +    qemu_mutex_unlock_iothread();
> >> +
> >>      do {
> >>          MemTxAttrs attrs;
> >>  
> >> @@ -1813,11 +1815,9 @@ int kvm_cpu_exec(CPUState *cpu)
> >>               */
> >>              qemu_cpu_kick_self();
> >>          }
> >> -        qemu_mutex_unlock_iothread();
> >>  
> >>          run_ret = kvm_vcpu_ioctl(cpu, KVM_RUN, 0);
> >>  
> >> -        qemu_mutex_lock_iothread();
> >>          attrs = kvm_arch_post_run(cpu, run);
> >>  
> >>          if (run_ret < 0) {
> >> @@ -1836,20 +1836,24 @@ int kvm_cpu_exec(CPUState *cpu)
> >>          switch (run->exit_reason) {
> >>          case KVM_EXIT_IO:
> >>              DPRINTF("handle_io\n");
> >> +            qemu_mutex_lock_iothread();
> >>              kvm_handle_io(run->io.port, attrs,
> >>                            (uint8_t *)run + run->io.data_offset,
> >>                            run->io.direction,
> >>                            run->io.size,
> >>                            run->io.count);
> >> +            qemu_mutex_unlock_iothread();
> >>              ret = 0;
> >>              break;
> >>          case KVM_EXIT_MMIO:
> >>              DPRINTF("handle_mmio\n");
> >> +            qemu_mutex_lock_iothread();
> >>              address_space_rw(&address_space_memory,
> >>                               run->mmio.phys_addr, attrs,
> >>                               run->mmio.data,
> >>                               run->mmio.len,
> >>                               run->mmio.is_write);
> >> +            qemu_mutex_unlock_iothread();
> >>              ret = 0;
> >>              break;
> >>          case KVM_EXIT_IRQ_WINDOW_OPEN:
> >> @@ -1858,7 +1862,9 @@ int kvm_cpu_exec(CPUState *cpu)
> >>              break;
> >>          case KVM_EXIT_SHUTDOWN:
> >>              DPRINTF("shutdown\n");
> >> +            qemu_mutex_lock_iothread();
> >>              qemu_system_reset_request();
> >> +            qemu_mutex_unlock_iothread();
> >>              ret = EXCP_INTERRUPT;
> >>              break;
> >>          case KVM_EXIT_UNKNOWN:
> > 
> > More context:
> > 
> >            fprintf(stderr, "KVM: unknown exit, hardware reason %" PRIx64 "\n",
> >                    (uint64_t)run->hw.hardware_exit_reason);
> >            ret = -1;
> >            break;
> >        case KVM_EXIT_INTERNAL_ERROR:
> > *          ret = kvm_handle_internal_error(cpu, run);
> 
> This one only accesses data internal to the VCPU thread.
> 
> >            break;
> >        case KVM_EXIT_SYSTEM_EVENT:
> >            switch (run->system_event.type) {
> >            case KVM_SYSTEM_EVENT_SHUTDOWN:
> > *              qemu_system_shutdown_request();
> >                ret = EXCP_INTERRUPT;
> >                break;
> >            case KVM_SYSTEM_EVENT_RESET:
> > *              qemu_system_reset_request();
> 
> These two are thread-safe.

But above you add lock/unlock around the qemu_system_reset_request() under
KVM_EXIT_SHUTDOWN.  What's different?

Fam

> 
> Paolo
> 
> >                ret = EXCP_INTERRUPT;
> >>              break;
> >>          default:
> >>              DPRINTF("kvm_arch_handle_exit\n");
> >> +            qemu_mutex_lock_iothread();
> >>              ret = kvm_arch_handle_exit(cpu, run);
> >> +            qemu_mutex_unlock_iothread();
> >>              break;
> >>          }
> >>      } while (ret == 0);
> >>  
> >> +    qemu_mutex_lock_iothread();
> >> +
> >>      if (ret < 0) {
> >>          cpu_dump_state(cpu, stderr, fprintf, CPU_DUMP_CODE);
> >>          vm_stop(RUN_STATE_INTERNAL_ERROR);
> > 
> > Could you explain why the three "*" calls above are safe?
> > 
> > Fam
> > 
> > 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 6/9] kvm: First step to push iothread lock out of inner run loop
  2015-06-23  9:45       ` Fam Zheng
@ 2015-06-23  9:49         ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2015-06-23  9:49 UTC (permalink / raw)
  To: Fam Zheng; +Cc: jan.kiszka, qemu-devel



On 23/06/2015 11:45, Fam Zheng wrote:
>>> > >            break;
>>> > >        case KVM_EXIT_SYSTEM_EVENT:
>>> > >            switch (run->system_event.type) {
>>> > >            case KVM_SYSTEM_EVENT_SHUTDOWN:
>>> > > *              qemu_system_shutdown_request();
>>> > >                ret = EXCP_INTERRUPT;
>>> > >                break;
>>> > >            case KVM_SYSTEM_EVENT_RESET:
>>> > > *              qemu_system_reset_request();
>> > 
>> > These two are thread-safe.
> But above you add lock/unlock around the qemu_system_reset_request() under
> KVM_EXIT_SHUTDOWN.  What's different?

That's unnecessary.

Also, just below this switch there's another call to
kvm_arch_handle_exit that should be wrapped by lock/unlock (it works
anyway because no one handles KVM_EXIT_SYSTEM_EVENT in
kvm_arch_handle_exit, but it's not clean).

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] main-loop: use qemu_mutex_lock_iothread consistently
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 1/9] main-loop: use qemu_mutex_lock_iothread consistently Paolo Bonzini
@ 2015-06-23 13:49   ` Frederic Konrad
  2015-06-23 13:56     ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Frederic Konrad @ 2015-06-23 13:49 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: jan.kiszka, Mark Burton, Guillaume Delbergue

On 18/06/2015 18:47, Paolo Bonzini wrote:
> The next patch will require the BQL to be always taken with
> qemu_mutex_lock_iothread(), while right now this isn't the case.
>
> Outside TCG mode this is not a problem.  In TCG mode, we need to be
> careful and avoid the "prod out of compiled code" step if already
> in a VCPU thread.  This is easily done with a check on current_cpu,
> i.e. qemu_in_vcpu_thread().
>
> Hopefully, multithreaded TCG will get rid of the whole logic to kick
> VCPUs whenever an I/O event occurs!
Hopefully :). This means dropping the iothread mutex as soon as
possible and removing the iothread_requesting_mutex, I guess.

Fred

>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   cpus.c | 13 ++++++++-----
>   1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index de6469f..2e807f9 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -924,7 +924,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
>       CPUState *cpu = arg;
>       int r;
>   
> -    qemu_mutex_lock(&qemu_global_mutex);
> +    qemu_mutex_lock_iothread();
>       qemu_thread_get_self(cpu->thread);
>       cpu->thread_id = qemu_get_thread_id();
>       cpu->can_do_io = 1;
> @@ -1004,10 +1004,10 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>   {
>       CPUState *cpu = arg;
>   
> +    qemu_mutex_lock_iothread();
>       qemu_tcg_init_cpu_signals();
>       qemu_thread_get_self(cpu->thread);
>   
> -    qemu_mutex_lock(&qemu_global_mutex);
>       CPU_FOREACH(cpu) {
>           cpu->thread_id = qemu_get_thread_id();
>           cpu->created = true;
> @@ -1118,7 +1118,11 @@ bool qemu_in_vcpu_thread(void)
>   
>   void qemu_mutex_lock_iothread(void)
>   {
>       atomic_inc(&iothread_requesting_mutex);
> -    if (!tcg_enabled() || !first_cpu || !first_cpu->thread) {
> +    /* In the simple case there is no need to bump the VCPU thread out of
> +     * TCG code execution.
> +     */
> +    if (!tcg_enabled() || qemu_in_vcpu_thread() ||
> +        !first_cpu || !first_cpu->thread) {
>           qemu_mutex_lock(&qemu_global_mutex);
>           atomic_dec(&iothread_requesting_mutex);
>       } else {

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] main-loop: use qemu_mutex_lock_iothread consistently
  2015-06-23 13:49   ` Frederic Konrad
@ 2015-06-23 13:56     ` Paolo Bonzini
  2015-06-23 14:18       ` Frederic Konrad
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2015-06-23 13:56 UTC (permalink / raw)
  To: Frederic Konrad, qemu-devel; +Cc: jan.kiszka, Mark Burton, Guillaume Delbergue



On 23/06/2015 15:49, Frederic Konrad wrote:
>>
>> Hopefully, multithreaded TCG will get rid of the whole logic to kick
>> VCPUs whenever an I/O event occurs!
> Hopefully :). This means dropping the iothread mutex as soon as
> possible and removing the iothread_requesting_mutex, I guess.

Yes---running most of cpu_exec outside the BQL, like KVM.  io_read and
io_write would have to get and release the lock if necessary.

cpu_resume_from_signal also might have to release the lock.
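
A rough sketch of what "if necessary" could mean for an MMIO read
(hypothetical helper; it reuses the mr->global_locking property and
the qemu_mutex_iothread_locked() primitive from this series):

static uint64_t io_read_locked_if_needed(MemoryRegion *mr, hwaddr addr,
                                         unsigned size, MemTxAttrs attrs)
{
    uint64_t val = 0;
    bool release = false;

    /* Take the BQL only for regions that still rely on it. */
    if (mr->global_locking && !qemu_mutex_iothread_locked()) {
        qemu_mutex_lock_iothread();
        release = true;
    }
    memory_region_dispatch_read(mr, addr, &val, size, attrs);
    if (release) {
        qemu_mutex_unlock_iothread();
    }
    return val;
}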

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] main-loop: use qemu_mutex_lock_iothread consistently
  2015-06-23 13:56     ` Paolo Bonzini
@ 2015-06-23 14:18       ` Frederic Konrad
  0 siblings, 0 replies; 28+ messages in thread
From: Frederic Konrad @ 2015-06-23 14:18 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: jan.kiszka, Mark Burton, Guillaume Delbergue

On 23/06/2015 15:56, Paolo Bonzini wrote:
>
> On 23/06/2015 15:49, Frederic Konrad wrote:
>>> Hopefully, multithreaded TCG will get rid of the whole logic to kick
>>> VCPUs whenever an I/O event occurs!
>> Hopefully :). This means dropping the iothread mutex as soon as
>> possible and removing the iothread_requesting_mutex, I guess.
> Yes---running most of cpu_exec outside the BQL, like KVM.  io_read and
> io_write would have to get and release the lock if necessary.
>
> cpu_resume_from_signal also might have to release the lock.
>
> Paolo

Ok good,
Can you add me in CC to this series so I can see when it's pulled etc 
please.

Thanks,
Fred

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 5/9] memory: let address_space_rw/ld*/st* run outside the BQL
  2015-06-18 16:47 ` [Qemu-devel] [PATCH 5/9] memory: let address_space_rw/ld*/st* run outside the BQL Paolo Bonzini
@ 2015-06-24 16:56   ` Alex Bennée
  2015-06-24 17:21     ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Alex Bennée @ 2015-06-24 16:56 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jan.kiszka, qemu-devel


Paolo Bonzini <pbonzini@redhat.com> writes:

> From: Jan Kiszka <jan.kiszka@siemens.com>
>
> The MMIO case is further broken up into two cases: if the caller does
> not hold the BQL on invocation, the unlocked path takes or avoids the
> BQL depending on the locking strategy of the target memory region and
> its coalesced MMIO handling.  In this case, the caller should not hold
> _any_ lock (a friendly suggestion which is disregarded by
> virtio-scsi-dataplane).
>
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  exec.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 57 insertions(+), 9 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index 094f87e..78c99f6 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -48,6 +48,7 @@
>  #endif
>  #include "exec/cpu-all.h"
>  #include "qemu/rcu_queue.h"
> +#include "qemu/main-loop.h"
>  #include "exec/cputlb.h"
>  #include "translate-all.h"
>  
> @@ -2318,11 +2319,27 @@ static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
>      return l;
>  }
>  
> -static void prepare_mmio_access(MemoryRegion *mr)
> +static bool prepare_mmio_access(MemoryRegion *mr)
>  {
> +    bool unlocked = !qemu_mutex_iothread_locked();
> +    bool release_lock = false;
> +
> +    if (unlocked && mr->global_locking) {
> +        qemu_mutex_lock_iothread();
> +        unlocked = false;
> +        release_lock = true;
> +    }
>      if (mr->flush_coalesced_mmio) {
> +        if (unlocked) {
> +            qemu_mutex_lock_iothread();
> +        }
>          qemu_flush_coalesced_mmio_buffer();
> +        if (unlocked) {
> +            qemu_mutex_unlock_iothread();
> +        }
>      }
> +
> +    return release_lock;
>  }

This is where I get confused about the advantage of this over plain
same-pid recursive locking. If you use recursive locking you don't need
to add a bunch of state to remind you of when to release the lock. Then
you'd just need:

static void prepare_mmio_access(MemoryRegion *mr)
{
    if (mr->global_locking) {
        qemu_mutex_lock_iothread();
    }
    if (mr->flush_coalesced_mmio) {
        qemu_mutex_lock_iothread();
        qemu_flush_coalesced_mmio_buffer();
        qemu_mutex_unlock_iothread();
    }
}

and a bunch of:

if (mr->global_locking)
   qemu_mutex_unlock_iothread();

in the access functions. Although I suspect you could push the
mr->global_locking up to the dispatch functions.

Am I missing something here?

>  
>  MemTxResult address_space_rw(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
> @@ -2334,6 +2351,7 @@ MemTxResult address_space_rw(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
>      hwaddr addr1;
>      MemoryRegion *mr;
>      MemTxResult result = MEMTX_OK;
> +    bool release_lock = false;
>  
>      rcu_read_lock();
>      while (len > 0) {
> @@ -2342,7 +2360,7 @@ MemTxResult address_space_rw(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
>  
>          if (is_write) {
>              if (!memory_access_is_direct(mr, is_write)) {
> -                prepare_mmio_access(mr);
> +                release_lock |= prepare_mmio_access(mr);
>                  l = memory_access_size(mr, l, addr1);
>                  /* XXX: could force current_cpu to NULL to avoid
>                     potential bugs */
> @@ -2384,7 +2402,7 @@ MemTxResult address_space_rw(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
>          } else {
>              if (!memory_access_is_direct(mr, is_write)) {
>                  /* I/O case */
> -                prepare_mmio_access(mr);
> +                release_lock |= prepare_mmio_access(mr);
>                  l = memory_access_size(mr, l, addr1);
>                  switch (l) {
>                  case 8:
> @@ -2420,6 +2438,12 @@ MemTxResult address_space_rw(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
>                  memcpy(buf, ptr, l);
>              }
>          }
> +
> +        if (release_lock) {
> +            qemu_mutex_unlock_iothread();
> +            release_lock = false;
> +        }
> +
>          len -= l;
>          buf += l;
>          addr += l;
> @@ -2746,11 +2770,12 @@ static inline uint32_t address_space_ldl_internal(AddressSpace *as, hwaddr addr,
>      hwaddr l = 4;
>      hwaddr addr1;
>      MemTxResult r;
> +    bool release_lock = false;
>  
>      rcu_read_lock();
>      mr = address_space_translate(as, addr, &addr1, &l, false);
>      if (l < 4 || !memory_access_is_direct(mr, false)) {
> -        prepare_mmio_access(mr);
> +        release_lock |= prepare_mmio_access(mr);
>  
>          /* I/O case */
>          r = memory_region_dispatch_read(mr, addr1, &val, 4, attrs);
> @@ -2784,6 +2809,9 @@ static inline uint32_t address_space_ldl_internal(AddressSpace *as, hwaddr addr,
>      if (result) {
>          *result = r;
>      }
> +    if (release_lock) {
> +        qemu_mutex_unlock_iothread();
> +    }
>      rcu_read_unlock();
>      return val;
>  }
> @@ -2836,12 +2864,13 @@ static inline uint64_t address_space_ldq_internal(AddressSpace *as, hwaddr addr,
>      hwaddr l = 8;
>      hwaddr addr1;
>      MemTxResult r;
> +    bool release_lock = false;
>  
>      rcu_read_lock();
>      mr = address_space_translate(as, addr, &addr1, &l,
>                                   false);
>      if (l < 8 || !memory_access_is_direct(mr, false)) {
> -        prepare_mmio_access(mr);
> +        release_lock |= prepare_mmio_access(mr);
>  
>          /* I/O case */
>          r = memory_region_dispatch_read(mr, addr1, &val, 8, attrs);
> @@ -2875,6 +2904,9 @@ static inline uint64_t address_space_ldq_internal(AddressSpace *as, hwaddr addr,
>      if (result) {
>          *result = r;
>      }
> +    if (release_lock) {
> +        qemu_mutex_unlock_iothread();
> +    }
>      rcu_read_unlock();
>      return val;
>  }
> @@ -2947,12 +2979,13 @@ static inline uint32_t address_space_lduw_internal(AddressSpace *as,
>      hwaddr l = 2;
>      hwaddr addr1;
>      MemTxResult r;
> +    bool release_lock = false;
>  
>      rcu_read_lock();
>      mr = address_space_translate(as, addr, &addr1, &l,
>                                   false);
>      if (l < 2 || !memory_access_is_direct(mr, false)) {
> -        prepare_mmio_access(mr);
> +        release_lock |= prepare_mmio_access(mr);
>  
>          /* I/O case */
>          r = memory_region_dispatch_read(mr, addr1, &val, 2, attrs);
> @@ -2986,6 +3019,9 @@ static inline uint32_t address_space_lduw_internal(AddressSpace *as,
>      if (result) {
>          *result = r;
>      }
> +    if (release_lock) {
> +        qemu_mutex_unlock_iothread();
> +    }
>      rcu_read_unlock();
>      return val;
>  }
> @@ -3037,12 +3073,13 @@ void address_space_stl_notdirty(AddressSpace *as, hwaddr addr, uint32_t val,
>      hwaddr l = 4;
>      hwaddr addr1;
>      MemTxResult r;
> +    bool release_lock = false;
>  
>      rcu_read_lock();
>      mr = address_space_translate(as, addr, &addr1, &l,
>                                   true);
>      if (l < 4 || !memory_access_is_direct(mr, true)) {
> -        prepare_mmio_access(mr);
> +        release_lock |= prepare_mmio_access(mr);
>  
>          r = memory_region_dispatch_write(mr, addr1, val, 4, attrs);
>      } else {
> @@ -3063,6 +3100,9 @@ void address_space_stl_notdirty(AddressSpace *as, hwaddr addr, uint32_t val,
>      if (result) {
>          *result = r;
>      }
> +    if (release_lock) {
> +        qemu_mutex_unlock_iothread();
> +    }
>      rcu_read_unlock();
>  }
>  
> @@ -3083,12 +3123,13 @@ static inline void address_space_stl_internal(AddressSpace *as,
>      hwaddr l = 4;
>      hwaddr addr1;
>      MemTxResult r;
> +    bool release_lock = false;
>  
>      rcu_read_lock();
>      mr = address_space_translate(as, addr, &addr1, &l,
>                                   true);
>      if (l < 4 || !memory_access_is_direct(mr, true)) {
> -        prepare_mmio_access(mr);
> +        release_lock |= prepare_mmio_access(mr);
>  
>  #if defined(TARGET_WORDS_BIGENDIAN)
>          if (endian == DEVICE_LITTLE_ENDIAN) {
> @@ -3121,6 +3162,9 @@ static inline void address_space_stl_internal(AddressSpace *as,
>      if (result) {
>          *result = r;
>      }
> +    if (release_lock) {
> +        qemu_mutex_unlock_iothread();
> +    }
>      rcu_read_unlock();
>  }
>  
> @@ -3190,11 +3234,12 @@ static inline void address_space_stw_internal(AddressSpace *as,
>      hwaddr l = 2;
>      hwaddr addr1;
>      MemTxResult r;
> +    bool release_lock = false;
>  
>      rcu_read_lock();
>      mr = address_space_translate(as, addr, &addr1, &l, true);
>      if (l < 2 || !memory_access_is_direct(mr, true)) {
> -        prepare_mmio_access(mr);
> +        release_lock |= prepare_mmio_access(mr);
>  
>  #if defined(TARGET_WORDS_BIGENDIAN)
>          if (endian == DEVICE_LITTLE_ENDIAN) {
> @@ -3227,6 +3272,9 @@ static inline void address_space_stw_internal(AddressSpace *as,
>      if (result) {
>          *result = r;
>      }
> +    if (release_lock) {
> +        qemu_mutex_unlock_iothread();
> +    }
>      rcu_read_unlock();
>  }

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 5/9] memory: let address_space_rw/ld*/st* run outside the BQL
  2015-06-24 16:56   ` Alex Bennée
@ 2015-06-24 17:21     ` Paolo Bonzini
  2015-06-24 18:50       ` Alex Bennée
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2015-06-24 17:21 UTC (permalink / raw)
  To: Alex Bennée; +Cc: jan.kiszka, qemu-devel



On 24/06/2015 18:56, Alex Bennée wrote:
> This is where I get confused about the advantage of this over plain
> same-pid recursive locking. If you use recursive locking you don't need
> to add a bunch of state to remind you of when to release the lock. Then
> you'd just need:
> 
> static void prepare_mmio_access(MemoryRegion *mr)
> {
>     if (mr->global_locking) {
>         qemu_mutex_lock_iothread();
>     }
>     if (mr->flush_coalesced_mmio) {
>         qemu_mutex_lock_iothread();
>         qemu_flush_coalesced_mmio_buffer();
>         qemu_mutex_unlock_iothread();
>     }
> }
> 
> and a bunch of:
> 
> if (mr->global_locking)
>    qemu_mutex_unlock_iothread();
> 
> in the access functions. Although I suspect you could push the
> mr->global_locking up to the dispatch functions.
> 
> Am I missing something here?

The semantics of recursive locking with respect to condition variables
are not clear.  Either cond_wait releases all locks, and then the mutex
can be released when the code doesn't expect it to be, or cond_wait
doesn't release all locks and then you have deadlocks.

POSIX says to do the latter:

	It is advised that an application should not use a
	PTHREAD_MUTEX_RECURSIVE mutex with condition variables because
	the implicit unlock performed for a pthread_cond_timedwait() or
	pthread_cond_wait() may not actually release the mutex (if it
	had been locked multiple times). If this happens, no other
	thread can satisfy the condition of the predicate."

So, recursive locking is okay if you don't have condition variables
attached to the lock (and if you cannot do without it), but
qemu_global_mutex does have them.
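
To make the hazard concrete, here is a standalone sketch (hypothetical
code, not QEMU's) of the deadlock that the POSIX text describes:

#include <pthread.h>

static pthread_mutex_t m;   /* assume PTHREAD_MUTEX_RECURSIVE init */
static pthread_cond_t c = PTHREAD_COND_INITIALIZER;
static int ready;

static void consumer(void)
{
    pthread_mutex_lock(&m);
    pthread_mutex_lock(&m);          /* recursive: count is now 2 */
    while (!ready) {
        /* The implicit unlock drops only one level, so no other
         * thread can ever take m and set "ready": deadlock.
         */
        pthread_cond_wait(&c, &m);
    }
    pthread_mutex_unlock(&m);
    pthread_mutex_unlock(&m);
}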

QEMU has so far tried to use the solution that Stevens outlines here:
https://books.google.it/books?id=kCTMFpEcIOwC&pg=PA434

	... Leave the interfaces to func1 and func2 unchanged, and avoid
	a recursive mutex by providing a private version of func2,
	called func2_locked.  To call func2_locked, hold the mutex
	embedded in the data structure whose address we pass as the
	argument.

as a way to avoid recursive locking.  This is much better because a) it
is more efficient---taking locks can be expensive even if they're
uncontended, especially if your VM spans multiple NUMA nodes on the
host; b) it is always clear when a lock is taken and when it isn't.
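
A minimal sketch of the pattern, with invented names on top of QEMU's
mutex wrappers:

struct counter {
    QemuMutex lock;
    int value;
};

/* Caller must hold c->lock. */
static void counter_inc_locked(struct counter *c)
{
    c->value++;
}

static void counter_inc(struct counter *c)
{
    qemu_mutex_lock(&c->lock);
    counter_inc_locked(c);
    qemu_mutex_unlock(&c->lock);
}

/* Code that already holds the lock calls the _locked variant
 * directly, so the mutex never needs to be recursive.
 */
static void counter_add_two(struct counter *c)
{
    qemu_mutex_lock(&c->lock);
    counter_inc_locked(c);
    counter_inc_locked(c);
    qemu_mutex_unlock(&c->lock);
}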

Note that Stevens has another example right after this one of recursive
locking, involving callbacks, but it's ill-defined.  There's no reason
for the "timeout" function in page 437 to hold the mutex when it calls
"func".  It can unlock before and re-lock afterwards, like QEMU's own
timerlist_run_timers function.

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 5/9] memory: let address_space_rw/ld*/st* run outside the BQL
  2015-06-24 17:21     ` Paolo Bonzini
@ 2015-06-24 18:50       ` Alex Bennée
  2015-06-25  8:13         ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Alex Bennée @ 2015-06-24 18:50 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jan.kiszka, qemu-devel


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 24/06/2015 18:56, Alex Bennée wrote:
>> This is where I get confused about the advantage of this over plain
>> same-pid recursive locking. If you use recursive locking you don't need
>> to add a bunch of state to remind you of when to release the lock. Then
>> you'd just need:
>> 
>> static void prepare_mmio_access(MemoryRegion *mr)
>> {
>>     if (mr->global_locking) {
>>         qemu_mutex_lock_iothread();
>>     }
>>     if (mr->flush_coalesced_mmio) {
>>         qemu_mutex_lock_iothread();
>>         qemu_flush_coalesced_mmio_buffer();
>>         qemu_mutex_unlock_iothread();
>>     }
>> }
>> 
>> and a bunch of:
>> 
>> if (mr->global_locking)
>>    qemu_mutex_unlock_iothread();
>> 
>> in the access functions. Although I suspect you could push the
>> mr->global_locking up to the dispatch functions.
>> 
>> Am I missing something here?
>
> The semantics of recursive locking with respect to condition variables
> are not clear.  Either cond_wait releases all locks, and then the mutex
> can be released when the code doesn't expect it to be, or cond_wait
> doesn't release all locks and then you have deadlocks.
>
> POSIX says to do the latter:
>
> 	It is advised that an application should not use a
> 	PTHREAD_MUTEX_RECURSIVE mutex with condition variables because
> 	the implicit unlock performed for a pthread_cond_timedwait() or
> 	pthread_cond_wait() may not actually release the mutex (if it
> 	had been locked multiple times). If this happens, no other
> 	thread can satisfy the condition of the predicate."
>
> So, recursive locking is okay if you don't have condition variables
> attached to the lock (and if you cannot do without it), but
> qemu_global_mutex does have them.

Ahh OK, so I was missing something ;-)

>
> QEMU has so far tried to use the solution that Stevens outlines here:
> https://books.google.it/books?id=kCTMFpEcIOwC&pg=PA434
>
> 	... Leave the interfaces to func1 and func2 unchanged, and avoid
> 	a recursive mutex by providing a private version of func2,
> 	called func2_locked.  To call func2_locked, hold the mutex
> 	embedded in the data structure whose address we pass as the
> 	argument.
>
> as a way to avoid recursive locking.  This is much better because a) it
> is more efficient---taking locks can be expensive even if they're
> uncontended, especially if your VM spans multiple NUMA nodes on the
> host; b) it is always clear when a lock is taken and when it isn't.
>
> Note that Stevens has another example right after this one of recursive
> locking, involving callbacks, but it's ill-defined.  There's no reason
> for the "timeout" function in page 437 to hold the mutex when it calls
> "func".  It can unlock before and re-lock afterwards, like QEMU's own
> timerlist_run_timers function.

Unfortunately I can't read that link but it sounds like I should get
myself a copy of the book. I take it that approach wouldn't approve of:

static __thread int iothread_lock_count;

void qemu_mutex_lock_iothread(void)
{
    if (iothread_lock_count == 0) {
        qemu_mutex_lock(&qemu_global_mutex);
    }
    iothread_lock_count++;
}

void qemu_mutex_unlock_iothread(void)
{
    iothread_lock_count--;
    if (iothread_lock_count == 0) {
        qemu_mutex_unlock(&qemu_global_mutex);
    }
    if (iothread_lock_count < 0) {
        fprintf(stderr,"%s: error, too many unlocks %d\n", __func__,
                iothread_lock_count);
    }
}

Which should achieve the same "only one lock held" semantics but still
make the calling code a little less worried about tracking the state.

I guess it depends on whether there is ever going to be a situation
where we say "lock is held, do something different"?

>
> Paolo

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 5/9] memory: let address_space_rw/ld*/st* run outside the BQL
  2015-06-24 18:50       ` Alex Bennée
@ 2015-06-25  8:13         ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2015-06-25  8:13 UTC (permalink / raw)
  To: Alex Bennée; +Cc: jan.kiszka, qemu-devel



On 24/06/2015 20:50, Alex Bennée wrote:
> 
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
>> On 24/06/2015 18:56, Alex Bennée wrote:
>>> This is where I get confused about the advantage of this over plain
>>> same-pid recursive locking. If you use recursive locking you don't need
>>> to add a bunch of state to remind you of when to release the lock.
>>>
>>> Am I missing something here?
>>
>> The semantics of recursive locking with respect to condition variables
>> are not clear.  Either cond_wait releases all locks, and then the mutex
>> can be released when the code doesn't expect it to be, or cond_wait
>> doesn't release all locks and then you have deadlocks.
>>
>> So, recursive locking is okay if you don't have condition variables
>> attached to the lock (and if you cannot do without it), but
>> qemu_global_mutex does have them.
> 
> Ahh OK, so I was missing something ;-)
> 
>> QEMU has so far tried to use the solution that Stevens outlines here:
>> https://books.google.it/books?id=kCTMFpEcIOwC&pg=PA434
> 
> Unfortunately I can't read that link but it sounds like I should get
> myself a copy of the book.

Try following the link from
https://en.wikipedia.org/wiki/Reentrant_mutex#References.

> I take it that approach wouldn't approve of:
> 
> static __thread int iothread_lock_count;
> 
> void qemu_mutex_lock_iothread(void)
> {
>     if (iothread_lock_count == 0) {
>         qemu_mutex_lock(&qemu_global_mutex);
>     }
>     iothread_lock_count++;
> }
> 
> void qemu_mutex_unlock_iothread(void)
> {
>     iothread_lock_count--;
>     if (iothread_lock_count == 0) {
>         qemu_mutex_unlock(&qemu_global_mutex);
>     }
>     if (iothread_lock_count < 0) {
>         fprintf(stderr, "%s: error, too many unlocks %d\n", __func__,
>                 iothread_lock_count);
>     }
> }
> 
> Which should achieve the same "only one lock held" semantics but still
> make the calling code a little less worried about tracking the state.

This is effectively implementing the "other" semantics: cond_wait
always drops the lock.  (The pthread mutex is only ever locked once, so
pthread_cond_wait on qemu_global_mutex releases it completely even
while iothread_lock_count > 1, and the caller's idea of "locked" no
longer matches reality.)

BTW, fine-grained recursive mutexes are bad for another reason: you can
think of "getting the mutex" as "ensuring all the data structure's
invariants are respected" (at the time you acquire the lock, no other
thread is modifying the state, so any invariant that held at the last
unlock must still hold).  This is not true if you can get the mutex
recursively.  But I'm honestly not sure how much of this argument
applies to something as coarse as the iothread lock.

The best argument I have against recursive mutexes is that it's really a
one-way street.  Once you've decided to make a mutex recursive, it's
really hard to make it non-recursive.

Paolo

>>
>> Paolo
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH 7/9] kvm: Switch to unlocked PIO
  2015-07-02  8:20 [Qemu-devel] [PATCH for-2.4 0/9 v3] KVM: Do I/O outside BQL whenever possible Paolo Bonzini
@ 2015-07-02  8:20 ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2015-07-02  8:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Kiszka, famz

From: Jan Kiszka <jan.kiszka@siemens.com>

Do not take the BQL before dispatching PIO requests of KVM VCPUs.
Instead, address_space_rw will do it if necessary. This enables
completely BQL-free PIO handling in KVM mode for upcoming devices with
fine-grained locking.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 kvm-all.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index ca428ca..ad5ac5e 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1804,13 +1804,12 @@ int kvm_cpu_exec(CPUState *cpu)
         switch (run->exit_reason) {
         case KVM_EXIT_IO:
             DPRINTF("handle_io\n");
-            qemu_mutex_lock_iothread();
+            /* Called outside BQL */
             kvm_handle_io(run->io.port, attrs,
                           (uint8_t *)run + run->io.data_offset,
                           run->io.direction,
                           run->io.size,
                           run->io.count);
-            qemu_mutex_unlock_iothread();
             ret = 0;
             break;
         case KVM_EXIT_MMIO:
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH 7/9] kvm: Switch to unlocked PIO
  2015-06-24 16:25 [Qemu-devel] [PATCH for-2.4 v2 0/9] KVM: Do I/O outside BQL whenever possible Paolo Bonzini
@ 2015-06-24 16:25 ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2015-06-24 16:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: borntraeger, famz, Jan Kiszka

From: Jan Kiszka <jan.kiszka@siemens.com>

Do not take the BQL before dispatching PIO requests of KVM VCPUs.
Instead, address_space_rw will do it if necessary. This enables
completely BQL-free PIO handling in KVM mode for upcoming devices with
fine-grained locking.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1434646046-27150-8-git-send-email-pbonzini@redhat.com>
---
 kvm-all.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index af3d10b..07bdcfa 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1801,13 +1801,12 @@ int kvm_cpu_exec(CPUState *cpu)
         switch (run->exit_reason) {
         case KVM_EXIT_IO:
             DPRINTF("handle_io\n");
-            qemu_mutex_lock_iothread();
+            /* Called outside BQL */
             kvm_handle_io(run->io.port, attrs,
                           (uint8_t *)run + run->io.data_offset,
                           run->io.direction,
                           run->io.size,
                           run->io.count);
-            qemu_mutex_unlock_iothread();
             ret = 0;
             break;
         case KVM_EXIT_MMIO:
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

end of thread

Thread overview: 28+ messages
2015-06-18 16:47 [Qemu-devel] [PATCH for-2.4 0/9] KVM: Do I/O outside BQL whenever possible Paolo Bonzini
2015-06-18 16:47 ` [Qemu-devel] [PATCH 1/9] main-loop: use qemu_mutex_lock_iothread consistently Paolo Bonzini
2015-06-23 13:49   ` Frederic Konrad
2015-06-23 13:56     ` Paolo Bonzini
2015-06-23 14:18       ` Frederic Konrad
2015-06-18 16:47 ` [Qemu-devel] [PATCH 2/9] main-loop: introduce qemu_mutex_iothread_locked Paolo Bonzini
2015-06-23  8:48   ` Fam Zheng
2015-06-18 16:47 ` [Qemu-devel] [PATCH 3/9] memory: Add global-locking property to memory regions Paolo Bonzini
2015-06-23  8:51   ` Fam Zheng
2015-06-18 16:47 ` [Qemu-devel] [PATCH 4/9] exec: pull qemu_flush_coalesced_mmio_buffer() into address_space_rw/ld*/st* Paolo Bonzini
2015-06-23  9:05   ` Fam Zheng
2015-06-23  9:12     ` Paolo Bonzini
2015-06-18 16:47 ` [Qemu-devel] [PATCH 5/9] memory: let address_space_rw/ld*/st* run outside the BQL Paolo Bonzini
2015-06-24 16:56   ` Alex Bennée
2015-06-24 17:21     ` Paolo Bonzini
2015-06-24 18:50       ` Alex Bennée
2015-06-25  8:13         ` Paolo Bonzini
2015-06-18 16:47 ` [Qemu-devel] [PATCH 6/9] kvm: First step to push iothread lock out of inner run loop Paolo Bonzini
2015-06-18 18:19   ` Christian Borntraeger
2015-06-23  9:26   ` Fam Zheng
2015-06-23  9:29     ` Paolo Bonzini
2015-06-23  9:45       ` Fam Zheng
2015-06-23  9:49         ` Paolo Bonzini
2015-06-18 16:47 ` [Qemu-devel] [PATCH 7/9] kvm: Switch to unlocked PIO Paolo Bonzini
2015-06-18 16:47 ` [Qemu-devel] [PATCH 8/9] acpi: mark PMTIMER as unlocked Paolo Bonzini
2015-06-18 16:47 ` [Qemu-devel] [PATCH 9/9] kvm: Switch to unlocked MMIO Paolo Bonzini
2015-06-24 16:25 [Qemu-devel] [PATCH for-2.4 v2 0/9] KVM: Do I/O outside BQL whenever possible Paolo Bonzini
2015-06-24 16:25 ` [Qemu-devel] [PATCH 7/9] kvm: Switch to unlocked PIO Paolo Bonzini
2015-07-02  8:20 [Qemu-devel] [PATCH for-2.4 0/9 v3] KVM: Do I/O outside BQL whenever possible Paolo Bonzini
2015-07-02  8:20 ` [Qemu-devel] [PATCH 7/9] kvm: Switch to unlocked PIO Paolo Bonzini
