* [Qemu-devel] [PATCH v2 00/13] dataplane: use block layer
@ 2013-07-15 14:42 Stefan Hajnoczi
From: Stefan Hajnoczi @ 2013-07-15 14:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

v2:
 * Rebased onto qemu.git/master
 * Added comment explaining how the dataplane thread is restarted after draining [pbonzini]

This series adds image format, QMP 'transaction', and QMP 'block_resize' support
to dataplane.  This is done by replacing custom Linux AIO code with QEMU block
layer APIs.

In order for block layer APIs to be safe, bdrv_drain_all() is modified to be
aware of device emulation threads (like dataplane).
BlockDevOps->drain_threads_cb() is introduced as a way to stop device emulation
threads temporarily.  Device emulation threads may resume after this main loop
iteration completes, which gives code running under the QEMU global mutex a
chance to use the block device temporarily.

bdrv_get_aio_context() is dropped in favor of a thread-local AioContext
pointer.  This allows qemu_bh_new(), qemu_aio_wait(), and friends to
automatically use the right AioContext.  When we introduce a BlockDriverState
lock in the future, it will be possible to use block devices from multiple
threads arbitrarily (right now we only allow it if bdrv_drain_all() is used).

Three open issues:

 * Timers only work in the main loop, so I/O throttling is ignored and QED is
   unsafe with x-data-plane=on.  I am tackling this in a separate series that
   makes QEMUTimer thread-safe, after which I/O throttling can be re-enabled.

 * Performance analysis is in progress.

 * Need to test with NBD, Gluster, and other non-file protocols.

Paolo Bonzini (2):
  exec: do not use qemu/tls.h
  qemu-thread: add TLS wrappers

Stefan Hajnoczi (11):
  dataplane: sync virtio.c and vring.c virtqueue state
  block: add BlockDevOps->drain_threads_cb()
  virtio-blk: implement BlockDevOps->drain_threads_cb()
  block: add thread_aio_context TLS variable
  block: drop bdrv_get_aio_context()
  main-loop: use thread-local AioContext
  linux-aio: bind EventNotifier to current AioContext
  block: disable I/O throttling outside main loop
  dataplane: use block layer for I/O
  dataplane: drop ioq Linux AIO request queue
  block: drop raw_get_aio_fd()

 async.c                             |   2 +
 block.c                             |  17 ++-
 block/linux-aio.c                   |  19 ++-
 block/raw-posix.c                   |  38 +----
 block/raw-win32.c                   |   2 +-
 configure                           |  21 +++
 cpus.c                              |   2 +
 exec.c                              |  10 +-
 hw/block/dataplane/Makefile.objs    |   2 +-
 hw/block/dataplane/ioq.c            | 117 ---------------
 hw/block/dataplane/ioq.h            |  57 -------
 hw/block/dataplane/virtio-blk.c     | 290 ++++++++++++------------------------
 hw/block/virtio-blk.c               |  16 ++
 hw/virtio/dataplane/vring.c         |   8 +-
 include/block/aio.h                 |   4 +
 include/block/block.h               |  17 ++-
 include/block/block_int.h           |   7 -
 include/hw/virtio/dataplane/vring.h |   2 +-
 include/qemu/tls.h                  | 125 +++++++++++++---
 include/qom/cpu.h                   |  14 +-
 main-loop.c                         |  16 +-
 tests/Makefile                      |   3 +
 tests/test-tls.c                    |  87 +++++++++++
 util/qemu-thread-win32.c            |  17 +++
 24 files changed, 426 insertions(+), 467 deletions(-)
 delete mode 100644 hw/block/dataplane/ioq.c
 delete mode 100644 hw/block/dataplane/ioq.h
 create mode 100644 tests/test-tls.c

-- 
1.8.1.4


* [Qemu-devel] [PATCH v2 01/13] dataplane: sync virtio.c and vring.c virtqueue state
From: Stefan Hajnoczi @ 2013-07-15 14:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

Load the virtio.c state into vring.c when we start dataplane mode and
vice versa when stopping dataplane mode.  This patch makes it possible
to start and stop dataplane any time while the guest is running.

This is very useful since it will allow us to go back to the QEMU main loop
for bdrv_drain_all() and live migration.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
Note this patch was sent separately but I am including it here to make testing easier.

 hw/block/dataplane/virtio-blk.c     | 2 +-
 hw/virtio/dataplane/vring.c         | 8 +++++---
 include/hw/virtio/dataplane/vring.h | 2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 0356665..2faed43 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -537,7 +537,7 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
     /* Clean up guest notifier (irq) */
     k->set_guest_notifiers(qbus->parent, 1, false);
 
-    vring_teardown(&s->vring);
+    vring_teardown(&s->vring, s->vdev, 0);
     s->started = false;
     s->stopping = false;
 }
diff --git a/hw/virtio/dataplane/vring.c b/hw/virtio/dataplane/vring.c
index e0d6e83..82cc151 100644
--- a/hw/virtio/dataplane/vring.c
+++ b/hw/virtio/dataplane/vring.c
@@ -39,8 +39,8 @@ bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
 
     vring_init(&vring->vr, virtio_queue_get_num(vdev, n), vring_ptr, 4096);
 
-    vring->last_avail_idx = 0;
-    vring->last_used_idx = 0;
+    vring->last_avail_idx = virtio_queue_get_last_avail_idx(vdev, n);
+    vring->last_used_idx = vring->vr.used->idx;
     vring->signalled_used = 0;
     vring->signalled_used_valid = false;
 
@@ -49,8 +49,10 @@ bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
     return true;
 }
 
-void vring_teardown(Vring *vring)
+void vring_teardown(Vring *vring, VirtIODevice *vdev, int n)
 {
+    virtio_queue_set_last_avail_idx(vdev, n, vring->last_avail_idx);
+
     hostmem_finalize(&vring->hostmem);
 }
 
diff --git a/include/hw/virtio/dataplane/vring.h b/include/hw/virtio/dataplane/vring.h
index 9380cb5..c0b69ff 100644
--- a/include/hw/virtio/dataplane/vring.h
+++ b/include/hw/virtio/dataplane/vring.h
@@ -50,7 +50,7 @@ static inline void vring_set_broken(Vring *vring)
 }
 
 bool vring_setup(Vring *vring, VirtIODevice *vdev, int n);
-void vring_teardown(Vring *vring);
+void vring_teardown(Vring *vring, VirtIODevice *vdev, int n);
 void vring_disable_notification(VirtIODevice *vdev, Vring *vring);
 bool vring_enable_notification(VirtIODevice *vdev, Vring *vring);
 bool vring_should_notify(VirtIODevice *vdev, Vring *vring);
-- 
1.8.1.4


* [Qemu-devel] [PATCH v2 02/13] block: add BlockDevOps->drain_threads_cb()
From: Stefan Hajnoczi @ 2013-07-15 14:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

There are times when the QEMU main loop wishes to drain I/O requests.
Up until now bdrv_drain_all() meant that no new guest I/O would be
processed until the next event loop iteration.

This is no longer true with dataplane since it runs outside the QEMU
global mutex.  The BlockDevOps->drain_threads_cb() interface allows the
device model to drain and stop its threads.  Once draining completes, the
QEMU main loop can be sure that no further I/O will take place until the
next main loop iteration.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c               | 6 ++++++
 include/block/block.h | 9 +++++++++
 2 files changed, 15 insertions(+)

diff --git a/block.c b/block.c
index 183fec8..7c5931d 100644
--- a/block.c
+++ b/block.c
@@ -1431,6 +1431,12 @@ void bdrv_drain_all(void)
     BlockDriverState *bs;
     bool busy;
 
+    QTAILQ_FOREACH(bs, &bdrv_states, list) {
+        if (bs->dev_ops && bs->dev_ops->drain_threads_cb) {
+            bs->dev_ops->drain_threads_cb(bs->dev_opaque);
+        }
+    }
+
     do {
         busy = qemu_aio_wait();
 
diff --git a/include/block/block.h b/include/block/block.h
index dd8eca1..977d5f6 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -60,6 +60,15 @@ typedef struct BlockDevOps {
      * Runs when the size changed (e.g. monitor command block_resize)
      */
     void (*resize_cb)(void *opaque);
+    /*
+     * Notifies the device model to drain any emulation threads
+     *
+     * Upon return, there must be no new or pending requests outside the QEMU
+     * main loop.  The device model may restart emulation threads after this
+     * main loop iteration, for example in a hardware register handler
+     * function.
+     */
+    void (*drain_threads_cb)(void *opaque);
 } BlockDevOps;
 
 #define BDRV_O_RDWR        0x0002
-- 
1.8.1.4


* [Qemu-devel] [PATCH v2 03/13] virtio-blk: implement BlockDevOps->drain_threads_cb()
From: Stefan Hajnoczi @ 2013-07-15 14:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

Drain and stop the dataplane thread when bdrv_drain_all() is called.
Note that the thread will be restarted in virtio_blk_handle_output() the
next time the guest kicks the virtqueue.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/block/virtio-blk.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cf12469..f4ad528 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -618,8 +618,24 @@ static void virtio_blk_resize(void *opaque)
     virtio_notify_config(vdev);
 }
 
+static void virtio_blk_drain_threads(void *opaque)
+{
+#ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
+    VirtIOBlock *s = VIRTIO_BLK(opaque);
+
+    if (s->dataplane) {
+        /* Drain I/O and stop thread.  Next time the guest kicks the virtqueue
+         * we restart the thread.  In the meantime the main loop may access the
+         * block device.
+         */
+        virtio_blk_data_plane_stop(s->dataplane);
+    }
+#endif
+}
+
 static const BlockDevOps virtio_block_ops = {
     .resize_cb = virtio_blk_resize,
+    .drain_threads_cb = virtio_blk_drain_threads,
 };
 
 void virtio_blk_set_conf(DeviceState *dev, VirtIOBlkConf *blk)
-- 
1.8.1.4


* [Qemu-devel] [PATCH v2 04/13] exec: do not use qemu/tls.h
From: Stefan Hajnoczi @ 2013-07-15 14:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini

From: Paolo Bonzini <pbonzini@redhat.com>

The next patch will change qemu/tls.h to support more platforms, but at
some performance cost.  Declare current_cpu directly instead of using
the tls.h abstractions.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
Paolo may have further improvements for these patches.  This is my local version.

 exec.c             | 10 ++++++++--
 include/qemu/tls.h | 52 ----------------------------------------------------
 include/qom/cpu.h  | 14 +++++++++++---
 3 files changed, 19 insertions(+), 57 deletions(-)
 delete mode 100644 include/qemu/tls.h

diff --git a/exec.c b/exec.c
index 80ee2ab..b8db730 100644
--- a/exec.c
+++ b/exec.c
@@ -70,9 +70,15 @@ static MemoryRegion io_mem_unassigned;
 #endif
 
 CPUState *first_cpu;
+
 /* current CPU in the current thread. It is only valid inside
-   cpu_exec() */
-DEFINE_TLS(CPUState *, current_cpu);
+ * cpu_exec().  See comment in include/exec/cpu-all.h.  */
+#if defined CONFIG_KVM || (defined CONFIG_USER_ONLY && defined CONFIG_USE_NPTL)
+__thread CPUState *current_cpu;
+#else
+CPUState *current_cpu;
+#endif
+
 /* 0 = Do not count executed instructions.
    1 = Precise instruction counting.
    2 = Adaptive rate instruction counting.  */
diff --git a/include/qemu/tls.h b/include/qemu/tls.h
deleted file mode 100644
index b92ea9d..0000000
--- a/include/qemu/tls.h
+++ /dev/null
@@ -1,52 +0,0 @@
-/*
- * Abstraction layer for defining and using TLS variables
- *
- * Copyright (c) 2011 Red Hat, Inc
- * Copyright (c) 2011 Linaro Limited
- *
- * Authors:
- *  Paolo Bonzini <pbonzini@redhat.com>
- *  Peter Maydell <peter.maydell@linaro.org>
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation; either version 2 of
- * the License, or (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program; if not, see <http://www.gnu.org/licenses/>.
- */
-
-#ifndef QEMU_TLS_H
-#define QEMU_TLS_H
-
-/* Per-thread variables. Note that we only have implementations
- * which are really thread-local on Linux; the dummy implementations
- * define plain global variables.
- *
- * This means that for the moment use should be restricted to
- * per-VCPU variables, which are OK because:
- *  - the only -user mode supporting multiple VCPU threads is linux-user
- *  - TCG system mode is single-threaded regarding VCPUs
- *  - KVM system mode is multi-threaded but limited to Linux
- *
- * TODO: proper implementations via Win32 .tls sections and
- * POSIX pthread_getspecific.
- */
-#ifdef __linux__
-#define DECLARE_TLS(type, x) extern DEFINE_TLS(type, x)
-#define DEFINE_TLS(type, x)  __thread __typeof__(type) tls__##x
-#define tls_var(x)           tls__##x
-#else
-/* Dummy implementations which define plain global variables */
-#define DECLARE_TLS(type, x) extern DEFINE_TLS(type, x)
-#define DEFINE_TLS(type, x)  __typeof__(type) tls__##x
-#define tls_var(x)           tls__##x
-#endif
-
-#endif
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 147c256..cb26e1a 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -24,7 +24,6 @@
 #include "hw/qdev-core.h"
 #include "exec/hwaddr.h"
 #include "qemu/thread.h"
-#include "qemu/tls.h"
 #include "qemu/typedefs.h"
 
 typedef int (*WriteCoreDumpFunction)(void *buf, size_t size, void *opaque);
@@ -163,8 +162,17 @@ struct CPUState {
 
 extern CPUState *first_cpu;
 
-DECLARE_TLS(CPUState *, current_cpu);
-#define current_cpu tls_var(current_cpu)
+/* This is thread-local depending on __linux__ because:
+ *  - the only -user mode supporting multiple VCPU threads is linux-user
+ *  - TCG system mode is single-threaded regarding VCPUs
+ *  - KVM system mode is multi-threaded but limited to Linux
+ */
+#if defined CONFIG_KVM || (defined CONFIG_USER_ONLY && defined CONFIG_USE_NPTL)
+extern __thread CPUState *current_cpu;
+#else
+extern CPUState *current_cpu;
+#endif
+
 
 /**
  * cpu_paging_enabled:
-- 
1.8.1.4


* [Qemu-devel] [PATCH v2 05/13] qemu-thread: add TLS wrappers
From: Stefan Hajnoczi @ 2013-07-15 14:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

From: Paolo Bonzini <pbonzini@redhat.com>

Fast TLS is not available on some platforms, but we want to use it where
possible.  This wrapper implementation falls back to
pthread_getspecific/pthread_setspecific on POSIX systems that lack __thread,
and uses the dynamic linker's TLS support on Linux and Windows.

The user shall call tls_alloc_foo() in every thread that needs to access
the variable---exactly once and before any access.  foo is the name of
the variable as passed to DECLARE_TLS and DEFINE_TLS.  Then,
tls_get_foo() will return the address of the variable.  It is guaranteed
to remain the same across the lifetime of a thread, so you can cache it.

[Renamed alloc_foo()/get_foo() to tls_alloc_foo()/tls_get_foo()
-- Stefan]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 configure                |  21 +++++++
 include/qemu/tls.h       | 139 +++++++++++++++++++++++++++++++++++++++++++++++
 tests/Makefile           |   3 +
 tests/test-tls.c         |  87 +++++++++++++++++++++++++++++
 util/qemu-thread-win32.c |  17 ++++++
 5 files changed, 267 insertions(+)
 create mode 100644 include/qemu/tls.h
 create mode 100644 tests/test-tls.c

diff --git a/configure b/configure
index cb0f870..e215e36 100755
--- a/configure
+++ b/configure
@@ -284,6 +284,7 @@ fi
 ar="${AR-${cross_prefix}ar}"
 as="${AS-${cross_prefix}as}"
 cpp="${CPP-$cc -E}"
+nm="${NM-${cross_prefix}nm}"
 objcopy="${OBJCOPY-${cross_prefix}objcopy}"
 ld="${LD-${cross_prefix}ld}"
 libtool="${LIBTOOL-${cross_prefix}libtool}"
@@ -3163,6 +3164,22 @@ if test "$trace_backend" = "dtrace"; then
 fi
 
 ##########################################
+# check for TLS runtime
+
+# Some versions of mingw include the "magic" definitions that make
+# TLS work, some don't.  Check for it.
+
+if test "$mingw32" = yes; then
+  cat > $TMPC << EOF
+int main(void) {}
+EOF
+  compile_prog "" ""
+  if $nm $TMPE | grep _tls_used > /dev/null 2>&1; then
+    mingw32_tls_runtime=yes
+  fi
+fi
+
+##########################################
 # check and set a backend for coroutine
 
 # We prefer ucontext, but it's not always possible. The fallback
@@ -3651,6 +3668,9 @@ if test "$mingw32" = "yes" ; then
   version_micro=0
   echo "CONFIG_FILEVERSION=$version_major,$version_minor,$version_subminor,$version_micro" >> $config_host_mak
   echo "CONFIG_PRODUCTVERSION=$version_major,$version_minor,$version_subminor,$version_micro" >> $config_host_mak
+  if test "$mingw32_tls_runtime" = yes; then
+    echo "CONFIG_MINGW32_TLS_RUNTIME=y" >> $config_host_mak
+  fi
 else
   echo "CONFIG_POSIX=y" >> $config_host_mak
 fi
@@ -4073,6 +4093,7 @@ echo "OBJCC=$objcc" >> $config_host_mak
 echo "AR=$ar" >> $config_host_mak
 echo "AS=$as" >> $config_host_mak
 echo "CPP=$cpp" >> $config_host_mak
+echo "NM=$nm" >> $config_host_mak
 echo "OBJCOPY=$objcopy" >> $config_host_mak
 echo "LD=$ld" >> $config_host_mak
 echo "WINDRES=$windres" >> $config_host_mak
diff --git a/include/qemu/tls.h b/include/qemu/tls.h
new file mode 100644
index 0000000..750ccb3
--- /dev/null
+++ b/include/qemu/tls.h
@@ -0,0 +1,139 @@
+/*
+ * Abstraction layer for defining and using TLS variables
+ *
+ * Copyright (c) 2011, 2013 Red Hat, Inc
+ * Copyright (c) 2011 Linaro Limited
+ *
+ * Authors:
+ *  Paolo Bonzini <pbonzini@redhat.com>
+ *  Peter Maydell <peter.maydell@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef QEMU_TLS_H
+#define QEMU_TLS_H
+
+#if defined __linux__
+#define DECLARE_TLS(type, x)                     \
+extern __thread typeof(type) x;                  \
+                                                 \
+static inline typeof(type) *tls_get_##x(void)    \
+{                                                \
+    return &x;                                   \
+}                                                \
+                                                 \
+static inline typeof(type) *tls_alloc_##x(void)  \
+{                                                \
+    return &x;                                   \
+}                                                \
+                                                 \
+extern int dummy_##__LINE__
+
+#define DEFINE_TLS(type, x)                  \
+__thread typeof(type) x
+
+#elif defined CONFIG_POSIX
+typedef struct QEMUTLSValue {
+    pthread_key_t k;
+    pthread_once_t o;
+} QEMUTLSValue;
+
+#define DECLARE_TLS(type, x)                     \
+extern QEMUTLSValue x;                           \
+extern void init_##x(void);                      \
+                                                 \
+static inline typeof(type) *tls_get_##x(void)    \
+{                                                \
+    return pthread_getspecific(x.k);             \
+}                                                \
+                                                 \
+static inline typeof(type) *tls_alloc_##x(void)  \
+{                                                \
+    void *datum = g_malloc0(sizeof(type));       \
+    pthread_once(&x.o, init_##x);                \
+    pthread_setspecific(x.k, datum);             \
+    return datum;                                \
+}                                                \
+                                                 \
+extern int dummy_##__LINE__
+
+#define DEFINE_TLS(type, x)                  \
+void init_##x(void) {                        \
+    pthread_key_create(&x.k, g_free);        \
+}                                            \
+                                             \
+QEMUTLSValue x = { .o = PTHREAD_ONCE_INIT }
+
+#elif defined CONFIG_WIN32
+
+/* The initial contents of TLS variables are placed in the .tls section.
+ * The linker takes all section starting with ".tls$", sorts them and puts
+ * the contents in a single ".tls" section.  qemu-thread-win32.c defines
+ * special symbols in .tls$000 and .tls$ZZZ that represent the beginning
+ * and end of TLS memory.  The linker and run-time library then cooperate
+ * to copy memory between those symbols in the TLS area of new threads.
+ *
+ * _tls_index holds the number of our module.  The executable should be
+ * zero, DLLs are numbered 1 and up.  The loader fills it in for us.
+ *
+ * Thus, Teb->ThreadLocalStoragePointer[_tls_index] is the base of
+ * the TLS segment for this (thread, module) pair.  Each segment has
+ * the same layout as this module's .tls segment and is initialized
+ * with the content of the .tls segment; 0 is the _tls_start variable.
+ * So, tls_get_##x passes us the offset of the passed variable relative to
+ * _tls_start, and we return that same offset plus the base of segment.
+ */
+
+typedef struct _TEB {
+    NT_TIB NtTib;
+    void *EnvironmentPointer;
+    void *x[3];
+    char **ThreadLocalStoragePointer;
+} TEB, *PTEB;
+
+extern int _tls_index;
+extern int _tls_start;
+
+static inline void *tls_var(size_t offset)
+{
+    PTEB Teb = NtCurrentTeb();
+    return (char *)(Teb->ThreadLocalStoragePointer[_tls_index]) + offset;
+}
+
+#define DECLARE_TLS(type, x)                                         \
+extern typeof(type) tls_##x __attribute__((section(".tls$QEMU")));   \
+                                                                     \
+static inline typeof(type) *tls_get_##x(void)                        \
+{                                                                    \
+    return tls_var((ULONG_PTR)&(tls_##x) - (ULONG_PTR)&_tls_start);  \
+}                                                                    \
+                                                                     \
+static inline typeof(type) *tls_alloc_##x(void)                      \
+{                                                                    \
+    typeof(type) *addr = tls_get_##x();                              \
+    memset((void *)addr, 0, sizeof(type));                           \
+    return addr;                                                     \
+}                                                                    \
+                                                                     \
+extern int dummy_##__LINE__
+
+#define DEFINE_TLS(type, x)                                          \
+typeof(type) tls_##x __attribute__((section(".tls$QEMU")))
+
+#else
+#error No TLS abstraction available on this platform
+#endif
+
+#endif
diff --git a/tests/Makefile b/tests/Makefile
index 279d5f8..11b272e 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -42,6 +42,8 @@ check-unit-y += tests/test-xbzrle$(EXESUF)
 gcov-files-test-xbzrle-y = xbzrle.c
 check-unit-y += tests/test-cutils$(EXESUF)
 gcov-files-test-cutils-y += util/cutils.c
+check-unit-y += tests/test-tls$(EXESUF)
+gcov-files-test-tls-y +=
 check-unit-y += tests/test-mul64$(EXESUF)
 gcov-files-test-mul64-y = util/host-utils.c
 check-unit-y += tests/test-int128$(EXESUF)
@@ -102,6 +104,7 @@ tests/test-x86-cpuid$(EXESUF): tests/test-x86-cpuid.o
 tests/test-xbzrle$(EXESUF): tests/test-xbzrle.o xbzrle.o page_cache.o libqemuutil.a
 tests/test-cutils$(EXESUF): tests/test-cutils.o util/cutils.o
 tests/test-int128$(EXESUF): tests/test-int128.o
+tests/test-tls$(EXESUF): tests/test-tls.o libqemuutil.a
 
 tests/test-qapi-types.c tests/test-qapi-types.h :\
 $(SRC_PATH)/qapi-schema-test.json $(SRC_PATH)/scripts/qapi-types.py
diff --git a/tests/test-tls.c b/tests/test-tls.c
new file mode 100644
index 0000000..26a9ec7
--- /dev/null
+++ b/tests/test-tls.c
@@ -0,0 +1,87 @@
+/*
+ * Unit-tests for TLS wrappers
+ *
+ * Copyright (C) 2013 Red Hat Inc.
+ *
+ * Authors:
+ *  Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <glib.h>
+#include <errno.h>
+#include <string.h>
+
+#include "qemu-common.h"
+#include "qemu/atomic.h"
+#include "qemu/thread.h"
+#include "qemu/tls.h"
+
+DECLARE_TLS(volatile long long, cnt);
+DEFINE_TLS(volatile long long, cnt);
+
+#define NUM_THREADS 10
+
+int stop;
+
+static void *test_thread(void *arg)
+{
+    volatile long long *p_cnt = tls_alloc_cnt();
+    volatile long long **p_ret = arg;
+    long long exp = 0;
+
+    g_assert(tls_get_cnt() == p_cnt);
+    *p_ret = p_cnt;
+    g_assert(*p_cnt == 0);
+    while (atomic_mb_read(&stop) == 0) {
+        exp++;
+        (*p_cnt)++;
+        g_assert(*tls_get_cnt() == exp);
+    }
+
+    return NULL;
+}
+
+static void test_tls(void)
+{
+    volatile long long *addr[NUM_THREADS];
+    QemuThread t[NUM_THREADS];
+    int i;
+
+    for (i = 0; i < NUM_THREADS; i++) {
+        qemu_thread_create(&t[i], test_thread, &addr[i], QEMU_THREAD_JOINABLE);
+    }
+    g_usleep(1000000);
+    atomic_mb_set(&stop, 1);
+    for (i = 0; i < NUM_THREADS; i++) {
+        qemu_thread_join(&t[i]);
+    }
+    for (i = 1; i < NUM_THREADS; i++) {
+        g_assert(addr[i] != addr[i - 1]);
+    }
+}
+
+int main(int argc, char **argv)
+{
+    g_test_init(&argc, &argv, NULL);
+
+    g_test_add_func("/tls", test_tls);
+    return g_test_run();
+}
diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
index 517878d..f75e404 100644
--- a/util/qemu-thread-win32.c
+++ b/util/qemu-thread-win32.c
@@ -16,6 +16,23 @@
 #include <assert.h>
 #include <limits.h>
 
+/* TLS support.  Some versions of mingw32 provide it, others do not.  */
+
+#ifndef CONFIG_MINGW32_TLS_RUNTIME
+int __attribute__((section(".tls$AAA"))) _tls_start = 0;
+int __attribute__((section(".tls$ZZZ"))) _tls_end = 0;
+int _tls_index = 0;
+
+const IMAGE_TLS_DIRECTORY _tls_used __attribute__((used, section(".rdata$T"))) = {
+ (ULONG)(ULONG_PTR) &_tls_start, /* start of tls data */
+ (ULONG)(ULONG_PTR) &_tls_end,   /* end of tls data */
+ (ULONG)(ULONG_PTR) &_tls_index, /* address of tls_index */
+ (ULONG) 0,                      /* pointer to callbacks */
+ (ULONG) 0,                      /* size of tls zero fill */
+ (ULONG) 0                       /* characteristics */
+};
+#endif
+
 static void error_exit(int err, const char *msg)
 {
     char *pstr;
-- 
1.8.1.4


* [Qemu-devel] [PATCH v2 06/13] block: add thread_aio_context TLS variable
From: Stefan Hajnoczi @ 2013-07-15 14:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

BH and fd handler APIs need to know which AioContext (event loop) to run
inside.  Passing it around everywhere is not feasible since it would
require adding arguments to every call-chain that invokes BH and fd
handler APIs (hint: there are many and they change).

Instead make the AioContext pointer thread-local.  This way, any
function that needs to use an AioContext can find it.  In particular:
the iothread and vcpu threads share the main AioContext since they hold
the global mutex while running.  Dataplane threads will use their own
AioContexts.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 async.c             | 2 ++
 cpus.c              | 2 ++
 include/block/aio.h | 4 ++++
 main-loop.c         | 1 +
 4 files changed, 9 insertions(+)

diff --git a/async.c b/async.c
index 90fe906..765cbc0 100644
--- a/async.c
+++ b/async.c
@@ -27,6 +27,8 @@
 #include "block/thread-pool.h"
 #include "qemu/main-loop.h"
 
+DEFINE_TLS(AioContext*, thread_aio_context);
+
 /***********************************************************/
 /* bottom halves (can be seen as timers which expire ASAP) */
 
diff --git a/cpus.c b/cpus.c
index f141428..61e86a8 100644
--- a/cpus.c
+++ b/cpus.c
@@ -733,6 +733,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
     qemu_thread_get_self(cpu->thread);
     cpu->thread_id = qemu_get_thread_id();
     current_cpu = cpu;
+    *tls_alloc_thread_aio_context() = qemu_get_aio_context();
 
     r = kvm_init_vcpu(cpu);
     if (r < 0) {
@@ -806,6 +807,7 @@ static void tcg_exec_all(void);
 static void tcg_signal_cpu_creation(CPUState *cpu, void *data)
 {
     cpu->thread_id = qemu_get_thread_id();
+    *tls_alloc_thread_aio_context() = qemu_get_aio_context();
     cpu->created = true;
 }
 
diff --git a/include/block/aio.h b/include/block/aio.h
index 1836793..6ee9369 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -17,6 +17,7 @@
 #include "qemu-common.h"
 #include "qemu/queue.h"
 #include "qemu/event_notifier.h"
+#include "qemu/tls.h"
 
 typedef struct BlockDriverAIOCB BlockDriverAIOCB;
 typedef void BlockDriverCompletionFunc(void *opaque, int ret);
@@ -74,6 +75,9 @@ typedef struct AioContext {
 /* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
 typedef int (AioFlushEventNotifierHandler)(EventNotifier *e);
 
+/* Each thread can have a default AioContext */
+DECLARE_TLS(AioContext*, thread_aio_context);
+
 /**
  * aio_context_new: Allocate a new AioContext.
  *
diff --git a/main-loop.c b/main-loop.c
index a44fff6..5af3963 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -143,6 +143,7 @@ int qemu_init_main_loop(void)
 
     gpollfds = g_array_new(FALSE, FALSE, sizeof(GPollFD));
     qemu_aio_context = aio_context_new();
+    *tls_get_thread_aio_context() = qemu_aio_context;
     src = aio_get_g_source(qemu_aio_context);
     g_source_attach(src, NULL);
     g_source_unref(src);
-- 
1.8.1.4


* [Qemu-devel] [PATCH v2 07/13] block: drop bdrv_get_aio_context()
  2013-07-15 14:42 [Qemu-devel] [PATCH v2 00/13] dataplane: use block layer Stefan Hajnoczi
                   ` (5 preceding siblings ...)
  2013-07-15 14:42 ` [Qemu-devel] [PATCH v2 06/13] block: add thread_aio_context TLS variable Stefan Hajnoczi
@ 2013-07-15 14:42 ` Stefan Hajnoczi
  2013-07-15 14:42 ` [Qemu-devel] [PATCH v2 08/13] main-loop: use thread-local AioContext Stefan Hajnoczi
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-07-15 14:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

Associating a BlockDriverState with a single AioContext is not flexible
enough.  Once we make BlockDriverState thread-safe, it will be possible
to call bdrv_*() functions from multiple event loops.

Use the thread-local AioContext pointer instead of
bdrv_get_aio_context().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c                   | 6 ------
 block/raw-posix.c         | 4 ++--
 block/raw-win32.c         | 2 +-
 include/block/block_int.h | 7 -------
 4 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/block.c b/block.c
index 7c5931d..26644ec 100644
--- a/block.c
+++ b/block.c
@@ -4584,12 +4584,6 @@ out:
     }
 }
 
-AioContext *bdrv_get_aio_context(BlockDriverState *bs)
-{
-    /* Currently BlockDriverState always uses the main loop AioContext */
-    return qemu_get_aio_context();
-}
-
 void bdrv_add_before_write_notifier(BlockDriverState *bs,
                                     NotifierWithReturn *notifier)
 {
diff --git a/block/raw-posix.c b/block/raw-posix.c
index ba721d3..e8eb396 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -796,7 +796,7 @@ static BlockDriverAIOCB *paio_submit(BlockDriverState *bs, int fd,
     acb->aio_offset = sector_num * 512;
 
     trace_paio_submit(acb, opaque, sector_num, nb_sectors, type);
-    pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
+    pool = aio_get_thread_pool(*tls_get_thread_aio_context());
     return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque);
 }
 
@@ -1462,7 +1462,7 @@ static BlockDriverAIOCB *hdev_aio_ioctl(BlockDriverState *bs,
     acb->aio_offset = 0;
     acb->aio_ioctl_buf = buf;
     acb->aio_ioctl_cmd = req;
-    pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
+    pool = aio_get_thread_pool(*tls_get_thread_aio_context());
     return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque);
 }
 
diff --git a/block/raw-win32.c b/block/raw-win32.c
index 9b5b2af..2d57d75 100644
--- a/block/raw-win32.c
+++ b/block/raw-win32.c
@@ -158,7 +158,7 @@ static BlockDriverAIOCB *paio_submit(BlockDriverState *bs, HANDLE hfile,
     acb->aio_offset = sector_num * 512;
 
     trace_paio_submit(acb, opaque, sector_num, nb_sectors, type);
-    pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
+    pool = aio_get_thread_pool(*tls_get_thread_aio_context());
     return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque);
 }
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index c6ac871..cd0e0a8 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -319,13 +319,6 @@ void bdrv_set_io_limits(BlockDriverState *bs,
 void bdrv_add_before_write_notifier(BlockDriverState *bs,
                                     NotifierWithReturn *notifier);
 
-/**
- * bdrv_get_aio_context:
- *
- * Returns: the currently bound #AioContext
- */
-AioContext *bdrv_get_aio_context(BlockDriverState *bs);
-
 #ifdef _WIN32
 int is_windows_drive(const char *filename);
 #endif
-- 
1.8.1.4


* [Qemu-devel] [PATCH v2 08/13] main-loop: use thread-local AioContext
  2013-07-15 14:42 [Qemu-devel] [PATCH v2 00/13] dataplane: use block layer Stefan Hajnoczi
                   ` (6 preceding siblings ...)
  2013-07-15 14:42 ` [Qemu-devel] [PATCH v2 07/13] block: drop bdrv_get_aio_context() Stefan Hajnoczi
@ 2013-07-15 14:42 ` Stefan Hajnoczi
  2013-07-15 14:42 ` [Qemu-devel] [PATCH v2 09/13] linux-aio: bind EventNotifier to current AioContext Stefan Hajnoczi
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-07-15 14:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

qemu_bh_new() and several other functions use qemu_aio_context, the
global QEMU main loop AioContext.  Now that we have introduced threads
with their own AioContext and a thread-local AioContext pointer, convert
qemu_bh_new() and friends to use the thread-local AioContext pointer.

qemu_bh_new() now creates a QEMUBH for the current thread's AioContext.
This is the right thing to do - it allows existing code to work outside
the main loop thread.

One exception: qemu_notify_event() still uses the global
qemu_aio_context because all callers expect to kick the main loop, not
some other event loop.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 main-loop.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/main-loop.c b/main-loop.c
index 5af3963..5fbdd4a 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -320,7 +320,9 @@ void qemu_del_wait_object(HANDLE handle, WaitObjectFunc *func, void *opaque)
 
 void qemu_fd_register(int fd)
 {
-    WSAEventSelect(fd, event_notifier_get_handle(&qemu_aio_context->notifier),
+    AioContext *ctx = *tls_get_thread_aio_context();
+    HANDLE h = event_notifier_get_handle(&ctx->notifier);
+    WSAEventSelect(fd, h,
                    FD_READ | FD_ACCEPT | FD_CLOSE |
                    FD_CONNECT | FD_WRITE | FD_OOB);
 }
@@ -478,12 +480,12 @@ int main_loop_wait(int nonblocking)
 
 QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque)
 {
-    return aio_bh_new(qemu_aio_context, cb, opaque);
+    return aio_bh_new(*tls_get_thread_aio_context(), cb, opaque);
 }
 
 bool qemu_aio_wait(void)
 {
-    return aio_poll(qemu_aio_context, true);
+    return aio_poll(*tls_get_thread_aio_context(), true);
 }
 
 #ifdef CONFIG_POSIX
@@ -493,8 +495,8 @@ void qemu_aio_set_fd_handler(int fd,
                              AioFlushHandler *io_flush,
                              void *opaque)
 {
-    aio_set_fd_handler(qemu_aio_context, fd, io_read, io_write, io_flush,
-                       opaque);
+    aio_set_fd_handler(*tls_get_thread_aio_context(), fd,
+                       io_read, io_write, io_flush, opaque);
 }
 #endif
 
@@ -502,5 +504,6 @@ void qemu_aio_set_event_notifier(EventNotifier *notifier,
                                  EventNotifierHandler *io_read,
                                  AioFlushEventNotifierHandler *io_flush)
 {
-    aio_set_event_notifier(qemu_aio_context, notifier, io_read, io_flush);
+    aio_set_event_notifier(*tls_get_thread_aio_context(), notifier,
+                           io_read, io_flush);
 }
-- 
1.8.1.4


* [Qemu-devel] [PATCH v2 09/13] linux-aio: bind EventNotifier to current AioContext
  2013-07-15 14:42 [Qemu-devel] [PATCH v2 00/13] dataplane: use block layer Stefan Hajnoczi
                   ` (7 preceding siblings ...)
  2013-07-15 14:42 ` [Qemu-devel] [PATCH v2 08/13] main-loop: use thread-local AioContext Stefan Hajnoczi
@ 2013-07-15 14:42 ` Stefan Hajnoczi
  2013-07-15 14:42 ` [Qemu-devel] [PATCH v2 10/13] block: disable I/O throttling outside main loop Stefan Hajnoczi
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-07-15 14:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

Each block/raw-posix.c BlockDriverState has a struct qemu_laio_state
that holds the Linux AIO context.  Currently the EventNotifier is
assigned to the main loop.

In order to use block/linux-aio.c from other threads/event loops, we
need to assign the EventNotifier to the current event loop.  This patch
implements this by assigning the EventNotifier upon the first pending
request and deassigning it on the last pending request.

It assumes that all pending requests are flushed before I/O is issued
from another event loop (which is currently guaranteed).

This new approach could affect performance since we assign/deassign the
EventNotifier while processing I/O.  However, it is the simplest way to
support Linux AIO from multiple threads/event loops, so I think it's a
reasonable start.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/linux-aio.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index ee0f8d1..36633a2 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -56,6 +56,9 @@ static void qemu_laio_process_completion(struct qemu_laio_state *s,
     int ret;
 
     s->count--;
+    if (s->count == 0) {
+        qemu_aio_set_event_notifier(&s->e, NULL, NULL);
+    }
 
     ret = laiocb->ret;
     if (ret != -ECANCELED) {
@@ -176,6 +179,16 @@ BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
                         __func__, type);
         goto out_free_aiocb;
     }
+
+    /* Assign our EventNotifier to the current thread's AioContext.  This
+     * allows Linux AIO to be used from multiple threads so long as they flush
+     * before switching threads.
+     */
+    if (s->count == 0) {
+        qemu_aio_set_event_notifier(&s->e, qemu_laio_completion_cb,
+                                    qemu_laio_flush_cb);
+    }
+
     io_set_eventfd(&laiocb->iocb, event_notifier_get_fd(&s->e));
     s->count++;
 
@@ -185,6 +198,9 @@ BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
 
 out_dec_count:
     s->count--;
+    if (s->count == 0) {
+        qemu_aio_set_event_notifier(&s->e, NULL, NULL);
+    }
 out_free_aiocb:
     qemu_aio_release(laiocb);
     return NULL;
@@ -203,9 +219,6 @@ void *laio_init(void)
         goto out_close_efd;
     }
 
-    qemu_aio_set_event_notifier(&s->e, qemu_laio_completion_cb,
-                                qemu_laio_flush_cb);
-
     return s;
 
 out_close_efd:
-- 
1.8.1.4


* [Qemu-devel] [PATCH v2 10/13] block: disable I/O throttling outside main loop
  2013-07-15 14:42 [Qemu-devel] [PATCH v2 00/13] dataplane: use block layer Stefan Hajnoczi
                   ` (8 preceding siblings ...)
  2013-07-15 14:42 ` [Qemu-devel] [PATCH v2 09/13] linux-aio: bind EventNotifier to current AioContext Stefan Hajnoczi
@ 2013-07-15 14:42 ` Stefan Hajnoczi
  2013-07-15 14:43 ` [Qemu-devel] [PATCH v2 11/13] dataplane: use block layer for I/O Stefan Hajnoczi
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-07-15 14:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

Timers only work in the main loop.  This means threads running their own
AioContext cannot use I/O throttling for now.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block.c b/block.c
index 26644ec..daf5717 100644
--- a/block.c
+++ b/block.c
@@ -169,6 +169,11 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
 {
     int64_t wait_time = -1;
 
+    /* Timers currently only work in the main loop */
+    if (*tls_get_thread_aio_context() != qemu_get_aio_context()) {
+        return;
+    }
+
     if (!qemu_co_queue_empty(&bs->throttled_reqs)) {
         qemu_co_queue_wait(&bs->throttled_reqs);
     }
-- 
1.8.1.4


* [Qemu-devel] [PATCH v2 11/13] dataplane: use block layer for I/O
  2013-07-15 14:42 [Qemu-devel] [PATCH v2 00/13] dataplane: use block layer Stefan Hajnoczi
                   ` (9 preceding siblings ...)
  2013-07-15 14:42 ` [Qemu-devel] [PATCH v2 10/13] block: disable I/O throttling outside main loop Stefan Hajnoczi
@ 2013-07-15 14:43 ` Stefan Hajnoczi
  2013-07-15 14:43 ` [Qemu-devel] [PATCH v2 12/13] dataplane: drop ioq Linux AIO request queue Stefan Hajnoczi
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-07-15 14:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

Use the QEMU block layer instead of custom Linux AIO code.  A couple of
changes are necessary to make request processing work in this new
environment:

1. Replace ioq_*() calls with bdrv_aio_*().  We finally support
   asynchronous flush since we get it for free from the QEMU block
   layer!

2. Add a data buffer QEMUIOVector field to struct VirtIOBlockRequest.
   We previously relied on io_submit(2) semantics where the kernel
   copied in iovecs, and we therefore didn't keep them around.  The QEMU
   block layer *does* require that the iovecs are alive across the
   duration of the request.

Note that we still prevent block jobs, hot unplug, and live migration
since there is no synchronization with the main thread.  The next patch
series will introduce a mechanism to arbitrate block layer calls between
threads.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/block/dataplane/virtio-blk.c | 288 +++++++++++++---------------------------
 1 file changed, 92 insertions(+), 196 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 2faed43..5fde06f 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -17,7 +17,6 @@
 #include "qemu/thread.h"
 #include "qemu/error-report.h"
 #include "hw/virtio/dataplane/vring.h"
-#include "ioq.h"
 #include "migration/migration.h"
 #include "block/block.h"
 #include "hw/virtio/virtio-blk.h"
@@ -34,11 +33,10 @@ enum {
 };
 
 typedef struct {
-    struct iocb iocb;               /* Linux AIO control block */
-    QEMUIOVector *inhdr;            /* iovecs for virtio_blk_inhdr */
+    VirtIOBlockDataPlane *dataplane;
+    QEMUIOVector data;              /* data buffer */
+    QEMUIOVector inhdr;             /* iovecs for virtio_blk_inhdr */
     unsigned int head;              /* vring descriptor index */
-    struct iovec *bounce_iov;       /* used if guest buffers are unaligned */
-    QEMUIOVector *read_qiov;        /* for read completion /w bounce buffer */
 } VirtIOBlockRequest;
 
 struct VirtIOBlockDataPlane {
@@ -48,7 +46,7 @@ struct VirtIOBlockDataPlane {
     QemuThread thread;
 
     VirtIOBlkConf *blk;
-    int fd;                         /* image file descriptor */
+    BlockDriverState *bs;           /* block device */
 
     VirtIODevice *vdev;
     Vring vring;                    /* virtqueue vring */
@@ -60,14 +58,8 @@ struct VirtIOBlockDataPlane {
      * use it).
      */
     AioContext *ctx;
-    EventNotifier io_notifier;      /* Linux AIO completion */
     EventNotifier host_notifier;    /* doorbell */
 
-    IOQueue ioqueue;                /* Linux AIO queue (should really be per
-                                       dataplane thread) */
-    VirtIOBlockRequest requests[REQ_MAX]; /* pool of requests, managed by the
-                                             queue */
-
     unsigned int num_reqs;
 
     Error *migration_blocker;
@@ -83,133 +75,113 @@ static void notify_guest(VirtIOBlockDataPlane *s)
     event_notifier_set(s->guest_notifier);
 }
 
-static void complete_request(struct iocb *iocb, ssize_t ret, void *opaque)
+static void complete_request(VirtIOBlockRequest *req,
+                             unsigned char status,
+                             int len)
 {
-    VirtIOBlockDataPlane *s = opaque;
-    VirtIOBlockRequest *req = container_of(iocb, VirtIOBlockRequest, iocb);
-    struct virtio_blk_inhdr hdr;
-    int len;
-
-    if (likely(ret >= 0)) {
-        hdr.status = VIRTIO_BLK_S_OK;
-        len = ret;
-    } else {
-        hdr.status = VIRTIO_BLK_S_IOERR;
-        len = 0;
-    }
-
-    trace_virtio_blk_data_plane_complete_request(s, req->head, ret);
-
-    if (req->read_qiov) {
-        assert(req->bounce_iov);
-        qemu_iovec_from_buf(req->read_qiov, 0, req->bounce_iov->iov_base, len);
-        qemu_iovec_destroy(req->read_qiov);
-        g_slice_free(QEMUIOVector, req->read_qiov);
-    }
-
-    if (req->bounce_iov) {
-        qemu_vfree(req->bounce_iov->iov_base);
-        g_slice_free(struct iovec, req->bounce_iov);
-    }
+    VirtIOBlockDataPlane *s = req->dataplane;
+    unsigned int head = req->head;
+    struct virtio_blk_inhdr hdr = {
+        .status = status,
+    };
 
-    qemu_iovec_from_buf(req->inhdr, 0, &hdr, sizeof(hdr));
-    qemu_iovec_destroy(req->inhdr);
-    g_slice_free(QEMUIOVector, req->inhdr);
+    qemu_iovec_from_buf(&req->inhdr, 0, &hdr, sizeof(hdr));
+    qemu_iovec_destroy(&req->inhdr);
+    qemu_iovec_destroy(&req->data);
+    g_slice_free(VirtIOBlockRequest, req);
 
     /* According to the virtio specification len should be the number of bytes
      * written to, but for virtio-blk it seems to be the number of bytes
      * transferred plus the status bytes.
      */
-    vring_push(&s->vring, req->head, len + sizeof(hdr));
+    vring_push(&s->vring, head, len + sizeof(hdr));
+    notify_guest(s);
 
     s->num_reqs--;
 }
 
-static void complete_request_early(VirtIOBlockDataPlane *s, unsigned int head,
-                                   QEMUIOVector *inhdr, unsigned char status)
+static void request_cb(void *opaque, int ret)
 {
-    struct virtio_blk_inhdr hdr = {
-        .status = status,
-    };
+    VirtIOBlockRequest *req = opaque;
+    VirtIOBlockDataPlane *s = req->dataplane;
+    unsigned char status;
+    int len;
 
-    qemu_iovec_from_buf(inhdr, 0, &hdr, sizeof(hdr));
-    qemu_iovec_destroy(inhdr);
-    g_slice_free(QEMUIOVector, inhdr);
+    /* Completion must be invoked in our thread */
+    assert(*tls_get_thread_aio_context() == s->ctx);
 
-    vring_push(&s->vring, head, sizeof(hdr));
-    notify_guest(s);
+    if (likely(ret >= 0)) {
+        status = VIRTIO_BLK_S_OK;
+        len = ret;
+    } else {
+        status = VIRTIO_BLK_S_IOERR;
+        len = 0;
+    }
+
+    trace_virtio_blk_data_plane_complete_request(s, req->head, ret);
+
+    complete_request(req, status, len);
 }
 
 /* Get disk serial number */
-static void do_get_id_cmd(VirtIOBlockDataPlane *s,
-                          struct iovec *iov, unsigned int iov_cnt,
-                          unsigned int head, QEMUIOVector *inhdr)
+static void do_get_id_cmd(VirtIOBlockRequest *req,
+                          struct iovec *iov, unsigned int iov_cnt)
 {
+    VirtIOBlockDataPlane *s = req->dataplane;
     char id[VIRTIO_BLK_ID_BYTES];
 
     /* Serial number not NUL-terminated when shorter than buffer */
     strncpy(id, s->blk->serial ? s->blk->serial : "", sizeof(id));
     iov_from_buf(iov, iov_cnt, 0, id, sizeof(id));
-    complete_request_early(s, head, inhdr, VIRTIO_BLK_S_OK);
+    complete_request(req, VIRTIO_BLK_S_OK, 0);
 }
 
-static int do_rdwr_cmd(VirtIOBlockDataPlane *s, bool read,
+static int do_rdwr_cmd(VirtIOBlockRequest *req, bool read,
                        struct iovec *iov, unsigned int iov_cnt,
-                       long long offset, unsigned int head,
-                       QEMUIOVector *inhdr)
+                       long long offset)
 {
-    struct iocb *iocb;
-    QEMUIOVector qiov;
-    struct iovec *bounce_iov = NULL;
-    QEMUIOVector *read_qiov = NULL;
-
-    qemu_iovec_init_external(&qiov, iov, iov_cnt);
-    if (!bdrv_qiov_is_aligned(s->blk->conf.bs, &qiov)) {
-        void *bounce_buffer = qemu_blockalign(s->blk->conf.bs, qiov.size);
-
-        if (read) {
-            /* Need to copy back from bounce buffer on completion */
-            read_qiov = g_slice_new(QEMUIOVector);
-            qemu_iovec_init(read_qiov, iov_cnt);
-            qemu_iovec_concat_iov(read_qiov, iov, iov_cnt, 0, qiov.size);
-        } else {
-            qemu_iovec_to_buf(&qiov, 0, bounce_buffer, qiov.size);
-        }
+    VirtIOBlockDataPlane *s = req->dataplane;
+    int64_t sector_num = offset / BDRV_SECTOR_SIZE;
+    int nb_sectors;
+    unsigned int i;
 
-        /* Redirect I/O to aligned bounce buffer */
-        bounce_iov = g_slice_new(struct iovec);
-        bounce_iov->iov_base = bounce_buffer;
-        bounce_iov->iov_len = qiov.size;
-        iov = bounce_iov;
-        iov_cnt = 1;
+    for (i = 0; i < iov_cnt; i++) {
+        qemu_iovec_add(&req->data, iov[i].iov_base, iov[i].iov_len);
     }
+    nb_sectors = req->data.size / BDRV_SECTOR_SIZE;
 
-    iocb = ioq_rdwr(&s->ioqueue, read, iov, iov_cnt, offset);
-
-    /* Fill in virtio block metadata needed for completion */
-    VirtIOBlockRequest *req = container_of(iocb, VirtIOBlockRequest, iocb);
-    req->head = head;
-    req->inhdr = inhdr;
-    req->bounce_iov = bounce_iov;
-    req->read_qiov = read_qiov;
+    if (read) {
+        bdrv_aio_readv(s->bs, sector_num, &req->data, nb_sectors,
+                       request_cb, req);
+    } else {
+        bdrv_aio_writev(s->bs, sector_num, &req->data, nb_sectors,
+                        request_cb, req);
+    }
     return 0;
 }
 
-static int process_request(IOQueue *ioq, struct iovec iov[],
+static int process_request(VirtIOBlockDataPlane *s, struct iovec iov[],
                            unsigned int out_num, unsigned int in_num,
                            unsigned int head)
 {
-    VirtIOBlockDataPlane *s = container_of(ioq, VirtIOBlockDataPlane, ioqueue);
+    VirtIOBlockRequest *req;
     struct iovec *in_iov = &iov[out_num];
     struct virtio_blk_outhdr outhdr;
-    QEMUIOVector *inhdr;
     size_t in_size;
 
+    req = g_slice_new(VirtIOBlockRequest);
+    req->dataplane = s;
+    req->head = head;
+    qemu_iovec_init(&req->data, out_num + in_num);
+    qemu_iovec_init(&req->inhdr, 1);
+
+    s->num_reqs++;
+
     /* Copy in outhdr */
     if (unlikely(iov_to_buf(iov, out_num, 0, &outhdr,
                             sizeof(outhdr)) != sizeof(outhdr))) {
         error_report("virtio-blk request outhdr too short");
-        return -EFAULT;
+        goto fault;
     }
     iov_discard_front(&iov, &out_num, sizeof(outhdr));
 
@@ -217,11 +189,9 @@ static int process_request(IOQueue *ioq, struct iovec iov[],
     in_size = iov_size(in_iov, in_num);
     if (in_size < sizeof(struct virtio_blk_inhdr)) {
         error_report("virtio_blk request inhdr too short");
-        return -EFAULT;
+        goto fault;
     }
-    inhdr = g_slice_new(QEMUIOVector);
-    qemu_iovec_init(inhdr, 1);
-    qemu_iovec_concat_iov(inhdr, in_iov, in_num,
+    qemu_iovec_concat_iov(&req->inhdr, in_iov, in_num,
             in_size - sizeof(struct virtio_blk_inhdr),
             sizeof(struct virtio_blk_inhdr));
     iov_discard_back(in_iov, &in_num, sizeof(struct virtio_blk_inhdr));
@@ -231,37 +201,37 @@ static int process_request(IOQueue *ioq, struct iovec iov[],
 
     switch (outhdr.type) {
     case VIRTIO_BLK_T_IN:
-        do_rdwr_cmd(s, true, in_iov, in_num, outhdr.sector * 512, head, inhdr);
+        do_rdwr_cmd(req, true, in_iov, in_num, outhdr.sector * 512);
         return 0;
 
     case VIRTIO_BLK_T_OUT:
-        do_rdwr_cmd(s, false, iov, out_num, outhdr.sector * 512, head, inhdr);
+        do_rdwr_cmd(req, false, iov, out_num, outhdr.sector * 512);
         return 0;
 
     case VIRTIO_BLK_T_SCSI_CMD:
         /* TODO support SCSI commands */
-        complete_request_early(s, head, inhdr, VIRTIO_BLK_S_UNSUPP);
+        complete_request(req, VIRTIO_BLK_S_UNSUPP, 0);
         return 0;
 
     case VIRTIO_BLK_T_FLUSH:
-        /* TODO fdsync not supported by Linux AIO, do it synchronously here! */
-        if (qemu_fdatasync(s->fd) < 0) {
-            complete_request_early(s, head, inhdr, VIRTIO_BLK_S_IOERR);
-        } else {
-            complete_request_early(s, head, inhdr, VIRTIO_BLK_S_OK);
-        }
+        bdrv_aio_flush(s->bs, request_cb, req);
         return 0;
 
     case VIRTIO_BLK_T_GET_ID:
-        do_get_id_cmd(s, in_iov, in_num, head, inhdr);
+        do_get_id_cmd(req, in_iov, in_num);
         return 0;
 
     default:
         error_report("virtio-blk unsupported request type %#x", outhdr.type);
-        qemu_iovec_destroy(inhdr);
-        g_slice_free(QEMUIOVector, inhdr);
-        return -EFAULT;
+        goto fault;
     }
+
+fault:
+    qemu_iovec_destroy(&req->data);
+    qemu_iovec_destroy(&req->inhdr);
+    g_slice_free(VirtIOBlockRequest, req);
+    s->num_reqs--;
+    return -EFAULT;
 }
 
 static int flush_true(EventNotifier *e)
@@ -273,20 +243,8 @@ static void handle_notify(EventNotifier *e)
 {
     VirtIOBlockDataPlane *s = container_of(e, VirtIOBlockDataPlane,
                                            host_notifier);
-
-    /* There is one array of iovecs into which all new requests are extracted
-     * from the vring.  Requests are read from the vring and the translated
-     * descriptors are written to the iovecs array.  The iovecs do not have to
-     * persist across handle_notify() calls because the kernel copies the
-     * iovecs on io_submit().
-     *
-     * Handling io_submit() EAGAIN may require storing the requests across
-     * handle_notify() calls until the kernel has sufficient resources to
-     * accept more I/O.  This is not implemented yet.
-     */
-    struct iovec iovec[VRING_MAX];
-    struct iovec *end = &iovec[VRING_MAX];
-    struct iovec *iov = iovec;
+    struct iovec iov[VRING_MAX];
+    struct iovec *end = &iov[VRING_MAX];
 
     /* When a request is read from the vring, the index of the first descriptor
      * (aka head) is returned so that the completed request can be pushed onto
@@ -297,7 +255,6 @@ static void handle_notify(EventNotifier *e)
      */
     int head;
     unsigned int out_num = 0, in_num = 0;
-    unsigned int num_queued;
 
     event_notifier_test_and_clear(&s->host_notifier);
     for (;;) {
@@ -313,11 +270,10 @@ static void handle_notify(EventNotifier *e)
             trace_virtio_blk_data_plane_process_request(s, out_num, in_num,
                                                         head);
 
-            if (process_request(&s->ioqueue, iov, out_num, in_num, head) < 0) {
+            if (process_request(s, iov, out_num, in_num, head) < 0) {
                 vring_set_broken(&s->vring);
                 break;
             }
-            iov += out_num + in_num;
         }
 
         if (likely(head == -EAGAIN)) { /* vring emptied */
@@ -327,58 +283,18 @@ static void handle_notify(EventNotifier *e)
             if (vring_enable_notification(s->vdev, &s->vring)) {
                 break;
             }
-        } else { /* head == -ENOBUFS or fatal error, iovecs[] is depleted */
-            /* Since there are no iovecs[] left, stop processing for now.  Do
-             * not re-enable guest->host notifies since the I/O completion
-             * handler knows to check for more vring descriptors anyway.
-             */
-            break;
-        }
-    }
-
-    num_queued = ioq_num_queued(&s->ioqueue);
-    if (num_queued > 0) {
-        s->num_reqs += num_queued;
-
-        int rc = ioq_submit(&s->ioqueue);
-        if (unlikely(rc < 0)) {
-            fprintf(stderr, "ioq_submit failed %d\n", rc);
-            exit(1);
+        } else {
+            return; /* Fatal error, leave ring in broken state */
         }
     }
 }
 
-static int flush_io(EventNotifier *e)
-{
-    VirtIOBlockDataPlane *s = container_of(e, VirtIOBlockDataPlane,
-                                           io_notifier);
-
-    return s->num_reqs > 0;
-}
-
-static void handle_io(EventNotifier *e)
-{
-    VirtIOBlockDataPlane *s = container_of(e, VirtIOBlockDataPlane,
-                                           io_notifier);
-
-    event_notifier_test_and_clear(&s->io_notifier);
-    if (ioq_run_completion(&s->ioqueue, complete_request, s) > 0) {
-        notify_guest(s);
-    }
-
-    /* If there were more requests than iovecs, the vring will not be empty yet
-     * so check again.  There should now be enough resources to process more
-     * requests.
-     */
-    if (unlikely(vring_more_avail(&s->vring))) {
-        handle_notify(&s->host_notifier);
-    }
-}
-
 static void *data_plane_thread(void *opaque)
 {
     VirtIOBlockDataPlane *s = opaque;
 
+    *tls_alloc_thread_aio_context() = s->ctx;
+
     do {
         aio_poll(s->ctx, true);
     } while (!s->stopping || s->num_reqs > 0);
@@ -399,7 +315,6 @@ bool virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *blk,
                                   VirtIOBlockDataPlane **dataplane)
 {
     VirtIOBlockDataPlane *s;
-    int fd;
 
     *dataplane = NULL;
 
@@ -418,20 +333,13 @@ bool virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *blk,
         return false;
     }
 
-    fd = raw_get_aio_fd(blk->conf.bs);
-    if (fd < 0) {
-        error_report("drive is incompatible with x-data-plane, "
-                     "use format=raw,cache=none,aio=native");
-        return false;
-    }
-
     s = g_new0(VirtIOBlockDataPlane, 1);
     s->vdev = vdev;
-    s->fd = fd;
     s->blk = blk;
+    s->bs = blk->conf.bs;
 
     /* Prevent block operations that conflict with data plane thread */
-    bdrv_set_in_use(blk->conf.bs, 1);
+    bdrv_set_in_use(s->bs, 1);
 
     error_setg(&s->migration_blocker,
             "x-data-plane does not support migration");
@@ -450,7 +358,7 @@ void virtio_blk_data_plane_destroy(VirtIOBlockDataPlane *s)
     virtio_blk_data_plane_stop(s);
     migrate_del_blocker(s->migration_blocker);
     error_free(s->migration_blocker);
-    bdrv_set_in_use(s->blk->conf.bs, 0);
+    bdrv_set_in_use(s->bs, 0);
     g_free(s);
 }
 
@@ -459,7 +367,6 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(s->vdev)));
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     VirtQueue *vq;
-    int i;
 
     if (s->started) {
         return;
@@ -488,14 +395,6 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
     s->host_notifier = *virtio_queue_get_host_notifier(vq);
     aio_set_event_notifier(s->ctx, &s->host_notifier, handle_notify, flush_true);
 
-    /* Set up ioqueue */
-    ioq_init(&s->ioqueue, s->fd, REQ_MAX);
-    for (i = 0; i < ARRAY_SIZE(s->requests); i++) {
-        ioq_put_iocb(&s->ioqueue, &s->requests[i].iocb);
-    }
-    s->io_notifier = *ioq_get_notifier(&s->ioqueue);
-    aio_set_event_notifier(s->ctx, &s->io_notifier, handle_io, flush_io);
-
     s->started = true;
     trace_virtio_blk_data_plane_start(s);
 
@@ -526,9 +425,6 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
         qemu_thread_join(&s->thread);
     }
 
-    aio_set_event_notifier(s->ctx, &s->io_notifier, NULL, NULL);
-    ioq_cleanup(&s->ioqueue);
-
     aio_set_event_notifier(s->ctx, &s->host_notifier, NULL, NULL);
     k->set_host_notifier(qbus->parent, 0, false);
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v2 12/13] dataplane: drop ioq Linux AIO request queue
  2013-07-15 14:42 [Qemu-devel] [PATCH v2 00/13] dataplane: use block layer Stefan Hajnoczi
                   ` (10 preceding siblings ...)
  2013-07-15 14:43 ` [Qemu-devel] [PATCH v2 11/13] dataplane: use block layer for I/O Stefan Hajnoczi
@ 2013-07-15 14:43 ` Stefan Hajnoczi
  2013-07-15 14:43 ` [Qemu-devel] [PATCH v2 13/13] block: drop raw_get_aio_fd() Stefan Hajnoczi
  2013-07-15 17:05 ` [Qemu-devel] [PATCH v2 00/13] dataplane: use block layer Paolo Bonzini
  13 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-07-15 14:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

virtio-blk data plane used a custom Linux AIO request queue called
"ioq".  Now that virtio-blk data plane uses the QEMU block layer we can
drop ioq.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/block/dataplane/Makefile.objs |   2 +-
 hw/block/dataplane/ioq.c         | 117 ---------------------------------------
 hw/block/dataplane/ioq.h         |  57 -------------------
 3 files changed, 1 insertion(+), 175 deletions(-)
 delete mode 100644 hw/block/dataplane/ioq.c
 delete mode 100644 hw/block/dataplane/ioq.h

diff --git a/hw/block/dataplane/Makefile.objs b/hw/block/dataplane/Makefile.objs
index 9da2eb8..e786f66 100644
--- a/hw/block/dataplane/Makefile.objs
+++ b/hw/block/dataplane/Makefile.objs
@@ -1 +1 @@
-obj-y += ioq.o virtio-blk.o
+obj-y += virtio-blk.o
diff --git a/hw/block/dataplane/ioq.c b/hw/block/dataplane/ioq.c
deleted file mode 100644
index f709f87..0000000
--- a/hw/block/dataplane/ioq.c
+++ /dev/null
@@ -1,117 +0,0 @@
-/*
- * Linux AIO request queue
- *
- * Copyright 2012 IBM, Corp.
- * Copyright 2012 Red Hat, Inc. and/or its affiliates
- *
- * Authors:
- *   Stefan Hajnoczi <stefanha@redhat.com>
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- *
- */
-
-#include "ioq.h"
-
-void ioq_init(IOQueue *ioq, int fd, unsigned int max_reqs)
-{
-    int rc;
-
-    ioq->fd = fd;
-    ioq->max_reqs = max_reqs;
-
-    memset(&ioq->io_ctx, 0, sizeof ioq->io_ctx);
-    rc = io_setup(max_reqs, &ioq->io_ctx);
-    if (rc != 0) {
-        fprintf(stderr, "ioq io_setup failed %d\n", rc);
-        exit(1);
-    }
-
-    rc = event_notifier_init(&ioq->io_notifier, 0);
-    if (rc != 0) {
-        fprintf(stderr, "ioq io event notifier creation failed %d\n", rc);
-        exit(1);
-    }
-
-    ioq->freelist = g_malloc0(sizeof ioq->freelist[0] * max_reqs);
-    ioq->freelist_idx = 0;
-
-    ioq->queue = g_malloc0(sizeof ioq->queue[0] * max_reqs);
-    ioq->queue_idx = 0;
-}
-
-void ioq_cleanup(IOQueue *ioq)
-{
-    g_free(ioq->freelist);
-    g_free(ioq->queue);
-
-    event_notifier_cleanup(&ioq->io_notifier);
-    io_destroy(ioq->io_ctx);
-}
-
-EventNotifier *ioq_get_notifier(IOQueue *ioq)
-{
-    return &ioq->io_notifier;
-}
-
-struct iocb *ioq_get_iocb(IOQueue *ioq)
-{
-    /* Underflow cannot happen since ioq is sized for max_reqs */
-    assert(ioq->freelist_idx != 0);
-
-    struct iocb *iocb = ioq->freelist[--ioq->freelist_idx];
-    ioq->queue[ioq->queue_idx++] = iocb;
-    return iocb;
-}
-
-void ioq_put_iocb(IOQueue *ioq, struct iocb *iocb)
-{
-    /* Overflow cannot happen since ioq is sized for max_reqs */
-    assert(ioq->freelist_idx != ioq->max_reqs);
-
-    ioq->freelist[ioq->freelist_idx++] = iocb;
-}
-
-struct iocb *ioq_rdwr(IOQueue *ioq, bool read, struct iovec *iov,
-                      unsigned int count, long long offset)
-{
-    struct iocb *iocb = ioq_get_iocb(ioq);
-
-    if (read) {
-        io_prep_preadv(iocb, ioq->fd, iov, count, offset);
-    } else {
-        io_prep_pwritev(iocb, ioq->fd, iov, count, offset);
-    }
-    io_set_eventfd(iocb, event_notifier_get_fd(&ioq->io_notifier));
-    return iocb;
-}
-
-int ioq_submit(IOQueue *ioq)
-{
-    int rc = io_submit(ioq->io_ctx, ioq->queue_idx, ioq->queue);
-    ioq->queue_idx = 0; /* reset */
-    return rc;
-}
-
-int ioq_run_completion(IOQueue *ioq, IOQueueCompletion *completion,
-                       void *opaque)
-{
-    struct io_event events[ioq->max_reqs];
-    int nevents, i;
-
-    do {
-        nevents = io_getevents(ioq->io_ctx, 0, ioq->max_reqs, events, NULL);
-    } while (nevents < 0 && errno == EINTR);
-    if (nevents < 0) {
-        return nevents;
-    }
-
-    for (i = 0; i < nevents; i++) {
-        ssize_t ret = ((uint64_t)events[i].res2 << 32) | events[i].res;
-
-        completion(events[i].obj, ret, opaque);
-        ioq_put_iocb(ioq, events[i].obj);
-    }
-    return nevents;
-}
diff --git a/hw/block/dataplane/ioq.h b/hw/block/dataplane/ioq.h
deleted file mode 100644
index b49b5de..0000000
--- a/hw/block/dataplane/ioq.h
+++ /dev/null
@@ -1,57 +0,0 @@
-/*
- * Linux AIO request queue
- *
- * Copyright 2012 IBM, Corp.
- * Copyright 2012 Red Hat, Inc. and/or its affiliates
- *
- * Authors:
- *   Stefan Hajnoczi <stefanha@redhat.com>
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- *
- */
-
-#ifndef IOQ_H
-#define IOQ_H
-
-#include <libaio.h>
-#include "qemu/event_notifier.h"
-
-typedef struct {
-    int fd;                         /* file descriptor */
-    unsigned int max_reqs;          /* max length of freelist and queue */
-
-    io_context_t io_ctx;            /* Linux AIO context */
-    EventNotifier io_notifier;      /* Linux AIO eventfd */
-
-    /* Requests can complete in any order so a free list is necessary to manage
-     * available iocbs.
-     */
-    struct iocb **freelist;         /* free iocbs */
-    unsigned int freelist_idx;
-
-    /* Multiple requests are queued up before submitting them all in one go */
-    struct iocb **queue;            /* queued iocbs */
-    unsigned int queue_idx;
-} IOQueue;
-
-void ioq_init(IOQueue *ioq, int fd, unsigned int max_reqs);
-void ioq_cleanup(IOQueue *ioq);
-EventNotifier *ioq_get_notifier(IOQueue *ioq);
-struct iocb *ioq_get_iocb(IOQueue *ioq);
-void ioq_put_iocb(IOQueue *ioq, struct iocb *iocb);
-struct iocb *ioq_rdwr(IOQueue *ioq, bool read, struct iovec *iov,
-                      unsigned int count, long long offset);
-int ioq_submit(IOQueue *ioq);
-
-static inline unsigned int ioq_num_queued(IOQueue *ioq)
-{
-    return ioq->queue_idx;
-}
-
-typedef void IOQueueCompletion(struct iocb *iocb, ssize_t ret, void *opaque);
-int ioq_run_completion(IOQueue *ioq, IOQueueCompletion *completion,
-                       void *opaque);
-
-#endif /* IOQ_H */
-- 
1.8.1.4


* [Qemu-devel] [PATCH v2 13/13] block: drop raw_get_aio_fd()
  2013-07-15 14:42 [Qemu-devel] [PATCH v2 00/13] dataplane: use block layer Stefan Hajnoczi
                   ` (11 preceding siblings ...)
  2013-07-15 14:43 ` [Qemu-devel] [PATCH v2 12/13] dataplane: drop ioq Linux AIO request queue Stefan Hajnoczi
@ 2013-07-15 14:43 ` Stefan Hajnoczi
  2013-07-15 17:05 ` [Qemu-devel] [PATCH v2 00/13] dataplane: use block layer Paolo Bonzini
  13 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-07-15 14:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

virtio-blk data plane only worked with -drive format=raw,aio=native.  It
used raw_get_aio_fd() to fetch the file descriptor from raw-posix.  This
was a layering violation and is no longer necessary now that virtio-blk
data plane uses the QEMU block layer.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/raw-posix.c     | 34 ----------------------------------
 include/block/block.h |  8 --------
 2 files changed, 42 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index e8eb396..116b4d3 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1903,40 +1903,6 @@ static BlockDriver bdrv_host_cdrom = {
 };
 #endif /* __FreeBSD__ */
 
-#ifdef CONFIG_LINUX_AIO
-/**
- * Return the file descriptor for Linux AIO
- *
- * This function is a layering violation and should be removed when it becomes
- * possible to call the block layer outside the global mutex.  It allows the
- * caller to hijack the file descriptor so I/O can be performed outside the
- * block layer.
- */
-int raw_get_aio_fd(BlockDriverState *bs)
-{
-    BDRVRawState *s;
-
-    if (!bs->drv) {
-        return -ENOMEDIUM;
-    }
-
-    if (bs->drv == bdrv_find_format("raw")) {
-        bs = bs->file;
-    }
-
-    /* raw-posix has several protocols so just check for raw_aio_readv */
-    if (bs->drv->bdrv_aio_readv != raw_aio_readv) {
-        return -ENOTSUP;
-    }
-
-    s = bs->opaque;
-    if (!s->use_aio) {
-        return -ENOTSUP;
-    }
-    return s->fd;
-}
-#endif /* CONFIG_LINUX_AIO */
-
 static void bdrv_file_init(void)
 {
     /*
diff --git a/include/block/block.h b/include/block/block.h
index 977d5f6..cb44f27 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -365,14 +365,6 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs);
 void bdrv_set_in_use(BlockDriverState *bs, int in_use);
 int bdrv_in_use(BlockDriverState *bs);
 
-#ifdef CONFIG_LINUX_AIO
-int raw_get_aio_fd(BlockDriverState *bs);
-#else
-static inline int raw_get_aio_fd(BlockDriverState *bs)
-{
-    return -ENOTSUP;
-}
-#endif
 
 enum BlockAcctType {
     BDRV_ACCT_READ,
-- 
1.8.1.4


* Re: [Qemu-devel] [PATCH v2 00/13] dataplane: use block layer
  2013-07-15 14:42 [Qemu-devel] [PATCH v2 00/13] dataplane: use block layer Stefan Hajnoczi
                   ` (12 preceding siblings ...)
  2013-07-15 14:43 ` [Qemu-devel] [PATCH v2 13/13] block: drop raw_get_aio_fd() Stefan Hajnoczi
@ 2013-07-15 17:05 ` Paolo Bonzini
  2013-07-17  9:22   ` Stefan Hajnoczi
  13 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2013-07-15 17:05 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-devel

On 15/07/2013 16:42, Stefan Hajnoczi wrote:
> v2:
>  * Rebased onto qemu.git/master
>  * Added comment explaining how the dataplane thread is restarted after draining [pbonzini]
> 
> This series adds image format, QMP 'transaction', and QMP 'block_resize' support
> to dataplane.  This is done by replacing custom Linux AIO code with QEMU block
> layer APIs.
> 
> In order for block layer APIs to be safe, bdrv_drain_all() is modified to be
> aware of device emulation threads (like dataplane).
> BlockDevOps->drain_threads_cb() is introduced as a way to stop device emulation
> threads temporarily.  Device emulation threads may resume after this main loop
> iteration completes, which is enough to allow code running under the QEMU
> global mutex a chance to use the block device temporarily.
> 
> bdrv_get_aio_context() is dropped in favor of a thread-local AioContext
> pointer.  This allows qemu_bh_new(), qemu_aio_wait(), and friends to
> automatically use the right AioContext.  When we introduce a BlockDriverState
> lock in the future, it will be possible to use block devices from multiple
> threads arbitrarily (right now we only allow it if bdrv_drain_all() is used).
> 
> Three open issues:
> 
>  * Timers only work in the main loop, so I/O throttling is ignored and QED is
>    unsafe with x-data-plane=on.  I am tackling this in a separate series that
>    makes QEMUTimer thread-safe and can then re-enable I/O throttling.
> 
> >  * Performance analysis in progress.
> 
>  * Need to test with NBD, Gluster, and other non-file protocols.

Since this is for 1.6, I am still not sure this is the right approach
(especially a block device lock scares me a bit...).

Quoting a private mail from Kevin (I guess he doesn't mind), the
possible approaches include:

* Stopping the dataplane event loop while another user accesses the
BlockDriverState

* Make other users use Bottom Halves to do a context switch into the
dataplane event loop and do their operation from there

> * Add locks to the BlockDriverState and just do the accesses from different
threads

and I believe a mix of these would be the way to go.

Stopping the dataplane event loop is not really necessary for
correctness; that only requires stopping the ioeventfd, and indeed this
series already includes patches for this.  So you definitely have my
"approval" (in quotes because you're the maintainer...) for patches 1-2-3.

However, I'm not sure it is feasible to have a single AioContext per
thread.  Consider a backup job whose target is not running in the same
AioContext as the source; for optimization it might use bdrv_aio_*
instead of coroutine-based I/O.

So, once you have stopped the ioeventfd to correctly support
bdrv_drain_all(), I would like to see code that makes other threads use
bottom halves to do a context switch.  Ping Fan's patches for a
thread-safe BH list are useful for this.

I think there are two approaches to implementing synchronous operations:

* A lock, just like Kevin mentioned (though I would place it on the
AioContext, not the BlockDriverState).  Then you can call aio_wait under
the lock in bdrv_read and friends.

* A "complementary" bottom half that is scheduled in the
qemu_aio_context.  This second bottom half terminates processing in the
dataplane thread and restarts the thread starting the synchronous
operation.  I'm not sure how easy it would be to write infrastructure
for this, but basically it'd mean adding a third path ("in coroutine
context, but in the wrong AioContext") to the current "if
(qemu_in_coroutine())" tests that block.c has.

Paolo


> Paolo Bonzini (2):
>   exec: do not use qemu/tls.h
>   qemu-thread: add TLS wrappers
> 
> Stefan Hajnoczi (11):
>   dataplane: sync virtio.c and vring.c virtqueue state
>   block: add BlockDevOps->drain_threads_cb()
>   virtio-blk: implement BlockDevOps->drain_threads_cb()
>   block: add thread_aio_context TLS variable
>   block: drop bdrv_get_aio_context()
>   main-loop: use thread-local AioContext
>   linux-aio: bind EventNotifier to current AioContext
>   block: disable I/O throttling outside main loop
>   dataplane: use block layer for I/O
>   dataplane: drop ioq Linux AIO request queue
>   block: drop raw_get_aio_fd()
> 
>  async.c                             |   2 +
>  block.c                             |  17 ++-
>  block/linux-aio.c                   |  19 ++-
>  block/raw-posix.c                   |  38 +----
>  block/raw-win32.c                   |   2 +-
>  configure                           |  21 +++
>  cpus.c                              |   2 +
>  exec.c                              |  10 +-
>  hw/block/dataplane/Makefile.objs    |   2 +-
>  hw/block/dataplane/ioq.c            | 117 ---------------
>  hw/block/dataplane/ioq.h            |  57 -------
>  hw/block/dataplane/virtio-blk.c     | 290 ++++++++++++------------------------
>  hw/block/virtio-blk.c               |  16 ++
>  hw/virtio/dataplane/vring.c         |   8 +-
>  include/block/aio.h                 |   4 +
>  include/block/block.h               |  17 ++-
>  include/block/block_int.h           |   7 -
>  include/hw/virtio/dataplane/vring.h |   2 +-
>  include/qemu/tls.h                  | 125 +++++++++++++---
>  include/qom/cpu.h                   |  14 +-
>  main-loop.c                         |  16 +-
>  tests/Makefile                      |   3 +
>  tests/test-tls.c                    |  87 +++++++++++
>  util/qemu-thread-win32.c            |  17 +++
>  24 files changed, 426 insertions(+), 467 deletions(-)
>  delete mode 100644 hw/block/dataplane/ioq.c
>  delete mode 100644 hw/block/dataplane/ioq.h
>  create mode 100644 tests/test-tls.c
> 


* Re: [Qemu-devel] [PATCH v2 00/13] dataplane: use block layer
  2013-07-15 17:05 ` [Qemu-devel] [PATCH v2 00/13] dataplane: use block layer Paolo Bonzini
@ 2013-07-17  9:22   ` Stefan Hajnoczi
  2013-07-17  9:55     ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-07-17  9:22 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On Mon, Jul 15, 2013 at 07:05:50PM +0200, Paolo Bonzini wrote:
> On 15/07/2013 16:42, Stefan Hajnoczi wrote:
> > v2:
> >  * Rebased onto qemu.git/master
> >  * Added comment explaining how the dataplane thread is restarted after draining [pbonzini]
> > 
> > This series adds image format, QMP 'transaction', and QMP 'block_resize' support
> > to dataplane.  This is done by replacing custom Linux AIO code with QEMU block
> > layer APIs.
> > 
> > In order for block layer APIs to be safe, bdrv_drain_all() is modified to be
> > aware of device emulation threads (like dataplane).
> > BlockDevOps->drain_threads_cb() is introduced as a way to stop device emulation
> > threads temporarily.  Device emulation threads may resume after this main loop
> > iteration completes, which is enough to allow code running under the QEMU
> > global mutex a chance to use the block device temporarily.
> > 
> > bdrv_get_aio_context() is dropped in favor of a thread-local AioContext
> > pointer.  This allows qemu_bh_new(), qemu_aio_wait(), and friends to
> > automatically use the right AioContext.  When we introduce a BlockDriverState
> > lock in the future, it will be possible to use block devices from multiple
> > threads arbitrarily (right now we only allow it if bdrv_drain_all() is used).
> > 
> > Three open issues:
> > 
> >  * Timers only work in the main loop, so I/O throttling is ignored and QED is
> >    unsafe with x-data-plane=on.  I am tackling this in a separate series that
> >    makes QEMUTimer thread-safe and can then re-enable I/O throttling.
> > 
> > >  * Performance analysis in progress.
> > 
> >  * Need to test with NBD, Gluster, and other non-file protocols.
> 
> Since this is for 1.6, I am still not sure this is the right approach
> (especially a block device lock scares me a bit...).

I agree that the block device lock looks difficult.  When I started in
that direction it quickly became clear that it requires a lot of effort
to solve.  It is hard because of BDS relationships and the fact that I/O
requests live across main loop iterations (they must be reentered,
either as a callback or a coroutine).

But that approach is not implemented in this patch series...

> Quoting a private mail from Kevin (I guess he doesn't mind), the
> possible approaches include:
> 
> * Stopping the dataplane event loop while another user accesses the
> BlockDriverState
> 
> * Make other users use Bottom Halves to do a context switch into the
> dataplane event loop and do their operation from there
> 
> > * Add locks to the BlockDriverState and just do the accesses from different
> threads
> 
> and I believe a mix of these would be the way to go.
>
> Stopping the dataplane event loop is not really necessary for
> correctness; that only requires stopping the ioeventfd, and indeed this
> series already includes patches for this.  So you definitely have my
> "approval" (in quotes because you're the maintainer...) for patches 1-2-3.

The TLS AioContext approach is compatible with the future improvements
that you mentioned.  Remember that BlockDevOps->drain_threads_cb() is
orthogonal to TLS AioContext.  We can still get at the AioContext for a
BDS, if necessary.

The point of TLS AioContext is to avoid explicitly passing AioContext
everywhere.  Since BHs are sometimes used deep down it would be very
tedious to start passing around AioContext.  qemu_bh_*() should just do
the right thing.

BTW, besides stopping ioeventfd it is also necessary to either complete
pending I/O requests (drain) or to at least postpone callbacks for
pending I/O requests while another thread is accessing the BDS.

(The problem with postponing callbacks is that block drivers sometimes
have dependencies between requests, for example allocating write
requests that touch the same metadata.  Simply postponing callbacks
leads to deadlock when waiting for dependencies.  This is where locks
save the day at the cost of complexity because they allow multiple
threads to make progress safely.)

> However, I'm not sure it is feasible to have a single AioContext per
> thread.  Consider a backup job whose target is not running in the same
> AioContext as the source; for optimization it might use bdrv_aio_*
> instead of coroutine-based I/O.
> 
> So, once you have stopped the ioeventfd to correctly support
> bdrv_drain_all(), I would like to see code that makes other threads use
> bottom halves to do a context switch.  Ping Fan's patches for a
> thread-safe BH list are useful for this.

If I understand correctly this means a block job wraps I/O calls so
that they are scheduled in a remote AioContext/thread when necessary.
This is necessary when a block job touches multiple BDS which belong to
different threads.

> I think there are two approaches to implementing synchronous operations:
> 
> * A lock, just like Kevin mentioned (though I would place it on the
> AioContext, not the BlockDriverState).  Then you can call aio_wait under
> the lock in bdrv_read and friends.

This sounds clever.  So block jobs don't need to explicitly wrap I/O
calls, it happens automatically in block.c:bdrv_read() and friends.

> * A "complementary" bottom half that is scheduled in the
> qemu_aio_context.  This second bottom half terminates processing in the
> dataplane thread and restarts the thread starting the synchronous
> operation.  I'm not sure how easy it would be to write infrastructure
> for this, but basically it'd mean adding a third path ("in coroutine
> context, but in the wrong AioContext") to the current "if
> (qemu_in_coroutine())" tests that block.c has.

Not sure I understand this approach but maybe something like moving a
coroutine from one AioContext to another and back again.

FWIW I'm also looking at how much (little) effort it would take to
disable dataplane temporarily while block jobs are running.  Since we're
discussing switching between threads already, it's not clear that we
gain much performance by trying to use dataplane.  It clearly adds
complexity.

Stefan


* Re: [Qemu-devel] [PATCH v2 00/13] dataplane: use block layer
  2013-07-17  9:22   ` Stefan Hajnoczi
@ 2013-07-17  9:55     ` Paolo Bonzini
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2013-07-17  9:55 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On 17/07/2013 11:22, Stefan Hajnoczi wrote:
>> Stopping the dataplane event loop is not really necessary for
>> correctness; that only requires stopping the ioeventfd, and indeed this
>> series already includes patches for this.  So you definitely have my
>> "approval" (in quotes because you're the maintainer...) for patches 1-2-3.
> 
> The TLS AioContext approach is compatible with the future improvements
> that you mentioned.

Yes, it is.  But I'd rather try and understand whether we need it,
before committing it.

> The point of TLS AioContext is to avoid explicitly passing AioContext
> everywhere.  Since BH is sometimes used deep down it would be very
> tedious to start passing around AioContext.  qemu_bh_*() should just do
> the right thing.

Note that only aio_bh_new() requires the AioContext.  Scheduling the BH
or deleting doesn't.

> BTW, besides stopping ioeventfd it is also necessary to either complete
> pending I/O requests (drain) or to at least postpone callbacks for
> pending I/O requests while another thread is accessing the BDS.

Yes, draining is cool, I omitted it for simplicity because it happens
also for non-dataplane.  In fact, this is not dataplane-specific, no?
So let's go with what we've been using so far---I agree that postponing
risks getting thorny.

>> However, I'm not sure it is feasible to have a single AioContext per
>> thread.  Consider a backup job whose target is not running in the same
>> AioContext as the source; for optimization it might use bdrv_aio_*
>> instead of coroutine-based I/O.
>>
>> So, once you have stopped the ioeventfd to correctly support
>> bdrv_drain_all(), I would like to see code that makes other threads use
>> bottom halves to do a context switch.  Ping Fan's patches for a
>> thread-safe BH list are useful for this.
> 
> If I understand correctly this means a block job wraps I/O calls so
> that they are scheduled in a remote AioContext/thread when necessary.
> This is necessary when a block job touches multiple BDS which belong to
> different threads.

Yes.  I didn't consider that at this point you don't even need to put
block jobs in the AioContext's thread, but you're right.  They can run
in the iothread.

>> I think there are two approaches to implementing synchronous operations:
>>
>> * A lock, just like Kevin mentioned (though I would place it on the
>> AioContext, not the BlockDriverState).  Then you can call aio_wait under
>> the lock in bdrv_read and friends.
> 
> This sounds clever.  So block jobs don't need to explicitly wrap I/O
> calls, it happens automatically in block.c:bdrv_read() and friends.
> 
>> * A "complementary" bottom half that is scheduled in the
>> qemu_aio_context.  This second bottom half terminates processing in the
>> dataplane thread and restarts the thread starting the synchronous
>> operation.  I'm not sure how easy it would be to write infrastructure
>> for this, but basically it'd mean adding a third path ("in coroutine
>> context, but in the wrong AioContext") to the current "if
>> (qemu_in_coroutine())" tests that block.c has.
> 
> Not sure I understand this approach but maybe something like moving a
> coroutine from one AioContext to another and back again.

Yes.  I'm not sure which is best, of course.

> FWIW I'm also looking at how much (little) effort it would take to
> disable dataplane temporarily while block jobs are running.  Since we're
> discussing switching between threads already, it's not clear that we
> gain much performance by trying to use dataplane.  It clearly adds
> complexity.

Long term I'd like to remove the non-dataplane code altogether.  I.e.
the ioeventfd is started at the first kick and stopped at drain, but
apart from that all virtio-blk I/O runs in the dataplane thread's event
loop, always.

It needs ioeventfd support in TCG, though.

Paolo


end of thread, other threads:[~2013-07-17  9:55 UTC | newest]

Thread overview: 17+ messages
2013-07-15 14:42 [Qemu-devel] [PATCH v2 00/13] dataplane: use block layer Stefan Hajnoczi
2013-07-15 14:42 ` [Qemu-devel] [PATCH v2 01/13] dataplane: sync virtio.c and vring.c virtqueue state Stefan Hajnoczi
2013-07-15 14:42 ` [Qemu-devel] [PATCH v2 02/13] block: add BlockDevOps->drain_threads_cb() Stefan Hajnoczi
2013-07-15 14:42 ` [Qemu-devel] [PATCH v2 03/13] virtio-blk: implement BlockDevOps->drain_threads_cb() Stefan Hajnoczi
2013-07-15 14:42 ` [Qemu-devel] [PATCH v2 04/13] exec: do not use qemu/tls.h Stefan Hajnoczi
2013-07-15 14:42 ` [Qemu-devel] [PATCH v2 05/13] qemu-thread: add TLS wrappers Stefan Hajnoczi
2013-07-15 14:42 ` [Qemu-devel] [PATCH v2 06/13] block: add thread_aio_context TLS variable Stefan Hajnoczi
2013-07-15 14:42 ` [Qemu-devel] [PATCH v2 07/13] block: drop bdrv_get_aio_context() Stefan Hajnoczi
2013-07-15 14:42 ` [Qemu-devel] [PATCH v2 08/13] main-loop: use thread-local AioContext Stefan Hajnoczi
2013-07-15 14:42 ` [Qemu-devel] [PATCH v2 09/13] linux-aio: bind EventNotifier to current AioContext Stefan Hajnoczi
2013-07-15 14:42 ` [Qemu-devel] [PATCH v2 10/13] block: disable I/O throttling outside main loop Stefan Hajnoczi
2013-07-15 14:43 ` [Qemu-devel] [PATCH v2 11/13] dataplane: use block layer for I/O Stefan Hajnoczi
2013-07-15 14:43 ` [Qemu-devel] [PATCH v2 12/13] dataplane: drop ioq Linux AIO request queue Stefan Hajnoczi
2013-07-15 14:43 ` [Qemu-devel] [PATCH v2 13/13] block: drop raw_get_aio_fd() Stefan Hajnoczi
2013-07-15 17:05 ` [Qemu-devel] [PATCH v2 00/13] dataplane: use block layer Paolo Bonzini
2013-07-17  9:22   ` Stefan Hajnoczi
2013-07-17  9:55     ` Paolo Bonzini
