qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine
@ 2019-12-18 16:32 Stefan Hajnoczi
  2019-12-18 16:32 ` [PATCH v3 01/15] configure: permit use of io_uring Stefan Hajnoczi
                   ` (16 more replies)
  0 siblings, 17 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-12-18 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, oleksandr, Julia Suvorova,
	Markus Armbruster, Max Reitz, Stefan Hajnoczi, Paolo Bonzini,
	Aarushi Mehta

v12:
 * Reword BlockdevAioOptions QAPI schema commit description [Markus]
 * Increase QAPI "Since: 4.2" to "Since: 5.0"
 * Explain rationale for io_uring stubs in commit description [Kevin]
 * Tried to use file.aio=io_uring instead of BDRV_O_IO_URING but it's really
   hard to make qemu-iotests work.  Tests build blkdebug: and other graphs so
   the syntax for io_uring is dependent on the test case.  I scrapped this
   approach and went back to a global flag.

v11:
 * Drop fd registration because it breaks QEMU's file locking and will need to
   be resolved in a separate patch series
 * Drop line-wrapping changes that accidentally broke several qemu-iotests

v10:
 * Dropped kernel submission queue polling, it requires root and has additional
   limitations.  It should be benchmarked and considered for inclusion later,
   maybe even together with kernel side changes.
 * Add io_uring_register_files() return value to trace_luring_fd_register()
 * Fix indentation in luring_fd_unregister()
 * Set s->fd_reg.fd_array to NULL after g_free() to avoid dangling pointers
 * Simplify fd registration code
 * Add luring_fd_unregister() and call it from file-posix.c to prevent
   fd leaks
 * Add trace_luring_fd_unregister() trace event
 * Add missing space to qemu-img command-line documentation
 * Update MAINTAINERS file [Julia]
 * Rename MAX_EVENTS to MAX_ENTRIES [Julia]
 * Define ioq_submit() before callers so the prototype isn't necessary [Julia]
 * Declare variables at the beginning of the block in luring_init() [Julia]

This patch series is based on Aarushi Mehta's v9 patch series written for
Google Summer of Code 2019:

  https://lists.gnu.org/archive/html/qemu-devel/2019-08/msg00179.html

It adds a new AIO engine that uses the new Linux io_uring API.  This is the
successor to Linux AIO with a number of improvements:
1. Both O_DIRECT and buffered I/O work
2. fdatasync(2) is supported (no need for a separate thread pool!)
3. True async behavior so the syscall doesn't block (Linux AIO got there to some degree...)
4. Advanced performance optimizations are available (file registration, memory
   buffer registration, completion polling, submission polling).

Since Aarushi has been busy, I have taken up this patch series.  Booting a
guest works with -drive aio=io_uring and -drive aio=io_uring,cache=none with a
raw file on XFS.

I currently recommend using -drive aio=io_uring only with host block devices
(like NVMe devices).  As of Linux v5.4-rc1 I still hit kernel bugs when using
image files on ext4 or XFS.

Aarushi Mehta (15):
  configure: permit use of io_uring
  qapi/block-core: add option for io_uring
  block/block: add BDRV flag for io_uring
  block/io_uring: implements interfaces for io_uring
  stubs: add stubs for io_uring interface
  util/async: add aio interfaces for io_uring
  blockdev: adds bdrv_parse_aio to use io_uring
  block/file-posix.c: extend to use io_uring
  block: add trace events for io_uring
  block/io_uring: adds userspace completion polling
  qemu-io: adds option to use aio engine
  qemu-img: adds option to use aio engine for benchmarking
  qemu-nbd: adds option for aio engines
  tests/qemu-iotests: enable testing with aio options
  tests/qemu-iotests: use AIOMODE with various tests

 MAINTAINERS                   |   9 +
 block.c                       |  22 ++
 block/Makefile.objs           |   3 +
 block/file-posix.c            |  95 ++++++--
 block/io_uring.c              | 433 ++++++++++++++++++++++++++++++++++
 block/trace-events            |  12 +
 blockdev.c                    |  12 +-
 configure                     |  27 +++
 include/block/aio.h           |  16 +-
 include/block/block.h         |   2 +
 include/block/raw-aio.h       |  12 +
 qapi/block-core.json          |   4 +-
 qemu-img-cmds.hx              |   4 +-
 qemu-img.c                    |  11 +-
 qemu-img.texi                 |   5 +-
 qemu-io.c                     |  25 +-
 qemu-nbd.c                    |  12 +-
 qemu-nbd.texi                 |   4 +-
 stubs/Makefile.objs           |   1 +
 stubs/io_uring.c              |  32 +++
 tests/qemu-iotests/028        |   2 +-
 tests/qemu-iotests/058        |   2 +-
 tests/qemu-iotests/089        |   4 +-
 tests/qemu-iotests/091        |   4 +-
 tests/qemu-iotests/109        |   2 +-
 tests/qemu-iotests/147        |   5 +-
 tests/qemu-iotests/181        |   8 +-
 tests/qemu-iotests/183        |   4 +-
 tests/qemu-iotests/185        |  10 +-
 tests/qemu-iotests/200        |   2 +-
 tests/qemu-iotests/201        |   8 +-
 tests/qemu-iotests/check      |  15 +-
 tests/qemu-iotests/common.rc  |  14 ++
 tests/qemu-iotests/iotests.py |  12 +-
 util/async.c                  |  36 +++
 35 files changed, 793 insertions(+), 76 deletions(-)
 create mode 100644 block/io_uring.c
 create mode 100644 stubs/io_uring.c

-- 
2.23.0



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 01/15] configure: permit use of io_uring
  2019-12-18 16:32 [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine Stefan Hajnoczi
@ 2019-12-18 16:32 ` Stefan Hajnoczi
  2019-12-18 16:32 ` [PATCH v3 02/15] qapi/block-core: add option for io_uring Stefan Hajnoczi
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-12-18 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Maxim Levitsky, qemu-block, oleksandr,
	Julia Suvorova, Markus Armbruster, Max Reitz, Stefan Hajnoczi,
	Paolo Bonzini, Aarushi Mehta

From: Aarushi Mehta <mehta.aaru20@gmail.com>

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
Reviewed-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 configure | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/configure b/configure
index 84b413dbfc..acab024021 100755
--- a/configure
+++ b/configure
@@ -370,6 +370,7 @@ xen=""
 xen_ctrl_version=""
 xen_pci_passthrough=""
 linux_aio=""
+linux_io_uring=""
 cap_ng=""
 attr=""
 libattr=""
@@ -1251,6 +1252,10 @@ for opt do
   ;;
   --enable-linux-aio) linux_aio="yes"
   ;;
+  --disable-linux-io-uring) linux_io_uring="no"
+  ;;
+  --enable-linux-io-uring) linux_io_uring="yes"
+  ;;
   --disable-attr) attr="no"
   ;;
   --enable-attr) attr="yes"
@@ -1766,6 +1771,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   vde             support for vde network
   netmap          support for netmap network
   linux-aio       Linux AIO support
+  linux-io-uring  Linux io_uring support
   cap-ng          libcap-ng support
   attr            attr and xattr support
   vhost-net       vhost-net kernel acceleration support
@@ -3988,6 +3994,21 @@ EOF
     linux_aio=no
   fi
 fi
+##########################################
+# linux-io-uring probe
+
+if test "$linux_io_uring" != "no" ; then
+  if $pkg_config liburing; then
+    linux_io_uring_cflags=$($pkg_config --cflags liburing)
+    linux_io_uring_libs=$($pkg_config --libs liburing)
+    linux_io_uring=yes
+  else
+    if test "$linux_io_uring" = "yes" ; then
+      feature_not_found "linux io_uring" "Install liburing devel"
+    fi
+    linux_io_uring=no
+  fi
+fi
 
 ##########################################
 # TPM emulation is only on POSIX
@@ -6472,6 +6493,7 @@ echo "PIE               $pie"
 echo "vde support       $vde"
 echo "netmap support    $netmap"
 echo "Linux AIO support $linux_aio"
+echo "Linux io_uring support $linux_io_uring"
 echo "ATTR/XATTR support $attr"
 echo "Install blobs     $blobs"
 echo "KVM support       $kvm"
@@ -6960,6 +6982,11 @@ fi
 if test "$linux_aio" = "yes" ; then
   echo "CONFIG_LINUX_AIO=y" >> $config_host_mak
 fi
+if test "$linux_io_uring" = "yes" ; then
+  echo "CONFIG_LINUX_IO_URING=y" >> $config_host_mak
+  echo "LINUX_IO_URING_CFLAGS=$linux_io_uring_cflags" >> $config_host_mak
+  echo "LINUX_IO_URING_LIBS=$linux_io_uring_libs" >> $config_host_mak
+fi
 if test "$attr" = "yes" ; then
   echo "CONFIG_ATTR=y" >> $config_host_mak
 fi
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 02/15] qapi/block-core: add option for io_uring
  2019-12-18 16:32 [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine Stefan Hajnoczi
  2019-12-18 16:32 ` [PATCH v3 01/15] configure: permit use of io_uring Stefan Hajnoczi
@ 2019-12-18 16:32 ` Stefan Hajnoczi
  2019-12-18 16:32 ` [PATCH v3 03/15] block/block: add BDRV flag " Stefan Hajnoczi
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-12-18 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, oleksandr, Julia Suvorova,
	Markus Armbruster, Max Reitz, Stefan Hajnoczi, Paolo Bonzini,
	Aarushi Mehta

From: Aarushi Mehta <mehta.aaru20@gmail.com>

Since io_uring is the actual name of the Linux API, we use it as enum
value even though the QAPI schema conventions would prefer io-uring.

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v12:
 * Reword commit description [Markus]
 * Update version from 4.2 to 5.0
---
 qapi/block-core.json | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0cf68fea14..18f907373c 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2851,11 +2851,13 @@
 #
 # @threads:     Use qemu's thread pool
 # @native:      Use native AIO backend (only Linux and Windows)
+# @io_uring:    Use linux io_uring (since 5.0)
 #
 # Since: 2.9
 ##
 { 'enum': 'BlockdevAioOptions',
-  'data': [ 'threads', 'native' ] }
+  'data': [ 'threads', 'native',
+            { 'name': 'io_uring', 'if': 'defined(CONFIG_LINUX_IO_URING)' } ] }
 
 ##
 # @BlockdevCacheOptions:
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 03/15] block/block: add BDRV flag for io_uring
  2019-12-18 16:32 [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine Stefan Hajnoczi
  2019-12-18 16:32 ` [PATCH v3 01/15] configure: permit use of io_uring Stefan Hajnoczi
  2019-12-18 16:32 ` [PATCH v3 02/15] qapi/block-core: add option for io_uring Stefan Hajnoczi
@ 2019-12-18 16:32 ` Stefan Hajnoczi
  2019-12-18 16:32 ` [PATCH v3 04/15] block/io_uring: implements interfaces " Stefan Hajnoczi
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-12-18 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Maxim Levitsky, qemu-block, oleksandr,
	Julia Suvorova, Markus Armbruster, Max Reitz, Stefan Hajnoczi,
	Paolo Bonzini, Aarushi Mehta

From: Aarushi Mehta <mehta.aaru20@gmail.com>

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
Reviewed-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/block.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/block/block.h b/include/block/block.h
index 1df9848e74..7071dc041e 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -126,6 +126,7 @@ typedef struct HDGeometry {
                                       ignoring the format layer */
 #define BDRV_O_NO_IO       0x10000 /* don't initialize for I/O */
 #define BDRV_O_AUTO_RDONLY 0x20000 /* degrade to read-only if opening read-write fails */
+#define BDRV_O_IO_URING    0x40000 /* use io_uring instead of the thread pool */
 
 #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_NO_FLUSH)
 
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 04/15] block/io_uring: implements interfaces for io_uring
  2019-12-18 16:32 [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2019-12-18 16:32 ` [PATCH v3 03/15] block/block: add BDRV flag " Stefan Hajnoczi
@ 2019-12-18 16:32 ` Stefan Hajnoczi
  2020-01-13 11:24   ` Stefano Garzarella
  2019-12-18 16:32 ` [PATCH v3 05/15] stubs: add stubs for io_uring interface Stefan Hajnoczi
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-12-18 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, oleksandr, Julia Suvorova,
	Markus Armbruster, Max Reitz, Stefan Hajnoczi, Paolo Bonzini,
	Aarushi Mehta

From: Aarushi Mehta <mehta.aaru20@gmail.com>

Aborts when sqe fails to be set as sqes cannot be returned to the
ring. Adds slow path for short reads for older kernels

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v11:
 * Fix git bisect compilation breakage: move trace_luring_init_state()
   into the tracing commit.
v10:
 * Update MAINTAINERS file [Julia]
 * Rename MAX_EVENTS to MAX_ENTRIES [Julia]
 * Define ioq_submit() before callers so the prototype isn't necessary [Julia]
 * Declare variables at the beginning of the block in luring_init() [Julia]
---
 MAINTAINERS             |   8 +
 block/Makefile.objs     |   3 +
 block/io_uring.c        | 401 ++++++++++++++++++++++++++++++++++++++++
 include/block/aio.h     |  16 +-
 include/block/raw-aio.h |  12 ++
 5 files changed, 439 insertions(+), 1 deletion(-)
 create mode 100644 block/io_uring.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 740401bcbb..fc7f53b229 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2596,6 +2596,14 @@ F: block/file-posix.c
 F: block/file-win32.c
 F: block/win32-aio.c
 
+Linux io_uring
+M: Aarushi Mehta <mehta.aaru20@gmail.com>
+M: Julia Suvorova <jusual@redhat.com>
+M: Stefan Hajnoczi <stefanha@redhat.com>
+L: qemu-block@nongnu.org
+S: Maintained
+F: block/io_uring.c
+
 qcow2
 M: Kevin Wolf <kwolf@redhat.com>
 M: Max Reitz <mreitz@redhat.com>
diff --git a/block/Makefile.objs b/block/Makefile.objs
index e394fe0b6c..035abb9c5c 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -18,6 +18,7 @@ block-obj-y += block-backend.o snapshot.o qapi.o
 block-obj-$(CONFIG_WIN32) += file-win32.o win32-aio.o
 block-obj-$(CONFIG_POSIX) += file-posix.o
 block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
+block-obj-$(CONFIG_LINUX_IO_URING) += io_uring.o
 block-obj-y += null.o mirror.o commit.o io.o create.o
 block-obj-y += throttle-groups.o
 block-obj-$(CONFIG_LINUX) += nvme.o
@@ -65,5 +66,7 @@ block-obj-$(if $(CONFIG_LZFSE),m,n) += dmg-lzfse.o
 dmg-lzfse.o-libs   := $(LZFSE_LIBS)
 qcow.o-libs        := -lz
 linux-aio.o-libs   := -laio
+io_uring.o-cflags  := $(LINUX_IO_URING_CFLAGS)
+io_uring.o-libs    := $(LINUX_IO_URING_LIBS)
 parallels.o-cflags := $(LIBXML2_CFLAGS)
 parallels.o-libs   := $(LIBXML2_LIBS)
diff --git a/block/io_uring.c b/block/io_uring.c
new file mode 100644
index 0000000000..bb433a685b
--- /dev/null
+++ b/block/io_uring.c
@@ -0,0 +1,401 @@
+/*
+ * Linux io_uring support.
+ *
+ * Copyright (C) 2009 IBM, Corp.
+ * Copyright (C) 2009 Red Hat, Inc.
+ * Copyright (C) 2019 Aarushi Mehta
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include <liburing.h>
+#include "qemu-common.h"
+#include "block/aio.h"
+#include "qemu/queue.h"
+#include "block/block.h"
+#include "block/raw-aio.h"
+#include "qemu/coroutine.h"
+#include "qapi/error.h"
+
+/* io_uring ring size */
+#define MAX_ENTRIES 128
+
+typedef struct LuringAIOCB {
+    Coroutine *co;
+    struct io_uring_sqe sqeq;
+    ssize_t ret;
+    QEMUIOVector *qiov;
+    bool is_read;
+    QSIMPLEQ_ENTRY(LuringAIOCB) next;
+
+    /*
+     * Buffered reads may require resubmission, see
+     * luring_resubmit_short_read().
+     */
+    int total_read;
+    QEMUIOVector resubmit_qiov;
+} LuringAIOCB;
+
+typedef struct LuringQueue {
+    int plugged;
+    unsigned int in_queue;
+    unsigned int in_flight;
+    bool blocked;
+    QSIMPLEQ_HEAD(, LuringAIOCB) submit_queue;
+} LuringQueue;
+
+typedef struct LuringState {
+    AioContext *aio_context;
+
+    struct io_uring ring;
+
+    /* io queue for submit at batch.  Protected by AioContext lock. */
+    LuringQueue io_q;
+
+    /* I/O completion processing.  Only runs in I/O thread.  */
+    QEMUBH *completion_bh;
+} LuringState;
+
+/**
+ * luring_resubmit:
+ *
+ * Resubmit a request by appending it to submit_queue.  The caller must ensure
+ * that ioq_submit() is called later so that submit_queue requests are started.
+ */
+static void luring_resubmit(LuringState *s, LuringAIOCB *luringcb)
+{
+    QSIMPLEQ_INSERT_TAIL(&s->io_q.submit_queue, luringcb, next);
+    s->io_q.in_queue++;
+}
+
+/**
+ * luring_resubmit_short_read:
+ *
+ * Before Linux commit 9d93a3f5a0c ("io_uring: punt short reads to async
+ * context") a buffered I/O request with the start of the file range in the
+ * page cache could result in a short read.  Applications need to resubmit the
+ * remaining read request.
+ *
+ * This is a slow path but recent kernels never take it.
+ */
+static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
+                                       int nread)
+{
+    QEMUIOVector *resubmit_qiov;
+    size_t remaining;
+
+    /* Update read position */
+    luringcb->total_read = nread;
+    remaining = luringcb->qiov->size - luringcb->total_read;
+
+    /* Shorten qiov */
+    resubmit_qiov = &luringcb->resubmit_qiov;
+    if (resubmit_qiov->iov == NULL) {
+        qemu_iovec_init(resubmit_qiov, luringcb->qiov->niov);
+    } else {
+        qemu_iovec_reset(resubmit_qiov);
+    }
+    qemu_iovec_concat(resubmit_qiov, luringcb->qiov, luringcb->total_read,
+                      remaining);
+
+    /* Update sqe */
+    luringcb->sqeq.off = nread;
+    luringcb->sqeq.addr = (__u64)(uintptr_t)luringcb->resubmit_qiov.iov;
+    luringcb->sqeq.len = luringcb->resubmit_qiov.niov;
+
+    luring_resubmit(s, luringcb);
+}
+
+/**
+ * luring_process_completions:
+ * @s: AIO state
+ *
+ * Fetches completed I/O requests, consumes cqes and invokes their callbacks
+ * The function is somewhat tricky because it supports nested event loops, for
+ * example when a request callback invokes aio_poll().
+ *
+ * Function schedules BH completion so it  can be called again in a nested
+ * event loop.  When there are no events left  to complete the BH is being
+ * canceled.
+ *
+ */
+static void luring_process_completions(LuringState *s)
+{
+    struct io_uring_cqe *cqes;
+    int total_bytes;
+    /*
+     * Request completion callbacks can run the nested event loop.
+     * Schedule ourselves so the nested event loop will "see" remaining
+     * completed requests and process them.  Without this, completion
+     * callbacks that wait for other requests using a nested event loop
+     * would hang forever.
+     *
+     * This workaround is needed because io_uring uses poll_wait, which
+     * is woken up when new events are added to the uring, thus polling on
+     * the same uring fd will block unless more events are received.
+     *
+     * Other leaf block drivers (drivers that access the data themselves)
+     * are networking based, so they poll sockets for data and run the
+     * correct coroutine.
+     */
+    qemu_bh_schedule(s->completion_bh);
+
+    while (io_uring_peek_cqe(&s->ring, &cqes) == 0) {
+        LuringAIOCB *luringcb;
+        int ret;
+
+        if (!cqes) {
+            break;
+        }
+
+        luringcb = io_uring_cqe_get_data(cqes);
+        ret = cqes->res;
+        io_uring_cqe_seen(&s->ring, cqes);
+        cqes = NULL;
+
+        /* Change counters one-by-one because we can be nested. */
+        s->io_q.in_flight--;
+
+        /* total_read is non-zero only for resubmitted read requests */
+        total_bytes = ret + luringcb->total_read;
+
+        if (ret < 0) {
+            if (ret == -EINTR) {
+                luring_resubmit(s, luringcb);
+                continue;
+            }
+        } else if (!luringcb->qiov) {
+            goto end;
+        } else if (total_bytes == luringcb->qiov->size) {
+            ret = 0;
+        /* Only read/write */
+        } else {
+            /* Short Read/Write */
+            if (luringcb->is_read) {
+                if (ret > 0) {
+                    luring_resubmit_short_read(s, luringcb, ret);
+                    continue;
+                } else {
+                    /* Pad with zeroes */
+                    qemu_iovec_memset(luringcb->qiov, total_bytes, 0,
+                                      luringcb->qiov->size - total_bytes);
+                    ret = 0;
+                }
+            } else {
+                ret = -ENOSPC;;
+            }
+        }
+end:
+        luringcb->ret = ret;
+        qemu_iovec_destroy(&luringcb->resubmit_qiov);
+
+        /*
+         * If the coroutine is already entered it must be in ioq_submit()
+         * and will notice luringcb->ret has been filled in when it
+         * eventually runs later. Coroutines cannot be entered recursively
+         * so avoid doing that!
+         */
+        if (!qemu_coroutine_entered(luringcb->co)) {
+            aio_co_wake(luringcb->co);
+        }
+    }
+    qemu_bh_cancel(s->completion_bh);
+}
+
+static int ioq_submit(LuringState *s)
+{
+    int ret = 0;
+    LuringAIOCB *luringcb, *luringcb_next;
+
+    while (s->io_q.in_queue > 0) {
+        /*
+         * Try to fetch sqes from the ring for requests waiting in
+         * the overflow queue
+         */
+        QSIMPLEQ_FOREACH_SAFE(luringcb, &s->io_q.submit_queue, next,
+                              luringcb_next) {
+            struct io_uring_sqe *sqes = io_uring_get_sqe(&s->ring);
+            if (!sqes) {
+                break;
+            }
+            /* Prep sqe for submission */
+            *sqes = luringcb->sqeq;
+            QSIMPLEQ_REMOVE_HEAD(&s->io_q.submit_queue, next);
+        }
+        ret = io_uring_submit(&s->ring);
+        /* Prevent infinite loop if submission is refused */
+        if (ret <= 0) {
+            if (ret == -EAGAIN) {
+                continue;
+            }
+            break;
+        }
+        s->io_q.in_flight += ret;
+        s->io_q.in_queue  -= ret;
+    }
+    s->io_q.blocked = (s->io_q.in_queue > 0);
+
+    if (s->io_q.in_flight) {
+        /*
+         * We can try to complete something just right away if there are
+         * still requests in-flight.
+         */
+        luring_process_completions(s);
+    }
+    return ret;
+}
+
+static void luring_process_completions_and_submit(LuringState *s)
+{
+    aio_context_acquire(s->aio_context);
+    luring_process_completions(s);
+
+    if (!s->io_q.plugged && s->io_q.in_queue > 0) {
+        ioq_submit(s);
+    }
+    aio_context_release(s->aio_context);
+}
+
+static void qemu_luring_completion_bh(void *opaque)
+{
+    LuringState *s = opaque;
+    luring_process_completions_and_submit(s);
+}
+
+static void qemu_luring_completion_cb(void *opaque)
+{
+    LuringState *s = opaque;
+    luring_process_completions_and_submit(s);
+}
+
+static void ioq_init(LuringQueue *io_q)
+{
+    QSIMPLEQ_INIT(&io_q->submit_queue);
+    io_q->plugged = 0;
+    io_q->in_queue = 0;
+    io_q->in_flight = 0;
+    io_q->blocked = false;
+}
+
+void luring_io_plug(BlockDriverState *bs, LuringState *s)
+{
+    s->io_q.plugged++;
+}
+
+void luring_io_unplug(BlockDriverState *bs, LuringState *s)
+{
+    assert(s->io_q.plugged);
+    if (--s->io_q.plugged == 0 &&
+        !s->io_q.blocked && s->io_q.in_queue > 0) {
+        ioq_submit(s);
+    }
+}
+
+/**
+ * luring_do_submit:
+ * @fd: file descriptor for I/O
+ * @luringcb: AIO control block
+ * @s: AIO state
+ * @offset: offset for request
+ * @type: type of request
+ *
+ * Fetches sqes from ring, adds to pending queue and preps them
+ *
+ */
+static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
+                            uint64_t offset, int type)
+{
+    struct io_uring_sqe *sqes = &luringcb->sqeq;
+
+    switch (type) {
+    case QEMU_AIO_WRITE:
+        io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
+                             luringcb->qiov->niov, offset);
+        break;
+    case QEMU_AIO_READ:
+        io_uring_prep_readv(sqes, fd, luringcb->qiov->iov,
+                            luringcb->qiov->niov, offset);
+        break;
+    case QEMU_AIO_FLUSH:
+        io_uring_prep_fsync(sqes, fd, IORING_FSYNC_DATASYNC);
+        break;
+    default:
+        fprintf(stderr, "%s: invalid AIO request type, aborting 0x%x.\n",
+                        __func__, type);
+        abort();
+    }
+    io_uring_sqe_set_data(sqes, luringcb);
+
+    QSIMPLEQ_INSERT_TAIL(&s->io_q.submit_queue, luringcb, next);
+    s->io_q.in_queue++;
+
+    if (!s->io_q.blocked &&
+        (!s->io_q.plugged ||
+         s->io_q.in_flight + s->io_q.in_queue >= MAX_ENTRIES)) {
+        return ioq_submit(s);
+    }
+    return 0;
+}
+
+int coroutine_fn luring_co_submit(BlockDriverState *bs, LuringState *s, int fd,
+                                  uint64_t offset, QEMUIOVector *qiov, int type)
+{
+    int ret;
+    LuringAIOCB luringcb = {
+        .co         = qemu_coroutine_self(),
+        .ret        = -EINPROGRESS,
+        .qiov       = qiov,
+        .is_read    = (type == QEMU_AIO_READ),
+    };
+
+    ret = luring_do_submit(fd, &luringcb, s, offset, type);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (luringcb.ret == -EINPROGRESS) {
+        qemu_coroutine_yield();
+    }
+    return luringcb.ret;
+}
+
+void luring_detach_aio_context(LuringState *s, AioContext *old_context)
+{
+    aio_set_fd_handler(old_context, s->ring.ring_fd, false, NULL, NULL, NULL,
+                       s);
+    qemu_bh_delete(s->completion_bh);
+    s->aio_context = NULL;
+}
+
+void luring_attach_aio_context(LuringState *s, AioContext *new_context)
+{
+    s->aio_context = new_context;
+    s->completion_bh = aio_bh_new(new_context, qemu_luring_completion_bh, s);
+    aio_set_fd_handler(s->aio_context, s->ring.ring_fd, false,
+                       qemu_luring_completion_cb, NULL, NULL, s);
+}
+
+LuringState *luring_init(Error **errp)
+{
+    int rc;
+    LuringState *s = g_new0(LuringState, 1);
+    struct io_uring *ring = &s->ring;
+
+    rc = io_uring_queue_init(MAX_ENTRIES, ring, 0);
+    if (rc < 0) {
+        error_setg_errno(errp, errno, "failed to init linux io_uring ring");
+        g_free(s);
+        return NULL;
+    }
+
+    ioq_init(&s->io_q);
+    return s;
+
+}
+
+void luring_cleanup(LuringState *s)
+{
+    io_uring_queue_exit(&s->ring);
+    g_free(s);
+}
diff --git a/include/block/aio.h b/include/block/aio.h
index 6b0d52f732..7ba9bd7874 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -49,6 +49,7 @@ typedef void IOHandler(void *opaque);
 struct Coroutine;
 struct ThreadPool;
 struct LinuxAioState;
+struct LuringState;
 
 struct AioContext {
     GSource source;
@@ -117,11 +118,19 @@ struct AioContext {
     struct ThreadPool *thread_pool;
 
 #ifdef CONFIG_LINUX_AIO
-    /* State for native Linux AIO.  Uses aio_context_acquire/release for
+    /*
+     * State for native Linux AIO.  Uses aio_context_acquire/release for
      * locking.
      */
     struct LinuxAioState *linux_aio;
 #endif
+#ifdef CONFIG_LINUX_IO_URING
+    /*
+     * State for Linux io_uring.  Uses aio_context_acquire/release for
+     * locking.
+     */
+    struct LuringState *linux_io_uring;
+#endif
 
     /* TimerLists for calling timers - one per clock type.  Has its own
      * locking.
@@ -386,6 +395,11 @@ struct LinuxAioState *aio_setup_linux_aio(AioContext *ctx, Error **errp);
 /* Return the LinuxAioState bound to this AioContext */
 struct LinuxAioState *aio_get_linux_aio(AioContext *ctx);
 
+/* Setup the LuringState bound to this AioContext */
+struct LuringState *aio_setup_linux_io_uring(AioContext *ctx, Error **errp);
+
+/* Return the LuringState bound to this AioContext */
+struct LuringState *aio_get_linux_io_uring(AioContext *ctx);
 /**
  * aio_timer_new_with_attrs:
  * @ctx: the aio context
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index 4629f24d08..251b10d273 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -57,6 +57,18 @@ void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context);
 void laio_io_plug(BlockDriverState *bs, LinuxAioState *s);
 void laio_io_unplug(BlockDriverState *bs, LinuxAioState *s);
 #endif
+/* io_uring.c - Linux io_uring implementation */
+#ifdef CONFIG_LINUX_IO_URING
+typedef struct LuringState LuringState;
+LuringState *luring_init(Error **errp);
+void luring_cleanup(LuringState *s);
+int coroutine_fn luring_co_submit(BlockDriverState *bs, LuringState *s, int fd,
+                                uint64_t offset, QEMUIOVector *qiov, int type);
+void luring_detach_aio_context(LuringState *s, AioContext *old_context);
+void luring_attach_aio_context(LuringState *s, AioContext *new_context);
+void luring_io_plug(BlockDriverState *bs, LuringState *s);
+void luring_io_unplug(BlockDriverState *bs, LuringState *s);
+#endif
 
 #ifdef _WIN32
 typedef struct QEMUWin32AIOState QEMUWin32AIOState;
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 05/15] stubs: add stubs for io_uring interface
  2019-12-18 16:32 [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2019-12-18 16:32 ` [PATCH v3 04/15] block/io_uring: implements interfaces " Stefan Hajnoczi
@ 2019-12-18 16:32 ` Stefan Hajnoczi
  2019-12-18 16:32 ` [PATCH v3 06/15] util/async: add aio interfaces for io_uring Stefan Hajnoczi
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-12-18 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, oleksandr, Julia Suvorova,
	Markus Armbruster, Max Reitz, Stefan Hajnoczi, Paolo Bonzini,
	Aarushi Mehta

From: Aarushi Mehta <mehta.aaru20@gmail.com>

Follow linux-aio.o and stub out the block/io_uring.o APIs that will be
missing when a binary is linked with obj-util-y but without
block-util-y (e.g. vhost-user-gpu).

For example, the stubs are necessary so that a binary using util/async.o
from obj-util-y for qemu_bh_new() links successfully.  In this case
block/io_uring.o from block-util-y isn't needed and we can avoid
dragging in the block layer by linking the stubs instead.  The stub
functions never get called.

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 MAINTAINERS         |  1 +
 stubs/Makefile.objs |  1 +
 stubs/io_uring.c    | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 34 insertions(+)
 create mode 100644 stubs/io_uring.c

diff --git a/MAINTAINERS b/MAINTAINERS
index fc7f53b229..2f578c70e7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2603,6 +2603,7 @@ M: Stefan Hajnoczi <stefanha@redhat.com>
 L: qemu-block@nongnu.org
 S: Maintained
 F: block/io_uring.c
+F: stubs/io_uring.c
 
 qcow2
 M: Kevin Wolf <kwolf@redhat.com>
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 4a50e95ec3..56c177c507 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -13,6 +13,7 @@ stub-obj-y += iothread.o
 stub-obj-y += iothread-lock.o
 stub-obj-y += is-daemonized.o
 stub-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
+stub-obj-$(CONFIG_LINUX_IO_URING) += io_uring.o
 stub-obj-y += machine-init-done.o
 stub-obj-y += migr-blocker.o
 stub-obj-y += change-state-handler.o
diff --git a/stubs/io_uring.c b/stubs/io_uring.c
new file mode 100644
index 0000000000..622d1e4648
--- /dev/null
+++ b/stubs/io_uring.c
@@ -0,0 +1,32 @@
+/*
+ * Linux io_uring support.
+ *
+ * Copyright (C) 2009 IBM, Corp.
+ * Copyright (C) 2009 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "block/aio.h"
+#include "block/raw-aio.h"
+
+void luring_detach_aio_context(LuringState *s, AioContext *old_context)
+{
+    abort();
+}
+
+void luring_attach_aio_context(LuringState *s, AioContext *new_context)
+{
+    abort();
+}
+
+LuringState *luring_init(Error **errp)
+{
+    abort();
+}
+
+void luring_cleanup(LuringState *s)
+{
+    abort();
+}
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 06/15] util/async: add aio interfaces for io_uring
  2019-12-18 16:32 [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2019-12-18 16:32 ` [PATCH v3 05/15] stubs: add stubs for io_uring interface Stefan Hajnoczi
@ 2019-12-18 16:32 ` Stefan Hajnoczi
  2019-12-18 16:32 ` [PATCH v3 07/15] blockdev: adds bdrv_parse_aio to use io_uring Stefan Hajnoczi
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-12-18 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, oleksandr, Julia Suvorova,
	Markus Armbruster, Max Reitz, Stefan Hajnoczi, Paolo Bonzini,
	Aarushi Mehta

From: Aarushi Mehta <mehta.aaru20@gmail.com>

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/async.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/util/async.c b/util/async.c
index b1fa5319e5..c192a24a61 100644
--- a/util/async.c
+++ b/util/async.c
@@ -276,6 +276,14 @@ aio_ctx_finalize(GSource     *source)
     }
 #endif
 
+#ifdef CONFIG_LINUX_IO_URING
+    if (ctx->linux_io_uring) {
+        luring_detach_aio_context(ctx->linux_io_uring, ctx);
+        luring_cleanup(ctx->linux_io_uring);
+        ctx->linux_io_uring = NULL;
+    }
+#endif
+
     assert(QSLIST_EMPTY(&ctx->scheduled_coroutines));
     qemu_bh_delete(ctx->co_schedule_bh);
 
@@ -340,6 +348,29 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
 }
 #endif
 
+#ifdef CONFIG_LINUX_IO_URING
+LuringState *aio_setup_linux_io_uring(AioContext *ctx, Error **errp)
+{
+    if (ctx->linux_io_uring) {
+        return ctx->linux_io_uring;
+    }
+
+    ctx->linux_io_uring = luring_init(errp);
+    if (!ctx->linux_io_uring) {
+        return NULL;
+    }
+
+    luring_attach_aio_context(ctx->linux_io_uring, ctx);
+    return ctx->linux_io_uring;
+}
+
+LuringState *aio_get_linux_io_uring(AioContext *ctx)
+{
+    assert(ctx->linux_io_uring);
+    return ctx->linux_io_uring;
+}
+#endif
+
 void aio_notify(AioContext *ctx)
 {
     /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
@@ -434,6 +465,11 @@ AioContext *aio_context_new(Error **errp)
 #ifdef CONFIG_LINUX_AIO
     ctx->linux_aio = NULL;
 #endif
+
+#ifdef CONFIG_LINUX_IO_URING
+    ctx->linux_io_uring = NULL;
+#endif
+
     ctx->thread_pool = NULL;
     qemu_rec_mutex_init(&ctx->lock);
     timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 07/15] blockdev: adds bdrv_parse_aio to use io_uring
  2019-12-18 16:32 [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine Stefan Hajnoczi
                   ` (5 preceding siblings ...)
  2019-12-18 16:32 ` [PATCH v3 06/15] util/async: add aio interfaces for io_uring Stefan Hajnoczi
@ 2019-12-18 16:32 ` Stefan Hajnoczi
  2019-12-18 16:32 ` [PATCH v3 08/15] block/file-posix.c: extend " Stefan Hajnoczi
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-12-18 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, oleksandr, Julia Suvorova,
	Markus Armbruster, Max Reitz, Stefan Hajnoczi, Paolo Bonzini,
	Aarushi Mehta

From: Aarushi Mehta <mehta.aaru20@gmail.com>

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c               | 22 ++++++++++++++++++++++
 blockdev.c            | 12 ++++--------
 include/block/block.h |  1 +
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/block.c b/block.c
index 473eb6eeaa..a06b47c638 100644
--- a/block.c
+++ b/block.c
@@ -845,6 +845,28 @@ static BlockdevDetectZeroesOptions bdrv_parse_detect_zeroes(QemuOpts *opts,
     return detect_zeroes;
 }
 
+/**
+ * Set open flags for aio engine
+ *
+ * Return 0 on success, -1 if the engine specified is invalid
+ */
+int bdrv_parse_aio(const char *mode, int *flags)
+{
+    if (!strcmp(mode, "threads")) {
+        /* do nothing, default */
+    } else if (!strcmp(mode, "native")) {
+        *flags |= BDRV_O_NATIVE_AIO;
+#ifdef CONFIG_LINUX_IO_URING
+    } else if (!strcmp(mode, "io_uring")) {
+        *flags |= BDRV_O_IO_URING;
+#endif
+    } else {
+        return -1;
+    }
+
+    return 0;
+}
+
 /**
  * Set open flags for a given discard mode
  *
diff --git a/blockdev.c b/blockdev.c
index 8e029e9c01..6523f9551d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -385,13 +385,9 @@ static void extract_common_blockdev_options(QemuOpts *opts, int *bdrv_flags,
         }
 
         if ((aio = qemu_opt_get(opts, "aio")) != NULL) {
-            if (!strcmp(aio, "native")) {
-                *bdrv_flags |= BDRV_O_NATIVE_AIO;
-            } else if (!strcmp(aio, "threads")) {
-                /* this is the default */
-            } else {
-               error_setg(errp, "invalid aio option");
-               return;
+            if (bdrv_parse_aio(aio, bdrv_flags) < 0) {
+                error_setg(errp, "invalid aio option");
+                return;
             }
         }
     }
@@ -4641,7 +4637,7 @@ QemuOptsList qemu_common_drive_opts = {
         },{
             .name = "aio",
             .type = QEMU_OPT_STRING,
-            .help = "host AIO implementation (threads, native)",
+            .help = "host AIO implementation (threads, native, io_uring)",
         },{
             .name = BDRV_OPT_CACHE_WB,
             .type = QEMU_OPT_BOOL,
diff --git a/include/block/block.h b/include/block/block.h
index 7071dc041e..2abe008399 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -300,6 +300,7 @@ void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
 void bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
                        Error **errp);
 
+int bdrv_parse_aio(const char *mode, int *flags);
 int bdrv_parse_cache_mode(const char *mode, int *flags, bool *writethrough);
 int bdrv_parse_discard_flags(const char *mode, int *flags);
 BdrvChild *bdrv_open_child(const char *filename,
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 08/15] block/file-posix.c: extend to use io_uring
  2019-12-18 16:32 [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine Stefan Hajnoczi
                   ` (6 preceding siblings ...)
  2019-12-18 16:32 ` [PATCH v3 07/15] blockdev: adds bdrv_parse_aio to use io_uring Stefan Hajnoczi
@ 2019-12-18 16:32 ` Stefan Hajnoczi
  2020-01-13 11:49   ` Stefano Garzarella
  2019-12-18 16:32 ` [PATCH v3 09/15] block: add trace events for io_uring Stefan Hajnoczi
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-12-18 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Maxim Levitsky, qemu-block, oleksandr,
	Julia Suvorova, Markus Armbruster, Max Reitz, Stefan Hajnoczi,
	Paolo Bonzini, Aarushi Mehta

From: Aarushi Mehta <mehta.aaru20@gmail.com>

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
Reviewed-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/file-posix.c | 95 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 75 insertions(+), 20 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 1b805bd938..a42a90e59d 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -156,6 +156,7 @@ typedef struct BDRVRawState {
     bool has_write_zeroes:1;
     bool discard_zeroes:1;
     bool use_linux_aio:1;
+    bool use_linux_io_uring:1;
     bool page_cache_inconsistent:1;
     bool has_fallocate;
     bool needs_alignment;
@@ -444,7 +445,7 @@ static QemuOptsList raw_runtime_opts = {
         {
             .name = "aio",
             .type = QEMU_OPT_STRING,
-            .help = "host AIO implementation (threads, native)",
+            .help = "host AIO implementation (threads, native, io_uring)",
         },
         {
             .name = "locking",
@@ -503,9 +504,11 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
         goto fail;
     }
 
-    aio_default = (bdrv_flags & BDRV_O_NATIVE_AIO)
-                  ? BLOCKDEV_AIO_OPTIONS_NATIVE
-                  : BLOCKDEV_AIO_OPTIONS_THREADS;
+    if (bdrv_flags & BDRV_O_NATIVE_AIO) {
+        aio_default = BLOCKDEV_AIO_OPTIONS_NATIVE;
+    } else {
+        aio_default = BLOCKDEV_AIO_OPTIONS_THREADS;
+    }
     aio = qapi_enum_parse(&BlockdevAioOptions_lookup,
                           qemu_opt_get(opts, "aio"),
                           aio_default, &local_err);
@@ -514,7 +517,11 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
         ret = -EINVAL;
         goto fail;
     }
+
     s->use_linux_aio = (aio == BLOCKDEV_AIO_OPTIONS_NATIVE);
+#ifdef CONFIG_LINUX_IO_URING
+    s->use_linux_io_uring = (aio == BLOCKDEV_AIO_OPTIONS_IO_URING);
+#endif
 
     locking = qapi_enum_parse(&OnOffAuto_lookup,
                               qemu_opt_get(opts, "locking"),
@@ -578,7 +585,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
     s->shared_perm = BLK_PERM_ALL;
 
 #ifdef CONFIG_LINUX_AIO
-     /* Currently Linux does AIO only for files opened with O_DIRECT */
+    /* Currently Linux does AIO only for files opened with O_DIRECT */
     if (s->use_linux_aio) {
         if (!(s->open_flags & O_DIRECT)) {
             error_setg(errp, "aio=native was specified, but it requires "
@@ -600,6 +607,22 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
     }
 #endif /* !defined(CONFIG_LINUX_AIO) */
 
+#ifdef CONFIG_LINUX_IO_URING
+    if (s->use_linux_io_uring) {
+        if (!aio_setup_linux_io_uring(bdrv_get_aio_context(bs), errp)) {
+            error_prepend(errp, "Unable to use io_uring: ");
+            goto fail;
+        }
+    }
+#else
+    if (s->use_linux_io_uring) {
+        error_setg(errp, "aio=io_uring was specified, but is not supported "
+                         "in this build.");
+        ret = -EINVAL;
+        goto fail;
+    }
+#endif /* !defined(CONFIG_LINUX_IO_URING) */
+
     s->has_discard = true;
     s->has_write_zeroes = true;
     if ((bs->open_flags & BDRV_O_NOCACHE) != 0) {
@@ -1877,21 +1900,25 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
         return -EIO;
 
     /*
-     * Check if the underlying device requires requests to be aligned,
-     * and if the request we are trying to submit is aligned or not.
-     * If this is the case tell the low-level driver that it needs
-     * to copy the buffer.
+     * When using O_DIRECT, the request must be aligned to be able to use
+     * either libaio or io_uring interface. If not fail back to regular thread
+     * pool read/write code which emulates this for us if we
+     * set QEMU_AIO_MISALIGNED.
      */
-    if (s->needs_alignment) {
-        if (!bdrv_qiov_is_aligned(bs, qiov)) {
-            type |= QEMU_AIO_MISALIGNED;
+    if (s->needs_alignment && !bdrv_qiov_is_aligned(bs, qiov)) {
+        type |= QEMU_AIO_MISALIGNED;
+#ifdef CONFIG_LINUX_IO_URING
+    } else if (s->use_linux_io_uring) {
+        LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs));
+        assert(qiov->size == bytes);
+        return luring_co_submit(bs, aio, s->fd, offset, qiov, type);
+#endif
 #ifdef CONFIG_LINUX_AIO
-        } else if (s->use_linux_aio) {
-            LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
-            assert(qiov->size == bytes);
-            return laio_co_submit(bs, aio, s->fd, offset, qiov, type);
+    } else if (s->use_linux_aio) {
+        LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
+        assert(qiov->size == bytes);
+        return laio_co_submit(bs, aio, s->fd, offset, qiov, type);
 #endif
-        }
     }
 
     acb = (RawPosixAIOData) {
@@ -1927,24 +1954,36 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset,
 
 static void raw_aio_plug(BlockDriverState *bs)
 {
+    BDRVRawState __attribute__((unused)) *s = bs->opaque;
 #ifdef CONFIG_LINUX_AIO
-    BDRVRawState *s = bs->opaque;
     if (s->use_linux_aio) {
         LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
         laio_io_plug(bs, aio);
     }
 #endif
+#ifdef CONFIG_LINUX_IO_URING
+    if (s->use_linux_io_uring) {
+        LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs));
+        luring_io_plug(bs, aio);
+    }
+#endif
 }
 
 static void raw_aio_unplug(BlockDriverState *bs)
 {
+    BDRVRawState __attribute__((unused)) *s = bs->opaque;
 #ifdef CONFIG_LINUX_AIO
-    BDRVRawState *s = bs->opaque;
     if (s->use_linux_aio) {
         LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
         laio_io_unplug(bs, aio);
     }
 #endif
+#ifdef CONFIG_LINUX_IO_URING
+    if (s->use_linux_io_uring) {
+        LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs));
+        luring_io_unplug(bs, aio);
+    }
+#endif
 }
 
 static int raw_co_flush_to_disk(BlockDriverState *bs)
@@ -1964,14 +2003,20 @@ static int raw_co_flush_to_disk(BlockDriverState *bs)
         .aio_type       = QEMU_AIO_FLUSH,
     };
 
+#ifdef CONFIG_LINUX_IO_URING
+    if (s->use_linux_io_uring) {
+        LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs));
+        return luring_co_submit(bs, aio, s->fd, 0, NULL, QEMU_AIO_FLUSH);
+    }
+#endif
     return raw_thread_pool_submit(bs, handle_aiocb_flush, &acb);
 }
 
 static void raw_aio_attach_aio_context(BlockDriverState *bs,
                                        AioContext *new_context)
 {
+    BDRVRawState __attribute__((unused)) *s = bs->opaque;
 #ifdef CONFIG_LINUX_AIO
-    BDRVRawState *s = bs->opaque;
     if (s->use_linux_aio) {
         Error *local_err = NULL;
         if (!aio_setup_linux_aio(new_context, &local_err)) {
@@ -1981,6 +2026,16 @@ static void raw_aio_attach_aio_context(BlockDriverState *bs,
         }
     }
 #endif
+#ifdef CONFIG_LINUX_IO_URING
+    if (s->use_linux_io_uring) {
+        Error *local_err;
+        if (!aio_setup_linux_io_uring(new_context, &local_err)) {
+            error_reportf_err(local_err, "Unable to use linux io_uring, "
+                                         "falling back to thread pool: ");
+            s->use_linux_io_uring = false;
+        }
+    }
+#endif
 }
 
 static void raw_close(BlockDriverState *bs)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 09/15] block: add trace events for io_uring
  2019-12-18 16:32 [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine Stefan Hajnoczi
                   ` (7 preceding siblings ...)
  2019-12-18 16:32 ` [PATCH v3 08/15] block/file-posix.c: extend " Stefan Hajnoczi
@ 2019-12-18 16:32 ` Stefan Hajnoczi
  2019-12-18 16:32 ` [PATCH v3 10/15] block/io_uring: adds userspace completion polling Stefan Hajnoczi
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-12-18 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, oleksandr, Julia Suvorova,
	Markus Armbruster, Max Reitz, Stefan Hajnoczi, Paolo Bonzini,
	Aarushi Mehta

From: Aarushi Mehta <mehta.aaru20@gmail.com>

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/io_uring.c   | 23 ++++++++++++++++++++---
 block/trace-events | 12 ++++++++++++
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index bb433a685b..a5c0d16220 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -17,6 +17,7 @@
 #include "block/raw-aio.h"
 #include "qemu/coroutine.h"
 #include "qapi/error.h"
+#include "trace.h"
 
 /* io_uring ring size */
 #define MAX_ENTRIES 128
@@ -85,6 +86,8 @@ static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
     QEMUIOVector *resubmit_qiov;
     size_t remaining;
 
+    trace_luring_resubmit_short_read(s, luringcb, nread);
+
     /* Update read position */
     luringcb->total_read = nread;
     remaining = luringcb->qiov->size - luringcb->total_read;
@@ -156,6 +159,7 @@ static void luring_process_completions(LuringState *s)
 
         /* Change counters one-by-one because we can be nested. */
         s->io_q.in_flight--;
+        trace_luring_process_completion(s, luringcb, ret);
 
         /* total_read is non-zero only for resubmitted read requests */
         total_bytes = ret + luringcb->total_read;
@@ -224,6 +228,7 @@ static int ioq_submit(LuringState *s)
             QSIMPLEQ_REMOVE_HEAD(&s->io_q.submit_queue, next);
         }
         ret = io_uring_submit(&s->ring);
+        trace_luring_io_uring_submit(s, ret);
         /* Prevent infinite loop if submission is refused */
         if (ret <= 0) {
             if (ret == -EAGAIN) {
@@ -280,12 +285,15 @@ static void ioq_init(LuringQueue *io_q)
 
 void luring_io_plug(BlockDriverState *bs, LuringState *s)
 {
+    trace_luring_io_plug(s);
     s->io_q.plugged++;
 }
 
 void luring_io_unplug(BlockDriverState *bs, LuringState *s)
 {
     assert(s->io_q.plugged);
+    trace_luring_io_unplug(s, s->io_q.blocked, s->io_q.plugged,
+                           s->io_q.in_queue, s->io_q.in_flight);
     if (--s->io_q.plugged == 0 &&
         !s->io_q.blocked && s->io_q.in_queue > 0) {
         ioq_submit(s);
@@ -306,6 +314,7 @@ void luring_io_unplug(BlockDriverState *bs, LuringState *s)
 static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
                             uint64_t offset, int type)
 {
+    int ret;
     struct io_uring_sqe *sqes = &luringcb->sqeq;
 
     switch (type) {
@@ -329,11 +338,14 @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
 
     QSIMPLEQ_INSERT_TAIL(&s->io_q.submit_queue, luringcb, next);
     s->io_q.in_queue++;
-
+    trace_luring_do_submit(s, s->io_q.blocked, s->io_q.plugged,
+                           s->io_q.in_queue, s->io_q.in_flight);
     if (!s->io_q.blocked &&
         (!s->io_q.plugged ||
          s->io_q.in_flight + s->io_q.in_queue >= MAX_ENTRIES)) {
-        return ioq_submit(s);
+        ret = ioq_submit(s);
+        trace_luring_do_submit_done(s, ret);
+        return ret;
     }
     return 0;
 }
@@ -348,8 +360,10 @@ int coroutine_fn luring_co_submit(BlockDriverState *bs, LuringState *s, int fd,
         .qiov       = qiov,
         .is_read    = (type == QEMU_AIO_READ),
     };
-
+    trace_luring_co_submit(bs, s, &luringcb, fd, offset, qiov ? qiov->size : 0,
+                           type);
     ret = luring_do_submit(fd, &luringcb, s, offset, type);
+
     if (ret < 0) {
         return ret;
     }
@@ -382,6 +396,8 @@ LuringState *luring_init(Error **errp)
     LuringState *s = g_new0(LuringState, 1);
     struct io_uring *ring = &s->ring;
 
+    trace_luring_init_state(s, sizeof(*s));
+
     rc = io_uring_queue_init(MAX_ENTRIES, ring, 0);
     if (rc < 0) {
         error_setg_errno(errp, errno, "failed to init linux io_uring ring");
@@ -398,4 +414,5 @@ void luring_cleanup(LuringState *s)
 {
     io_uring_queue_exit(&s->ring);
     g_free(s);
+    trace_luring_cleanup_state(s);
 }
diff --git a/block/trace-events b/block/trace-events
index 6ba86decca..1a7329b736 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -63,6 +63,18 @@ qmp_block_stream(void *bs) "bs %p"
 file_paio_submit(void *acb, void *opaque, int64_t offset, int count, int type) "acb %p opaque %p offset %"PRId64" count %d type %d"
 file_copy_file_range(void *bs, int src, int64_t src_off, int dst, int64_t dst_off, int64_t bytes, int flags, int64_t ret) "bs %p src_fd %d offset %"PRIu64" dst_fd %d offset %"PRIu64" bytes %"PRIu64" flags %d ret %"PRId64
 
+#io_uring.c
+luring_init_state(void *s, size_t size) "s %p size %zu"
+luring_cleanup_state(void *s) "%p freed"
+luring_io_plug(void *s) "LuringState %p plug"
+luring_io_unplug(void *s, int blocked, int plugged, int queued, int inflight) "LuringState %p blocked %d plugged %d queued %d inflight %d"
+luring_do_submit(void *s, int blocked, int plugged, int queued, int inflight) "LuringState %p blocked %d plugged %d queued %d inflight %d"
+luring_do_submit_done(void *s, int ret) "LuringState %p submitted to kernel %d"
+luring_co_submit(void *bs, void *s, void *luringcb, int fd, uint64_t offset, size_t nbytes, int type) "bs %p s %p luringcb %p fd %d offset %" PRId64 " nbytes %zd type %d"
+luring_process_completion(void *s, void *aiocb, int ret) "LuringState %p luringcb %p ret %d"
+luring_io_uring_submit(void *s, int ret) "LuringState %p ret %d"
+luring_resubmit_short_read(void *s, void *luringcb, int nread) "LuringState %p luringcb %p nread %d"
+
 # qcow2.c
 qcow2_add_task(void *co, void *bs, void *pool, const char *action, int cluster_type, uint64_t file_cluster_offset, uint64_t offset, uint64_t bytes, void *qiov, size_t qiov_offset) "co %p bs %p pool %p: %s: cluster_type %d file_cluster_offset %" PRIu64 " offset %" PRIu64 " bytes %" PRIu64 " qiov %p qiov_offset %zu"
 qcow2_writev_start_req(void *co, int64_t offset, int bytes) "co %p offset 0x%" PRIx64 " bytes %d"
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 10/15] block/io_uring: adds userspace completion polling
  2019-12-18 16:32 [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine Stefan Hajnoczi
                   ` (8 preceding siblings ...)
  2019-12-18 16:32 ` [PATCH v3 09/15] block: add trace events for io_uring Stefan Hajnoczi
@ 2019-12-18 16:32 ` Stefan Hajnoczi
  2019-12-18 16:32 ` [PATCH v3 11/15] qemu-io: adds option to use aio engine Stefan Hajnoczi
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-12-18 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, oleksandr, Julia Suvorova,
	Markus Armbruster, Max Reitz, Stefan Hajnoczi, Paolo Bonzini,
	Aarushi Mehta

From: Aarushi Mehta <mehta.aaru20@gmail.com>

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/io_uring.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index a5c0d16220..56892fd1ab 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -274,6 +274,21 @@ static void qemu_luring_completion_cb(void *opaque)
     luring_process_completions_and_submit(s);
 }
 
+static bool qemu_luring_poll_cb(void *opaque)
+{
+    LuringState *s = opaque;
+    struct io_uring_cqe *cqes;
+
+    if (io_uring_peek_cqe(&s->ring, &cqes) == 0) {
+        if (cqes) {
+            luring_process_completions_and_submit(s);
+            return true;
+        }
+    }
+
+    return false;
+}
+
 static void ioq_init(LuringQueue *io_q)
 {
     QSIMPLEQ_INIT(&io_q->submit_queue);
@@ -387,7 +402,7 @@ void luring_attach_aio_context(LuringState *s, AioContext *new_context)
     s->aio_context = new_context;
     s->completion_bh = aio_bh_new(new_context, qemu_luring_completion_bh, s);
     aio_set_fd_handler(s->aio_context, s->ring.ring_fd, false,
-                       qemu_luring_completion_cb, NULL, NULL, s);
+                       qemu_luring_completion_cb, NULL, qemu_luring_poll_cb, s);
 }
 
 LuringState *luring_init(Error **errp)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 11/15] qemu-io: adds option to use aio engine
  2019-12-18 16:32 [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine Stefan Hajnoczi
                   ` (9 preceding siblings ...)
  2019-12-18 16:32 ` [PATCH v3 10/15] block/io_uring: adds userspace completion polling Stefan Hajnoczi
@ 2019-12-18 16:32 ` Stefan Hajnoczi
  2019-12-18 16:32 ` [PATCH v3 12/15] qemu-img: adds option to use aio engine for benchmarking Stefan Hajnoczi
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-12-18 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, oleksandr, Julia Suvorova,
	Markus Armbruster, Max Reitz, Stefan Hajnoczi, Paolo Bonzini,
	Aarushi Mehta

From: Aarushi Mehta <mehta.aaru20@gmail.com>

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 qemu-io.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index 91e3276592..3adc5a7d0d 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -130,7 +130,8 @@ static void open_help(void)
 " -C, -- use copy-on-read\n"
 " -n, -- disable host cache, short for -t none\n"
 " -U, -- force shared permissions\n"
-" -k, -- use kernel AIO implementation (on Linux only)\n"
+" -k, -- use kernel AIO implementation (Linux only, prefer use of -i)\n"
+" -i, -- use AIO mode (threads, native or io_uring)\n"
 " -t, -- use the given cache mode for the image\n"
 " -d, -- use the given discard mode for the image\n"
 " -o, -- options to be given to the block driver"
@@ -172,7 +173,7 @@ static int open_f(BlockBackend *blk, int argc, char **argv)
     QDict *opts;
     bool force_share = false;
 
-    while ((c = getopt(argc, argv, "snCro:kt:d:U")) != -1) {
+    while ((c = getopt(argc, argv, "snCro:ki:t:d:U")) != -1) {
         switch (c) {
         case 's':
             flags |= BDRV_O_SNAPSHOT;
@@ -204,6 +205,13 @@ static int open_f(BlockBackend *blk, int argc, char **argv)
                 return -EINVAL;
             }
             break;
+        case 'i':
+            if (bdrv_parse_aio(optarg, &flags) < 0) {
+                error_report("Invalid aio option: %s", optarg);
+                qemu_opts_reset(&empty_opts);
+                return -EINVAL;
+            }
+            break;
         case 'o':
             if (imageOpts) {
                 printf("--image-opts and 'open -o' are mutually exclusive\n");
@@ -291,7 +299,9 @@ static void usage(const char *name)
 "  -n, --nocache        disable host cache, short for -t none\n"
 "  -C, --copy-on-read   enable copy-on-read\n"
 "  -m, --misalign       misalign allocations for O_DIRECT\n"
-"  -k, --native-aio     use kernel AIO implementation (on Linux only)\n"
+"  -k, --native-aio     use kernel AIO implementation\n"
+"                       (Linux only, prefer use of -i)\n"
+"  -i, --aio=MODE       use AIO mode (threads, native or io_uring)\n"
 "  -t, --cache=MODE     use the given cache mode for the image\n"
 "  -d, --discard=MODE   use the given discard mode for the image\n"
 "  -T, --trace [[enable=]<pattern>][,events=<file>][,file=<file>]\n"
@@ -496,7 +506,7 @@ static QemuOptsList file_opts = {
 int main(int argc, char **argv)
 {
     int readonly = 0;
-    const char *sopt = "hVc:d:f:rsnCmkt:T:U";
+    const char *sopt = "hVc:d:f:rsnCmki:t:T:U";
     const struct option lopt[] = {
         { "help", no_argument, NULL, 'h' },
         { "version", no_argument, NULL, 'V' },
@@ -508,6 +518,7 @@ int main(int argc, char **argv)
         { "copy-on-read", no_argument, NULL, 'C' },
         { "misalign", no_argument, NULL, 'm' },
         { "native-aio", no_argument, NULL, 'k' },
+        { "aio", required_argument, NULL, 'i' },
         { "discard", required_argument, NULL, 'd' },
         { "cache", required_argument, NULL, 't' },
         { "trace", required_argument, NULL, 'T' },
@@ -575,6 +586,12 @@ int main(int argc, char **argv)
         case 'k':
             flags |= BDRV_O_NATIVE_AIO;
             break;
+        case 'i':
+            if (bdrv_parse_aio(optarg, &flags) < 0) {
+                error_report("Invalid aio option: %s", optarg);
+                exit(1);
+            }
+            break;
         case 't':
             if (bdrv_parse_cache_mode(optarg, &flags, &writethrough) < 0) {
                 error_report("Invalid cache option: %s", optarg);
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 12/15] qemu-img: adds option to use aio engine for benchmarking
  2019-12-18 16:32 [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine Stefan Hajnoczi
                   ` (10 preceding siblings ...)
  2019-12-18 16:32 ` [PATCH v3 11/15] qemu-io: adds option to use aio engine Stefan Hajnoczi
@ 2019-12-18 16:32 ` Stefan Hajnoczi
  2019-12-18 16:32 ` [PATCH v3 13/15] qemu-nbd: adds option for aio engines Stefan Hajnoczi
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-12-18 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, oleksandr, Julia Suvorova,
	Markus Armbruster, Max Reitz, Stefan Hajnoczi, Paolo Bonzini,
	Aarushi Mehta

From: Aarushi Mehta <mehta.aaru20@gmail.com>

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v10:
 * Add missing space to qemu-img command-line documentation
---
 qemu-img-cmds.hx |  4 ++--
 qemu-img.c       | 11 ++++++++++-
 qemu-img.texi    |  5 ++++-
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 1c93e6d185..77b5a8dda8 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -20,9 +20,9 @@ STEXI
 ETEXI
 
 DEF("bench", img_bench,
-    "bench [-c count] [-d depth] [-f fmt] [--flush-interval=flush_interval] [-n] [--no-drain] [-o offset] [--pattern=pattern] [-q] [-s buffer_size] [-S step_size] [-t cache] [-w] [-U] filename")
+    "bench [-c count] [-d depth] [-f fmt] [--flush-interval=flush_interval] [-n] [--no-drain] [-o offset] [--pattern=pattern] [-q] [-s buffer_size] [-S step_size] [-t cache] [-i aio] [-w] [-U] filename")
 STEXI
-@item bench [-c @var{count}] [-d @var{depth}] [-f @var{fmt}] [--flush-interval=@var{flush_interval}] [-n] [--no-drain] [-o @var{offset}] [--pattern=@var{pattern}] [-q] [-s @var{buffer_size}] [-S @var{step_size}] [-t @var{cache}] [-w] [-U] @var{filename}
+@item bench [-c @var{count}] [-d @var{depth}] [-f @var{fmt}] [--flush-interval=@var{flush_interval}] [-n] [--no-drain] [-o @var{offset}] [--pattern=@var{pattern}] [-q] [-s @var{buffer_size}] [-S @var{step_size}] [-t @var{cache}] [-i @var{aio}] [-w] [-U] @var{filename}
 ETEXI
 
 DEF("check", img_check,
diff --git a/qemu-img.c b/qemu-img.c
index 95a24b9762..cde0a4834a 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4184,7 +4184,8 @@ static int img_bench(int argc, char **argv)
             {"force-share", no_argument, 0, 'U'},
             {0, 0, 0, 0}
         };
-        c = getopt_long(argc, argv, ":hc:d:f:no:qs:S:t:wU", long_options, NULL);
+        c = getopt_long(argc, argv, ":hc:d:f:ni:o:qs:S:t:wU", long_options,
+                        NULL);
         if (c == -1) {
             break;
         }
@@ -4227,6 +4228,14 @@ static int img_bench(int argc, char **argv)
         case 'n':
             flags |= BDRV_O_NATIVE_AIO;
             break;
+        case 'i':
+            ret = bdrv_parse_aio(optarg, &flags);
+            if (ret < 0) {
+                error_report("Invalid aio option: %s", optarg);
+                ret = -1;
+                goto out;
+            }
+            break;
         case 'o':
         {
             offset = cvtnum(optarg);
diff --git a/qemu-img.texi b/qemu-img.texi
index b5156d6316..20136fcb94 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -206,7 +206,7 @@ Command description:
 Amends the image format specific @var{options} for the image file
 @var{filename}. Not all file formats support this operation.
 
-@item bench [-c @var{count}] [-d @var{depth}] [-f @var{fmt}] [--flush-interval=@var{flush_interval}] [-n] [--no-drain] [-o @var{offset}] [--pattern=@var{pattern}] [-q] [-s @var{buffer_size}] [-S @var{step_size}] [-t @var{cache}] [-w] [-U] @var{filename}
+@item bench [-c @var{count}] [-d @var{depth}] [-f @var{fmt}] [--flush-interval=@var{flush_interval}] [-n] [-i @var{aio}] [--no-drain] [-o @var{offset}] [--pattern=@var{pattern}] [-q] [-s @var{buffer_size}] [-S @var{step_size}] [-t @var{cache}] [-w] [-U] @var{filename}
 
 Run a simple sequential I/O benchmark on the specified image. If @code{-w} is
 specified, a write test is performed, otherwise a read test is performed.
@@ -227,6 +227,9 @@ If @code{-n} is specified, the native AIO backend is used if possible. On
 Linux, this option only works if @code{-t none} or @code{-t directsync} is
 specified as well.
 
+If @code{-i} is specified, aio option can be used to specify different AIO
+backends: @var{threads}, @var{native} or @var{io_uring}.
+
 For write tests, by default a buffer filled with zeros is written. This can be
 overridden with a pattern byte specified by @var{pattern}.
 
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 13/15] qemu-nbd: adds option for aio engines
  2019-12-18 16:32 [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine Stefan Hajnoczi
                   ` (11 preceding siblings ...)
  2019-12-18 16:32 ` [PATCH v3 12/15] qemu-img: adds option to use aio engine for benchmarking Stefan Hajnoczi
@ 2019-12-18 16:32 ` Stefan Hajnoczi
  2019-12-18 16:32 ` [PATCH v3 14/15] tests/qemu-iotests: enable testing with aio options Stefan Hajnoczi
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-12-18 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, oleksandr, Julia Suvorova,
	Markus Armbruster, Max Reitz, Stefan Hajnoczi, Paolo Bonzini,
	Aarushi Mehta

From: Aarushi Mehta <mehta.aaru20@gmail.com>

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
Acked-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 qemu-nbd.c    | 12 ++++--------
 qemu-nbd.texi |  4 ++--
 2 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 108a51f7eb..db29a0d0ed 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -135,7 +135,7 @@ static void usage(const char *name)
 "                            '[ID_OR_NAME]'\n"
 "  -n, --nocache             disable host cache\n"
 "      --cache=MODE          set cache mode (none, writeback, ...)\n"
-"      --aio=MODE            set AIO mode (native or threads)\n"
+"      --aio=MODE            set AIO mode (native, io_uring or threads)\n"
 "      --discard=MODE        set discard mode (ignore, unmap)\n"
 "      --detect-zeroes=MODE  set detect-zeroes mode (off, on, unmap)\n"
 "      --image-opts          treat FILE as a full set of image options\n"
@@ -726,13 +726,9 @@ int main(int argc, char **argv)
                 exit(EXIT_FAILURE);
             }
             seen_aio = true;
-            if (!strcmp(optarg, "native")) {
-                flags |= BDRV_O_NATIVE_AIO;
-            } else if (!strcmp(optarg, "threads")) {
-                /* this is the default */
-            } else {
-               error_report("invalid aio mode `%s'", optarg);
-               exit(EXIT_FAILURE);
+            if (bdrv_parse_aio(optarg, &flags) < 0) {
+                error_report("Invalid aio mode '%s'", optarg);
+                exit(EXIT_FAILURE);
             }
             break;
         case QEMU_NBD_OPT_DISCARD:
diff --git a/qemu-nbd.texi b/qemu-nbd.texi
index 7f55657722..3ee3e4bdee 100644
--- a/qemu-nbd.texi
+++ b/qemu-nbd.texi
@@ -77,8 +77,8 @@ as an read-only device, @var{snapshot_param} format is
 The cache mode to be used with the file.  See the documentation of
 the emulator's @code{-drive cache=...} option for allowed values.
 @item --aio=@var{aio}
-Set the asynchronous I/O mode between @samp{threads} (the default)
-and @samp{native} (Linux only).
+Set the asynchronous I/O mode between @samp{threads} (the default),
+@samp{native} (Linux only) and @samp{io_uring} (Linux 5.1+).
 @item --discard=@var{discard}
 Control whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap})
 requests are ignored or passed to the filesystem.  @var{discard} is one of
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 14/15] tests/qemu-iotests: enable testing with aio options
  2019-12-18 16:32 [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine Stefan Hajnoczi
                   ` (12 preceding siblings ...)
  2019-12-18 16:32 ` [PATCH v3 13/15] qemu-nbd: adds option for aio engines Stefan Hajnoczi
@ 2019-12-18 16:32 ` Stefan Hajnoczi
  2019-12-18 16:32 ` [PATCH v3 15/15] tests/qemu-iotests: use AIOMODE with various tests Stefan Hajnoczi
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-12-18 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, oleksandr, Julia Suvorova,
	Markus Armbruster, Max Reitz, Stefan Hajnoczi, Paolo Bonzini,
	Aarushi Mehta

From: Aarushi Mehta <mehta.aaru20@gmail.com>

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/qemu-iotests/check      | 15 ++++++++++++++-
 tests/qemu-iotests/common.rc  | 14 ++++++++++++++
 tests/qemu-iotests/iotests.py | 12 ++++++++++--
 3 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 90970b0549..bf38223559 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -137,6 +137,7 @@ sortme=false
 expunge=true
 have_test_arg=false
 cachemode=false
+aiomode=false
 
 tmp="${TEST_DIR}"/$$
 rm -f $tmp.list $tmp.tmp $tmp.sed
@@ -146,6 +147,7 @@ export IMGFMT_GENERIC=true
 export IMGPROTO=file
 export IMGOPTS=""
 export CACHEMODE="writeback"
+export AIOMODE="threads"
 export QEMU_IO_OPTIONS=""
 export QEMU_IO_OPTIONS_NO_FMT=""
 export CACHEMODE_IS_DEFAULT=true
@@ -230,6 +232,11 @@ s/ .*//p
         CACHEMODE_IS_DEFAULT=false
         cachemode=false
         continue
+    elif $aiomode
+    then
+        AIOMODE="$r"
+        aiomode=false
+        continue
     fi
 
     xpand=true
@@ -274,6 +281,7 @@ other options
     -n                  show me, do not run tests
     -o options          -o options to pass to qemu-img create/convert
     -c mode             cache mode
+    -i mode             AIO mode
     -makecheck          pretty print output for make check
 
 testlist options
@@ -438,10 +446,13 @@ testlist options
             cachemode=true
             xpand=false
             ;;
+        -i)
+            aiomode=true
+            xpand=false
+            ;;
         -T)        # deprecated timestamp option
             xpand=false
             ;;
-
         -v)
             verbose=true
             xpand=false
@@ -520,6 +531,8 @@ done
 
 # Set qemu-io cache mode with $CACHEMODE we have
 QEMU_IO_OPTIONS="$QEMU_IO_OPTIONS --cache $CACHEMODE"
+# Set qemu-io aio mode with $AIOMODE we have
+QEMU_IO_OPTIONS="$QEMU_IO_OPTIONS --aio $AIOMODE"
 
 QEMU_IO_OPTIONS_NO_FMT="$QEMU_IO_OPTIONS"
 if [ "$IMGOPTSSYNTAX" != "true" ]; then
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 0cc8acc9ed..37386376ef 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -599,6 +599,20 @@ _default_cache_mode()
         return
     fi
 }
+_supported_aio_modes()
+{
+    for mode; do
+        if [ "$mode" = "$AIOMODE" ]; then
+            return
+        fi
+    done
+    _notrun "not suitable for aio mode: $AIOMODE"
+}
+_default_aio_mode()
+{
+    AIOMODE="$1"
+    QEMU_IO="$QEMU_IO --aio $1"
+}
 
 _unsupported_imgopts()
 {
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index df0708923d..e12a91b336 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -65,6 +65,7 @@ test_dir = os.environ.get('TEST_DIR')
 sock_dir = os.environ.get('SOCK_DIR')
 output_dir = os.environ.get('OUTPUT_DIR', '.')
 cachemode = os.environ.get('CACHEMODE')
+aiomode = os.environ.get('AIOMODE')
 qemu_default_machine = os.environ.get('QEMU_DEFAULT_MACHINE')
 
 socket_scm_helper = os.environ.get('SOCKET_SCM_HELPER', 'socket_scm_helper')
@@ -490,6 +491,7 @@ class VM(qtest.QEMUQtestMachine):
             options.append('file=%s' % path)
             options.append('format=%s' % format)
             options.append('cache=%s' % cachemode)
+            options.append('aio=%s' % aiomode)
 
         if opts:
             options.append(opts)
@@ -904,6 +906,10 @@ def verify_cache_mode(supported_cache_modes=[]):
     if supported_cache_modes and (cachemode not in supported_cache_modes):
         notrun('not suitable for this cache mode: %s' % cachemode)
 
+def verify_aio_mode(supported_aio_modes=[]):
+    if supported_aio_modes and (aiomode not in supported_aio_modes):
+        notrun('not suitable for this aio mode: %s' % aiomode)
+
 def supports_quorum():
     return 'quorum' in qemu_img_pipe('--help')
 
@@ -990,8 +996,9 @@ def execute_unittest(output, verbosity, debug):
 
 def execute_test(test_function=None,
                  supported_fmts=[], supported_oses=['linux'],
-                 supported_cache_modes=[], unsupported_fmts=[],
-                 supported_protocols=[], unsupported_protocols=[]):
+                 supported_cache_modes=[], supported_aio_modes={},
+                 unsupported_fmts=[], supported_protocols=[],
+                 unsupported_protocols=[]):
     """Run either unittest or script-style tests."""
 
     # We are using TEST_DIR and QEMU_DEFAULT_MACHINE as proxies to
@@ -1008,6 +1015,7 @@ def execute_test(test_function=None,
     verify_protocol(supported_protocols, unsupported_protocols)
     verify_platform(supported_oses)
     verify_cache_mode(supported_cache_modes)
+    verify_aio_mode(supported_aio_modes)
 
     if debug:
         output = sys.stdout
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 15/15] tests/qemu-iotests: use AIOMODE with various tests
  2019-12-18 16:32 [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine Stefan Hajnoczi
                   ` (13 preceding siblings ...)
  2019-12-18 16:32 ` [PATCH v3 14/15] tests/qemu-iotests: enable testing with aio options Stefan Hajnoczi
@ 2019-12-18 16:32 ` Stefan Hajnoczi
  2020-01-10  9:55 ` [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine Stefan Hajnoczi
  2020-01-13 11:58 ` Stefano Garzarella
  16 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-12-18 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, oleksandr, Julia Suvorova,
	Markus Armbruster, Max Reitz, Stefan Hajnoczi, Paolo Bonzini,
	Aarushi Mehta

From: Aarushi Mehta <mehta.aaru20@gmail.com>

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v11:
 * Drop line-wrapping changes that accidentally broke qemu-iotests
---
 tests/qemu-iotests/028 |  2 +-
 tests/qemu-iotests/058 |  2 +-
 tests/qemu-iotests/089 |  4 ++--
 tests/qemu-iotests/091 |  4 ++--
 tests/qemu-iotests/109 |  2 +-
 tests/qemu-iotests/147 |  5 +++--
 tests/qemu-iotests/181 |  8 ++++----
 tests/qemu-iotests/183 |  4 ++--
 tests/qemu-iotests/185 | 10 +++++-----
 tests/qemu-iotests/200 |  2 +-
 tests/qemu-iotests/201 |  8 ++++----
 11 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/tests/qemu-iotests/028 b/tests/qemu-iotests/028
index bba1ee59ae..d4d972c884 100755
--- a/tests/qemu-iotests/028
+++ b/tests/qemu-iotests/028
@@ -108,7 +108,7 @@ echo block-backup
 echo
 
 qemu_comm_method="monitor"
-_launch_qemu -drive file="${TEST_IMG}",cache=${CACHEMODE},id=disk
+_launch_qemu -drive file="${TEST_IMG}",cache=${CACHEMODE},aio=${AIOMODE},id=disk
 h=$QEMU_HANDLE
 if [ "${VALGRIND_QEMU}" == "y" ]; then
     QEMU_COMM_TIMEOUT=7
diff --git a/tests/qemu-iotests/058 b/tests/qemu-iotests/058
index 8c3212a72f..38d1ed90c0 100755
--- a/tests/qemu-iotests/058
+++ b/tests/qemu-iotests/058
@@ -64,7 +64,7 @@ nbd_snapshot_img="nbd:unix:$nbd_unix_socket"
 converted_image=$TEST_IMG.converted
 
 # Use -f raw instead of -f $IMGFMT for the NBD connection
-QEMU_IO_NBD="$QEMU_IO -f raw --cache=$CACHEMODE"
+QEMU_IO_NBD="$QEMU_IO -f raw --cache=$CACHEMODE --aio=$AIOMODE"
 
 echo
 echo "== preparing image =="
diff --git a/tests/qemu-iotests/089 b/tests/qemu-iotests/089
index ad029f1f09..059ad75e28 100755
--- a/tests/qemu-iotests/089
+++ b/tests/qemu-iotests/089
@@ -64,7 +64,7 @@ $QEMU_IO -c 'write -P 42 0 512' -c 'write -P 23 512 512' \
 
 $QEMU_IMG convert -f raw -O $IMGFMT "$TEST_IMG.base" "$TEST_IMG"
 
-$QEMU_IO_PROG --cache $CACHEMODE \
+$QEMU_IO_PROG --cache $CACHEMODE --aio $AIOMODE \
          -c 'read -P 42 0 512' -c 'read -P 23 512 512' \
          -c 'read -P 66 1024 512' "json:{
     \"driver\": \"$IMGFMT\",
@@ -111,7 +111,7 @@ $QEMU_IO -c 'write -P 42 0x38000 512' "$TEST_IMG" | _filter_qemu_io
 
 # The "image.filename" part tests whether "a": { "b": "c" } and "a.b": "c" do
 # the same (which they should).
-$QEMU_IO_PROG --cache $CACHEMODE \
+$QEMU_IO_PROG --cache $CACHEMODE --aio $AIOMODE  \
      -c 'read -P 42 0x38000 512' "json:{
     \"driver\": \"$IMGFMT\",
     \"file\": {
diff --git a/tests/qemu-iotests/091 b/tests/qemu-iotests/091
index f4b44659ae..3a29469324 100755
--- a/tests/qemu-iotests/091
+++ b/tests/qemu-iotests/091
@@ -60,13 +60,13 @@ echo === Starting QEMU VM1 ===
 echo
 
 qemu_comm_method="monitor"
-_launch_qemu -drive file="${TEST_IMG}",cache=${CACHEMODE},id=disk
+_launch_qemu -drive file="${TEST_IMG}",cache=${CACHEMODE},aio=${AIOMODE},id=disk
 h1=$QEMU_HANDLE
 
 echo
 echo === Starting QEMU VM2 ===
 echo
-_launch_qemu -drive file="${TEST_IMG}",cache=${CACHEMODE},id=disk \
+_launch_qemu -drive file="${TEST_IMG}",cache=${CACHEMODE},aio=${AIOMODE},id=disk \
              -incoming "exec: cat '${MIG_FIFO}'"
 h2=$QEMU_HANDLE
 
diff --git a/tests/qemu-iotests/109 b/tests/qemu-iotests/109
index 9897ceb6cd..3a3dcf94eb 100755
--- a/tests/qemu-iotests/109
+++ b/tests/qemu-iotests/109
@@ -52,7 +52,7 @@ run_qemu()
     local qmp_format="$3"
     local qmp_event="$4"
 
-    _launch_qemu -drive file="${source_img}",format=raw,cache=${CACHEMODE},id=src
+    _launch_qemu -drive file="${source_img}",format=raw,cache=${CACHEMODE},aio=${AIOMODE},id=src
     _send_qemu_cmd $QEMU_HANDLE "{ 'execute': 'qmp_capabilities' }" "return"
 
     _send_qemu_cmd $QEMU_HANDLE \
diff --git a/tests/qemu-iotests/147 b/tests/qemu-iotests/147
index 03fc2fabcf..2b6f859a09 100755
--- a/tests/qemu-iotests/147
+++ b/tests/qemu-iotests/147
@@ -24,7 +24,7 @@ import socket
 import stat
 import time
 import iotests
-from iotests import cachemode, imgfmt, qemu_img, qemu_nbd, qemu_nbd_early_pipe
+from iotests import cachemode, aiomode, imgfmt, qemu_img, qemu_nbd, qemu_nbd_early_pipe
 
 NBD_PORT_START      = 32768
 NBD_PORT_END        = NBD_PORT_START + 1024
@@ -134,7 +134,8 @@ class BuiltinNBD(NBDBlockdevAddBase):
         self.server.add_drive_raw('if=none,id=nbd-export,' +
                                   'file=%s,' % test_img +
                                   'format=%s,' % imgfmt +
-                                  'cache=%s' % cachemode)
+                                  'cache=%s' % cachemode +
+                                  'aio=%s' % aiomode)
         self.server.launch()
 
     def tearDown(self):
diff --git a/tests/qemu-iotests/181 b/tests/qemu-iotests/181
index 378c2899d1..438c2dcd80 100755
--- a/tests/qemu-iotests/181
+++ b/tests/qemu-iotests/181
@@ -58,20 +58,20 @@ qemu_comm_method="monitor"
 
 if [ "$IMGOPTSSYNTAX" = "true" ]; then
     _launch_qemu \
-        -drive "${TEST_IMG}",cache=${CACHEMODE},id=disk
+        -drive "${TEST_IMG}",cache=${CACHEMODE},aio=$AIOMODE,id=disk
 else
     _launch_qemu \
-        -drive file="${TEST_IMG}",cache=${CACHEMODE},driver=$IMGFMT,id=disk
+        -drive file="${TEST_IMG}",cache=${CACHEMODE},aio=$AIOMODE,driver=$IMGFMT,id=disk
 fi
 src=$QEMU_HANDLE
 
 if [ "$IMGOPTSSYNTAX" = "true" ]; then
     _launch_qemu \
-        -drive "${TEST_IMG}",cache=${CACHEMODE},id=disk \
+        -drive "${TEST_IMG}",cache=${CACHEMODE},aio=$AIOMODE,id=disk \
         -incoming "unix:${MIG_SOCKET}"
 else
     _launch_qemu \
-        -drive file="${TEST_IMG}",cache=${CACHEMODE},driver=$IMGFMT,id=disk \
+        -drive file="${TEST_IMG}",cache=${CACHEMODE},aio=$AIOMODE,driver=$IMGFMT,id=disk \
         -incoming "unix:${MIG_SOCKET}"
 fi
 dest=$QEMU_HANDLE
diff --git a/tests/qemu-iotests/183 b/tests/qemu-iotests/183
index bced83fae0..94334a793d 100755
--- a/tests/qemu-iotests/183
+++ b/tests/qemu-iotests/183
@@ -56,12 +56,12 @@ echo
 qemu_comm_method="qmp"
 
 _launch_qemu \
-    -drive file="${TEST_IMG}",cache=$CACHEMODE,driver=$IMGFMT,id=disk
+    -drive file="${TEST_IMG}",cache=$CACHEMODE,aio=$AIOMODE,driver=$IMGFMT,id=disk
 src=$QEMU_HANDLE
 _send_qemu_cmd $src "{ 'execute': 'qmp_capabilities' }" 'return'
 
 _launch_qemu \
-    -drive file="${TEST_IMG}.dest",cache=$CACHEMODE,driver=$IMGFMT,id=disk \
+    -drive file="${TEST_IMG}.dest",cache=$CACHEMODE,aio=$AIOMODE,driver=$IMGFMT,id=disk \
     -incoming "unix:${MIG_SOCKET}"
 dest=$QEMU_HANDLE
 _send_qemu_cmd $dest "{ 'execute': 'qmp_capabilities' }" 'return'
diff --git a/tests/qemu-iotests/185 b/tests/qemu-iotests/185
index 454ff600cc..53aa4e89ed 100755
--- a/tests/qemu-iotests/185
+++ b/tests/qemu-iotests/185
@@ -54,7 +54,7 @@ echo
 qemu_comm_method="qmp"
 
 _launch_qemu \
-    -drive file="${TEST_IMG}.base",cache=$CACHEMODE,driver=$IMGFMT,id=disk
+    -drive file="${TEST_IMG}.base",cache=$CACHEMODE,aio=$AIOMODE,driver=$IMGFMT,id=disk
 h=$QEMU_HANDLE
 _send_qemu_cmd $h "{ 'execute': 'qmp_capabilities' }" 'return'
 
@@ -125,7 +125,7 @@ echo === Start active commit job and exit qemu ===
 echo
 
 _launch_qemu \
-    -drive file="${TEST_IMG}",cache=$CACHEMODE,driver=$IMGFMT,id=disk
+    -drive file="${TEST_IMG}",cache=$CACHEMODE,aio=$AIOMODE,driver=$IMGFMT,id=disk
 h=$QEMU_HANDLE
 _send_qemu_cmd $h "{ 'execute': 'qmp_capabilities' }" 'return'
 
@@ -147,7 +147,7 @@ echo === Start mirror job and exit qemu ===
 echo
 
 _launch_qemu \
-    -drive file="${TEST_IMG}",cache=$CACHEMODE,driver=$IMGFMT,id=disk
+    -drive file="${TEST_IMG}",cache=$CACHEMODE,aio=$AIOMODE,driver=$IMGFMT,id=disk
 h=$QEMU_HANDLE
 _send_qemu_cmd $h "{ 'execute': 'qmp_capabilities' }" 'return'
 
@@ -172,7 +172,7 @@ echo === Start backup job and exit qemu ===
 echo
 
 _launch_qemu \
-    -drive file="${TEST_IMG}",cache=$CACHEMODE,driver=$IMGFMT,id=disk
+    -drive file="${TEST_IMG}",cache=$CACHEMODE,aio=$AIOMODE,driver=$IMGFMT,id=disk
 h=$QEMU_HANDLE
 _send_qemu_cmd $h "{ 'execute': 'qmp_capabilities' }" 'return'
 
@@ -196,7 +196,7 @@ echo === Start streaming job and exit qemu ===
 echo
 
 _launch_qemu \
-    -drive file="${TEST_IMG}",cache=$CACHEMODE,driver=$IMGFMT,id=disk
+    -drive file="${TEST_IMG}",cache=$CACHEMODE,aio=$AIOMODE,driver=$IMGFMT,id=disk
 h=$QEMU_HANDLE
 _send_qemu_cmd $h "{ 'execute': 'qmp_capabilities' }" 'return'
 
diff --git a/tests/qemu-iotests/200 b/tests/qemu-iotests/200
index 72d431f251..49f93c5df2 100755
--- a/tests/qemu-iotests/200
+++ b/tests/qemu-iotests/200
@@ -66,7 +66,7 @@ echo === Starting QEMU VM ===
 echo
 qemu_comm_method="qmp"
 _launch_qemu -object iothread,id=iothread0 $virtio_scsi \
-             -drive file="${TEST_IMG}",media=disk,if=none,cache=$CACHEMODE,id=drive_sysdisk,format=$IMGFMT \
+             -drive file="${TEST_IMG}",media=disk,if=none,cache=$CACHEMODE,aio=$AIOMODE,id=drive_sysdisk,format=$IMGFMT \
              -device scsi-hd,drive=drive_sysdisk,bus=scsi0.0,id=sysdisk,bootindex=0
 h1=$QEMU_HANDLE
 
diff --git a/tests/qemu-iotests/201 b/tests/qemu-iotests/201
index 86fa37e714..0ec0b07c36 100755
--- a/tests/qemu-iotests/201
+++ b/tests/qemu-iotests/201
@@ -58,20 +58,20 @@ qemu_comm_method="monitor"
 
 if [ "$IMGOPTSSYNTAX" = "true" ]; then
     _launch_qemu \
-        -drive "${TEST_IMG}",cache=${CACHEMODE},id=disk
+        -drive "${TEST_IMG}",cache=${CACHEMODE},aio=$AIOMODE,id=disk
 else
     _launch_qemu \
-        -drive file="${TEST_IMG}",cache=${CACHEMODE},driver=$IMGFMT,id=disk
+        -drive file="${TEST_IMG}",cache=${CACHEMODE},aio=$AIOMODE,driver=$IMGFMT,id=disk
 fi
 src=$QEMU_HANDLE
 
 if [ "$IMGOPTSSYNTAX" = "true" ]; then
     _launch_qemu \
-        -drive "${TEST_IMG}",cache=${CACHEMODE},id=disk \
+        -drive "${TEST_IMG}",cache=${CACHEMODE},aio=$AIOMODE,id=disk \
         -incoming "unix:${MIG_SOCKET}"
 else
     _launch_qemu \
-        -drive file="${TEST_IMG}",cache=${CACHEMODE},driver=$IMGFMT,id=disk \
+        -drive file="${TEST_IMG}",cache=${CACHEMODE},aio=$AIOMODE,driver=$IMGFMT,id=disk \
         -incoming "unix:${MIG_SOCKET}"
 fi
 dest=$QEMU_HANDLE
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine
  2019-12-18 16:32 [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine Stefan Hajnoczi
                   ` (14 preceding siblings ...)
  2019-12-18 16:32 ` [PATCH v3 15/15] tests/qemu-iotests: use AIOMODE with various tests Stefan Hajnoczi
@ 2020-01-10  9:55 ` Stefan Hajnoczi
  2020-01-13 11:58 ` Stefano Garzarella
  16 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2020-01-10  9:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, oleksandr, Julia Suvorova,
	Markus Armbruster, Max Reitz, Paolo Bonzini, Aarushi Mehta

[-- Attachment #1: Type: text/plain, Size: 5234 bytes --]

On Wed, Dec 18, 2019 at 04:32:13PM +0000, Stefan Hajnoczi wrote:
> v12:
>  * Reword BlockdevAioOptions QAPI schema commit description [Markus]
>  * Increase QAPI "Since: 4.2" to "Since: 5.0"
>  * Explain rationale for io_uring stubs in commit description [Kevin]
>  * Tried to use file.aio=io_uring instead of BDRV_O_IO_URING but it's really
>    hard to make qemu-iotests work.  Tests build blkdebug: and other graphs so
>    the syntax for io_uring is dependent on the test case.  I scrapped this
>    approach and went back to a global flag.
> 
> v11:
>  * Drop fd registration because it breaks QEMU's file locking and will need to
>    be resolved in a separate patch series
>  * Drop line-wrapping changes that accidentally broke several qemu-iotests
> 
> v10:
>  * Dropped kernel submission queue polling, it requires root and has additional
>    limitations.  It should be benchmarked and considered for inclusion later,
>    maybe even together with kernel side changes.
>  * Add io_uring_register_files() return value to trace_luring_fd_register()
>  * Fix indentation in luring_fd_unregister()
>  * Set s->fd_reg.fd_array to NULL after g_free() to avoid dangling pointers
>  * Simplify fd registration code
>  * Add luring_fd_unregister() and call it from file-posix.c to prevent
>    fd leaks
>  * Add trace_luring_fd_unregister() trace event
>  * Add missing space to qemu-img command-line documentation
>  * Update MAINTAINERS file [Julia]
>  * Rename MAX_EVENTS to MAX_ENTRIES [Julia]
>  * Define ioq_submit() before callers so the prototype isn't necessary [Julia]
>  * Declare variables at the beginning of the block in luring_init() [Julia]
> 
> This patch series is based on Aarushi Mehta's v9 patch series written for
> Google Summer of Code 2019:
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2019-08/msg00179.html
> 
> It adds a new AIO engine that uses the new Linux io_uring API.  This is the
> successor to Linux AIO with a number of improvements:
> 1. Both O_DIRECT and buffered I/O work
> 2. fdatasync(2) is supported (no need for a separate thread pool!)
> 3. True async behavior so the syscall doesn't block (Linux AIO got there to some degree...)
> 4. Advanced performance optimizations are available (file registration, memory
>    buffer registration, completion polling, submission polling).
> 
> Since Aarushi has been busy, I have taken up this patch series.  Booting a
> guest works with -drive aio=io_uring and -drive aio=io_uring,cache=none with a
> raw file on XFS.
> 
> I currently recommend using -drive aio=io_uring only with host block devices
> (like NVMe devices).  As of Linux v5.4-rc1 I still hit kernel bugs when using
> image files on ext4 or XFS.
> 
> Aarushi Mehta (15):
>   configure: permit use of io_uring
>   qapi/block-core: add option for io_uring
>   block/block: add BDRV flag for io_uring
>   block/io_uring: implements interfaces for io_uring
>   stubs: add stubs for io_uring interface
>   util/async: add aio interfaces for io_uring
>   blockdev: adds bdrv_parse_aio to use io_uring
>   block/file-posix.c: extend to use io_uring
>   block: add trace events for io_uring
>   block/io_uring: adds userspace completion polling
>   qemu-io: adds option to use aio engine
>   qemu-img: adds option to use aio engine for benchmarking
>   qemu-nbd: adds option for aio engines
>   tests/qemu-iotests: enable testing with aio options
>   tests/qemu-iotests: use AIOMODE with various tests
> 
>  MAINTAINERS                   |   9 +
>  block.c                       |  22 ++
>  block/Makefile.objs           |   3 +
>  block/file-posix.c            |  95 ++++++--
>  block/io_uring.c              | 433 ++++++++++++++++++++++++++++++++++
>  block/trace-events            |  12 +
>  blockdev.c                    |  12 +-
>  configure                     |  27 +++
>  include/block/aio.h           |  16 +-
>  include/block/block.h         |   2 +
>  include/block/raw-aio.h       |  12 +
>  qapi/block-core.json          |   4 +-
>  qemu-img-cmds.hx              |   4 +-
>  qemu-img.c                    |  11 +-
>  qemu-img.texi                 |   5 +-
>  qemu-io.c                     |  25 +-
>  qemu-nbd.c                    |  12 +-
>  qemu-nbd.texi                 |   4 +-
>  stubs/Makefile.objs           |   1 +
>  stubs/io_uring.c              |  32 +++
>  tests/qemu-iotests/028        |   2 +-
>  tests/qemu-iotests/058        |   2 +-
>  tests/qemu-iotests/089        |   4 +-
>  tests/qemu-iotests/091        |   4 +-
>  tests/qemu-iotests/109        |   2 +-
>  tests/qemu-iotests/147        |   5 +-
>  tests/qemu-iotests/181        |   8 +-
>  tests/qemu-iotests/183        |   4 +-
>  tests/qemu-iotests/185        |  10 +-
>  tests/qemu-iotests/200        |   2 +-
>  tests/qemu-iotests/201        |   8 +-
>  tests/qemu-iotests/check      |  15 +-
>  tests/qemu-iotests/common.rc  |  14 ++
>  tests/qemu-iotests/iotests.py |  12 +-
>  util/async.c                  |  36 +++
>  35 files changed, 793 insertions(+), 76 deletions(-)
>  create mode 100644 block/io_uring.c
>  create mode 100644 stubs/io_uring.c

Ping?  Any further comments?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 04/15] block/io_uring: implements interfaces for io_uring
  2019-12-18 16:32 ` [PATCH v3 04/15] block/io_uring: implements interfaces " Stefan Hajnoczi
@ 2020-01-13 11:24   ` Stefano Garzarella
  2020-01-14 10:40     ` Stefan Hajnoczi
  0 siblings, 1 reply; 22+ messages in thread
From: Stefano Garzarella @ 2020-01-13 11:24 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, qemu-block, oleksandr, Julia Suvorova,
	qemu-devel, Markus Armbruster, Paolo Bonzini, Max Reitz,
	Aarushi Mehta

On Wed, Dec 18, 2019 at 04:32:17PM +0000, Stefan Hajnoczi wrote:
> From: Aarushi Mehta <mehta.aaru20@gmail.com>
> 
> Aborts when sqe fails to be set as sqes cannot be returned to the
> ring. Adds slow path for short reads for older kernels
> 
> Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> v11:
>  * Fix git bisect compilation breakage: move trace_luring_init_state()
>    into the tracing commit.
> v10:
>  * Update MAINTAINERS file [Julia]
>  * Rename MAX_EVENTS to MAX_ENTRIES [Julia]
>  * Define ioq_submit() before callers so the prototype isn't necessary [Julia]
>  * Declare variables at the beginning of the block in luring_init() [Julia]
> ---
>  MAINTAINERS             |   8 +
>  block/Makefile.objs     |   3 +
>  block/io_uring.c        | 401 ++++++++++++++++++++++++++++++++++++++++
>  include/block/aio.h     |  16 +-
>  include/block/raw-aio.h |  12 ++
>  5 files changed, 439 insertions(+), 1 deletion(-)
>  create mode 100644 block/io_uring.c

There are few double spaces in several comment blocks, so if you need to
respin maybe you can clean these, otherwise we can do later.

The patch LGTM, but I don't have a lot of experience with io_uring until
now, so

Acked-by: Stefano Garzarella <sgarzare@redhat.com>


I really interested on it and I'll try to contribute on this new AIO engine.

> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 740401bcbb..fc7f53b229 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2596,6 +2596,14 @@ F: block/file-posix.c
>  F: block/file-win32.c
>  F: block/win32-aio.c
>  
> +Linux io_uring
> +M: Aarushi Mehta <mehta.aaru20@gmail.com>
> +M: Julia Suvorova <jusual@redhat.com>
> +M: Stefan Hajnoczi <stefanha@redhat.com>
> +L: qemu-block@nongnu.org
> +S: Maintained
> +F: block/io_uring.c
> +
>  qcow2
>  M: Kevin Wolf <kwolf@redhat.com>
>  M: Max Reitz <mreitz@redhat.com>
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index e394fe0b6c..035abb9c5c 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -18,6 +18,7 @@ block-obj-y += block-backend.o snapshot.o qapi.o
>  block-obj-$(CONFIG_WIN32) += file-win32.o win32-aio.o
>  block-obj-$(CONFIG_POSIX) += file-posix.o
>  block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
> +block-obj-$(CONFIG_LINUX_IO_URING) += io_uring.o
>  block-obj-y += null.o mirror.o commit.o io.o create.o
>  block-obj-y += throttle-groups.o
>  block-obj-$(CONFIG_LINUX) += nvme.o
> @@ -65,5 +66,7 @@ block-obj-$(if $(CONFIG_LZFSE),m,n) += dmg-lzfse.o
>  dmg-lzfse.o-libs   := $(LZFSE_LIBS)
>  qcow.o-libs        := -lz
>  linux-aio.o-libs   := -laio
> +io_uring.o-cflags  := $(LINUX_IO_URING_CFLAGS)
> +io_uring.o-libs    := $(LINUX_IO_URING_LIBS)
>  parallels.o-cflags := $(LIBXML2_CFLAGS)
>  parallels.o-libs   := $(LIBXML2_LIBS)
> diff --git a/block/io_uring.c b/block/io_uring.c
> new file mode 100644
> index 0000000000..bb433a685b
> --- /dev/null
> +++ b/block/io_uring.c
> @@ -0,0 +1,401 @@
> +/*
> + * Linux io_uring support.
> + *
> + * Copyright (C) 2009 IBM, Corp.
> + * Copyright (C) 2009 Red Hat, Inc.
> + * Copyright (C) 2019 Aarushi Mehta
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +#include "qemu/osdep.h"
> +#include <liburing.h>
> +#include "qemu-common.h"
> +#include "block/aio.h"
> +#include "qemu/queue.h"
> +#include "block/block.h"
> +#include "block/raw-aio.h"
> +#include "qemu/coroutine.h"
> +#include "qapi/error.h"
> +
> +/* io_uring ring size */
> +#define MAX_ENTRIES 128
> +
> +typedef struct LuringAIOCB {
> +    Coroutine *co;
> +    struct io_uring_sqe sqeq;
> +    ssize_t ret;
> +    QEMUIOVector *qiov;
> +    bool is_read;
> +    QSIMPLEQ_ENTRY(LuringAIOCB) next;
> +
> +    /*
> +     * Buffered reads may require resubmission, see
> +     * luring_resubmit_short_read().
> +     */
> +    int total_read;
> +    QEMUIOVector resubmit_qiov;
> +} LuringAIOCB;
> +
> +typedef struct LuringQueue {
> +    int plugged;
> +    unsigned int in_queue;
> +    unsigned int in_flight;
> +    bool blocked;
> +    QSIMPLEQ_HEAD(, LuringAIOCB) submit_queue;
> +} LuringQueue;
> +
> +typedef struct LuringState {
> +    AioContext *aio_context;
> +
> +    struct io_uring ring;
> +
> +    /* io queue for submit at batch.  Protected by AioContext lock. */
> +    LuringQueue io_q;
> +
> +    /* I/O completion processing.  Only runs in I/O thread.  */
> +    QEMUBH *completion_bh;
> +} LuringState;
> +
> +/**
> + * luring_resubmit:
> + *
> + * Resubmit a request by appending it to submit_queue.  The caller must ensure
> + * that ioq_submit() is called later so that submit_queue requests are started.
> + */
> +static void luring_resubmit(LuringState *s, LuringAIOCB *luringcb)
> +{
> +    QSIMPLEQ_INSERT_TAIL(&s->io_q.submit_queue, luringcb, next);
> +    s->io_q.in_queue++;
> +}
> +
> +/**
> + * luring_resubmit_short_read:
> + *
> + * Before Linux commit 9d93a3f5a0c ("io_uring: punt short reads to async
> + * context") a buffered I/O request with the start of the file range in the
> + * page cache could result in a short read.  Applications need to resubmit the
> + * remaining read request.
> + *
> + * This is a slow path but recent kernels never take it.
> + */
> +static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
> +                                       int nread)
> +{
> +    QEMUIOVector *resubmit_qiov;
> +    size_t remaining;
> +
> +    /* Update read position */
> +    luringcb->total_read = nread;
> +    remaining = luringcb->qiov->size - luringcb->total_read;
> +
> +    /* Shorten qiov */
> +    resubmit_qiov = &luringcb->resubmit_qiov;
> +    if (resubmit_qiov->iov == NULL) {
> +        qemu_iovec_init(resubmit_qiov, luringcb->qiov->niov);
> +    } else {
> +        qemu_iovec_reset(resubmit_qiov);
> +    }
> +    qemu_iovec_concat(resubmit_qiov, luringcb->qiov, luringcb->total_read,
> +                      remaining);
> +
> +    /* Update sqe */
> +    luringcb->sqeq.off = nread;
> +    luringcb->sqeq.addr = (__u64)(uintptr_t)luringcb->resubmit_qiov.iov;
> +    luringcb->sqeq.len = luringcb->resubmit_qiov.niov;
> +
> +    luring_resubmit(s, luringcb);
> +}
> +
> +/**
> + * luring_process_completions:
> + * @s: AIO state
> + *
> + * Fetches completed I/O requests, consumes cqes and invokes their callbacks
> + * The function is somewhat tricky because it supports nested event loops, for
> + * example when a request callback invokes aio_poll().
> + *
> + * Function schedules BH completion so it  can be called again in a nested
> + * event loop.  When there are no events left  to complete the BH is being
> + * canceled.
> + *
> + */
> +static void luring_process_completions(LuringState *s)
> +{
> +    struct io_uring_cqe *cqes;
> +    int total_bytes;
> +    /*
> +     * Request completion callbacks can run the nested event loop.
> +     * Schedule ourselves so the nested event loop will "see" remaining
> +     * completed requests and process them.  Without this, completion
> +     * callbacks that wait for other requests using a nested event loop
> +     * would hang forever.
> +     *
> +     * This workaround is needed because io_uring uses poll_wait, which
> +     * is woken up when new events are added to the uring, thus polling on
> +     * the same uring fd will block unless more events are received.
> +     *
> +     * Other leaf block drivers (drivers that access the data themselves)
> +     * are networking based, so they poll sockets for data and run the
> +     * correct coroutine.
> +     */
> +    qemu_bh_schedule(s->completion_bh);
> +
> +    while (io_uring_peek_cqe(&s->ring, &cqes) == 0) {
> +        LuringAIOCB *luringcb;
> +        int ret;
> +
> +        if (!cqes) {
> +            break;
> +        }
> +
> +        luringcb = io_uring_cqe_get_data(cqes);
> +        ret = cqes->res;
> +        io_uring_cqe_seen(&s->ring, cqes);
> +        cqes = NULL;
> +
> +        /* Change counters one-by-one because we can be nested. */
> +        s->io_q.in_flight--;
> +
> +        /* total_read is non-zero only for resubmitted read requests */
> +        total_bytes = ret + luringcb->total_read;
> +
> +        if (ret < 0) {
> +            if (ret == -EINTR) {
> +                luring_resubmit(s, luringcb);
> +                continue;
> +            }
> +        } else if (!luringcb->qiov) {
> +            goto end;
> +        } else if (total_bytes == luringcb->qiov->size) {
> +            ret = 0;
> +        /* Only read/write */
> +        } else {
> +            /* Short Read/Write */
> +            if (luringcb->is_read) {
> +                if (ret > 0) {
> +                    luring_resubmit_short_read(s, luringcb, ret);
> +                    continue;
> +                } else {
> +                    /* Pad with zeroes */
> +                    qemu_iovec_memset(luringcb->qiov, total_bytes, 0,
> +                                      luringcb->qiov->size - total_bytes);
> +                    ret = 0;
> +                }
> +            } else {
> +                ret = -ENOSPC;;
> +            }
> +        }
> +end:
> +        luringcb->ret = ret;
> +        qemu_iovec_destroy(&luringcb->resubmit_qiov);
> +
> +        /*
> +         * If the coroutine is already entered it must be in ioq_submit()
> +         * and will notice luringcb->ret has been filled in when it
> +         * eventually runs later. Coroutines cannot be entered recursively
> +         * so avoid doing that!
> +         */
> +        if (!qemu_coroutine_entered(luringcb->co)) {
> +            aio_co_wake(luringcb->co);
> +        }
> +    }
> +    qemu_bh_cancel(s->completion_bh);
> +}
> +
> +static int ioq_submit(LuringState *s)
> +{
> +    int ret = 0;
> +    LuringAIOCB *luringcb, *luringcb_next;
> +
> +    while (s->io_q.in_queue > 0) {
> +        /*
> +         * Try to fetch sqes from the ring for requests waiting in
> +         * the overflow queue
> +         */
> +        QSIMPLEQ_FOREACH_SAFE(luringcb, &s->io_q.submit_queue, next,
> +                              luringcb_next) {
> +            struct io_uring_sqe *sqes = io_uring_get_sqe(&s->ring);
> +            if (!sqes) {
> +                break;
> +            }
> +            /* Prep sqe for submission */
> +            *sqes = luringcb->sqeq;
> +            QSIMPLEQ_REMOVE_HEAD(&s->io_q.submit_queue, next);
> +        }
> +        ret = io_uring_submit(&s->ring);
> +        /* Prevent infinite loop if submission is refused */
> +        if (ret <= 0) {
> +            if (ret == -EAGAIN) {
> +                continue;
> +            }
> +            break;
> +        }
> +        s->io_q.in_flight += ret;
> +        s->io_q.in_queue  -= ret;
> +    }
> +    s->io_q.blocked = (s->io_q.in_queue > 0);
> +
> +    if (s->io_q.in_flight) {
> +        /*
> +         * We can try to complete something just right away if there are
> +         * still requests in-flight.
> +         */
> +        luring_process_completions(s);
> +    }
> +    return ret;
> +}
> +
> +static void luring_process_completions_and_submit(LuringState *s)
> +{
> +    aio_context_acquire(s->aio_context);
> +    luring_process_completions(s);
> +
> +    if (!s->io_q.plugged && s->io_q.in_queue > 0) {
> +        ioq_submit(s);
> +    }
> +    aio_context_release(s->aio_context);
> +}
> +
> +static void qemu_luring_completion_bh(void *opaque)
> +{
> +    LuringState *s = opaque;
> +    luring_process_completions_and_submit(s);
> +}
> +
> +static void qemu_luring_completion_cb(void *opaque)
> +{
> +    LuringState *s = opaque;
> +    luring_process_completions_and_submit(s);
> +}
> +
> +static void ioq_init(LuringQueue *io_q)
> +{
> +    QSIMPLEQ_INIT(&io_q->submit_queue);
> +    io_q->plugged = 0;
> +    io_q->in_queue = 0;
> +    io_q->in_flight = 0;
> +    io_q->blocked = false;
> +}
> +
> +void luring_io_plug(BlockDriverState *bs, LuringState *s)
> +{
> +    s->io_q.plugged++;
> +}
> +
> +void luring_io_unplug(BlockDriverState *bs, LuringState *s)
> +{
> +    assert(s->io_q.plugged);
> +    if (--s->io_q.plugged == 0 &&
> +        !s->io_q.blocked && s->io_q.in_queue > 0) {
> +        ioq_submit(s);
> +    }
> +}
> +
> +/**
> + * luring_do_submit:
> + * @fd: file descriptor for I/O
> + * @luringcb: AIO control block
> + * @s: AIO state
> + * @offset: offset for request
> + * @type: type of request
> + *
> + * Fetches sqes from ring, adds to pending queue and preps them
> + *
> + */
> +static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
> +                            uint64_t offset, int type)
> +{
> +    struct io_uring_sqe *sqes = &luringcb->sqeq;
> +
> +    switch (type) {
> +    case QEMU_AIO_WRITE:
> +        io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
> +                             luringcb->qiov->niov, offset);
> +        break;
> +    case QEMU_AIO_READ:
> +        io_uring_prep_readv(sqes, fd, luringcb->qiov->iov,
> +                            luringcb->qiov->niov, offset);
> +        break;
> +    case QEMU_AIO_FLUSH:
> +        io_uring_prep_fsync(sqes, fd, IORING_FSYNC_DATASYNC);
> +        break;
> +    default:
> +        fprintf(stderr, "%s: invalid AIO request type, aborting 0x%x.\n",
> +                        __func__, type);
> +        abort();
> +    }
> +    io_uring_sqe_set_data(sqes, luringcb);
> +
> +    QSIMPLEQ_INSERT_TAIL(&s->io_q.submit_queue, luringcb, next);
> +    s->io_q.in_queue++;
> +
> +    if (!s->io_q.blocked &&
> +        (!s->io_q.plugged ||
> +         s->io_q.in_flight + s->io_q.in_queue >= MAX_ENTRIES)) {
> +        return ioq_submit(s);
> +    }
> +    return 0;
> +}
> +
> +int coroutine_fn luring_co_submit(BlockDriverState *bs, LuringState *s, int fd,
> +                                  uint64_t offset, QEMUIOVector *qiov, int type)
> +{
> +    int ret;
> +    LuringAIOCB luringcb = {
> +        .co         = qemu_coroutine_self(),
> +        .ret        = -EINPROGRESS,
> +        .qiov       = qiov,
> +        .is_read    = (type == QEMU_AIO_READ),
> +    };
> +
> +    ret = luring_do_submit(fd, &luringcb, s, offset, type);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    if (luringcb.ret == -EINPROGRESS) {
> +        qemu_coroutine_yield();
> +    }
> +    return luringcb.ret;
> +}
> +
> +void luring_detach_aio_context(LuringState *s, AioContext *old_context)
> +{
> +    aio_set_fd_handler(old_context, s->ring.ring_fd, false, NULL, NULL, NULL,
> +                       s);
> +    qemu_bh_delete(s->completion_bh);
> +    s->aio_context = NULL;
> +}
> +
> +void luring_attach_aio_context(LuringState *s, AioContext *new_context)
> +{
> +    s->aio_context = new_context;
> +    s->completion_bh = aio_bh_new(new_context, qemu_luring_completion_bh, s);
> +    aio_set_fd_handler(s->aio_context, s->ring.ring_fd, false,
> +                       qemu_luring_completion_cb, NULL, NULL, s);
> +}
> +
> +LuringState *luring_init(Error **errp)
> +{
> +    int rc;
> +    LuringState *s = g_new0(LuringState, 1);
> +    struct io_uring *ring = &s->ring;
> +
> +    rc = io_uring_queue_init(MAX_ENTRIES, ring, 0);
> +    if (rc < 0) {
> +        error_setg_errno(errp, errno, "failed to init linux io_uring ring");
> +        g_free(s);
> +        return NULL;
> +    }
> +
> +    ioq_init(&s->io_q);
> +    return s;
> +
> +}
> +
> +void luring_cleanup(LuringState *s)
> +{
> +    io_uring_queue_exit(&s->ring);
> +    g_free(s);
> +}
> diff --git a/include/block/aio.h b/include/block/aio.h
> index 6b0d52f732..7ba9bd7874 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -49,6 +49,7 @@ typedef void IOHandler(void *opaque);
>  struct Coroutine;
>  struct ThreadPool;
>  struct LinuxAioState;
> +struct LuringState;
>  
>  struct AioContext {
>      GSource source;
> @@ -117,11 +118,19 @@ struct AioContext {
>      struct ThreadPool *thread_pool;
>  
>  #ifdef CONFIG_LINUX_AIO
> -    /* State for native Linux AIO.  Uses aio_context_acquire/release for
> +    /*
> +     * State for native Linux AIO.  Uses aio_context_acquire/release for
>       * locking.
>       */
>      struct LinuxAioState *linux_aio;
>  #endif
> +#ifdef CONFIG_LINUX_IO_URING
> +    /*
> +     * State for Linux io_uring.  Uses aio_context_acquire/release for
> +     * locking.
> +     */
> +    struct LuringState *linux_io_uring;
> +#endif
>  
>      /* TimerLists for calling timers - one per clock type.  Has its own
>       * locking.
> @@ -386,6 +395,11 @@ struct LinuxAioState *aio_setup_linux_aio(AioContext *ctx, Error **errp);
>  /* Return the LinuxAioState bound to this AioContext */
>  struct LinuxAioState *aio_get_linux_aio(AioContext *ctx);
>  
> +/* Setup the LuringState bound to this AioContext */
> +struct LuringState *aio_setup_linux_io_uring(AioContext *ctx, Error **errp);
> +
> +/* Return the LuringState bound to this AioContext */
> +struct LuringState *aio_get_linux_io_uring(AioContext *ctx);
>  /**
>   * aio_timer_new_with_attrs:
>   * @ctx: the aio context
> diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
> index 4629f24d08..251b10d273 100644
> --- a/include/block/raw-aio.h
> +++ b/include/block/raw-aio.h
> @@ -57,6 +57,18 @@ void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context);
>  void laio_io_plug(BlockDriverState *bs, LinuxAioState *s);
>  void laio_io_unplug(BlockDriverState *bs, LinuxAioState *s);
>  #endif
> +/* io_uring.c - Linux io_uring implementation */
> +#ifdef CONFIG_LINUX_IO_URING
> +typedef struct LuringState LuringState;
> +LuringState *luring_init(Error **errp);
> +void luring_cleanup(LuringState *s);
> +int coroutine_fn luring_co_submit(BlockDriverState *bs, LuringState *s, int fd,
> +                                uint64_t offset, QEMUIOVector *qiov, int type);
> +void luring_detach_aio_context(LuringState *s, AioContext *old_context);
> +void luring_attach_aio_context(LuringState *s, AioContext *new_context);
> +void luring_io_plug(BlockDriverState *bs, LuringState *s);
> +void luring_io_unplug(BlockDriverState *bs, LuringState *s);
> +#endif
>  
>  #ifdef _WIN32
>  typedef struct QEMUWin32AIOState QEMUWin32AIOState;
> -- 
> 2.23.0
> 
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 08/15] block/file-posix.c: extend to use io_uring
  2019-12-18 16:32 ` [PATCH v3 08/15] block/file-posix.c: extend " Stefan Hajnoczi
@ 2020-01-13 11:49   ` Stefano Garzarella
  2020-01-14 10:37     ` Stefan Hajnoczi
  0 siblings, 1 reply; 22+ messages in thread
From: Stefano Garzarella @ 2020-01-13 11:49 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, Maxim Levitsky, qemu-block, oleksandr,
	Julia Suvorova, qemu-devel, Markus Armbruster, Paolo Bonzini,
	Max Reitz, Aarushi Mehta

On Wed, Dec 18, 2019 at 04:32:21PM +0000, Stefan Hajnoczi wrote:
> From: Aarushi Mehta <mehta.aaru20@gmail.com>
> 
> Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
> Reviewed-by: Maxim Levitsky <maximlevitsky@gmail.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/file-posix.c | 95 ++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 75 insertions(+), 20 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 1b805bd938..a42a90e59d 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -156,6 +156,7 @@ typedef struct BDRVRawState {
>      bool has_write_zeroes:1;
>      bool discard_zeroes:1;
>      bool use_linux_aio:1;
> +    bool use_linux_io_uring:1;
>      bool page_cache_inconsistent:1;
>      bool has_fallocate;
>      bool needs_alignment;
> @@ -444,7 +445,7 @@ static QemuOptsList raw_runtime_opts = {
>          {
>              .name = "aio",
>              .type = QEMU_OPT_STRING,
> -            .help = "host AIO implementation (threads, native)",
> +            .help = "host AIO implementation (threads, native, io_uring)",
>          },
>          {
>              .name = "locking",
> @@ -503,9 +504,11 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>          goto fail;
>      }
>  
> -    aio_default = (bdrv_flags & BDRV_O_NATIVE_AIO)
> -                  ? BLOCKDEV_AIO_OPTIONS_NATIVE
> -                  : BLOCKDEV_AIO_OPTIONS_THREADS;
> +    if (bdrv_flags & BDRV_O_NATIVE_AIO) {
> +        aio_default = BLOCKDEV_AIO_OPTIONS_NATIVE;
> +    } else {
> +        aio_default = BLOCKDEV_AIO_OPTIONS_THREADS;
> +    }

This is only a cosmetic change?

>      aio = qapi_enum_parse(&BlockdevAioOptions_lookup,
>                            qemu_opt_get(opts, "aio"),
>                            aio_default, &local_err);
> @@ -514,7 +517,11 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>          ret = -EINVAL;
>          goto fail;
>      }
> +
>      s->use_linux_aio = (aio == BLOCKDEV_AIO_OPTIONS_NATIVE);
> +#ifdef CONFIG_LINUX_IO_URING
> +    s->use_linux_io_uring = (aio == BLOCKDEV_AIO_OPTIONS_IO_URING);
> +#endif
>  
>      locking = qapi_enum_parse(&OnOffAuto_lookup,
>                                qemu_opt_get(opts, "locking"),
> @@ -578,7 +585,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>      s->shared_perm = BLK_PERM_ALL;
>  
>  #ifdef CONFIG_LINUX_AIO
> -     /* Currently Linux does AIO only for files opened with O_DIRECT */
> +    /* Currently Linux does AIO only for files opened with O_DIRECT */

Also this is a not related fix, if you respin maybe we should split in a
new patch or say something in the commit message.

>      if (s->use_linux_aio) {
>          if (!(s->open_flags & O_DIRECT)) {
>              error_setg(errp, "aio=native was specified, but it requires "
> @@ -600,6 +607,22 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>      }
>  #endif /* !defined(CONFIG_LINUX_AIO) */
>  
> +#ifdef CONFIG_LINUX_IO_URING
> +    if (s->use_linux_io_uring) {
> +        if (!aio_setup_linux_io_uring(bdrv_get_aio_context(bs), errp)) {
> +            error_prepend(errp, "Unable to use io_uring: ");
> +            goto fail;
> +        }
> +    }
> +#else
> +    if (s->use_linux_io_uring) {
> +        error_setg(errp, "aio=io_uring was specified, but is not supported "
> +                         "in this build.");
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +#endif /* !defined(CONFIG_LINUX_IO_URING) */
> +
>      s->has_discard = true;
>      s->has_write_zeroes = true;
>      if ((bs->open_flags & BDRV_O_NOCACHE) != 0) {
> @@ -1877,21 +1900,25 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
>          return -EIO;
>  
>      /*
> -     * Check if the underlying device requires requests to be aligned,
> -     * and if the request we are trying to submit is aligned or not.
> -     * If this is the case tell the low-level driver that it needs
> -     * to copy the buffer.
> +     * When using O_DIRECT, the request must be aligned to be able to use
> +     * either libaio or io_uring interface. If not fail back to regular thread
> +     * pool read/write code which emulates this for us if we
> +     * set QEMU_AIO_MISALIGNED.
>       */
> -    if (s->needs_alignment) {
> -        if (!bdrv_qiov_is_aligned(bs, qiov)) {
> -            type |= QEMU_AIO_MISALIGNED;
> +    if (s->needs_alignment && !bdrv_qiov_is_aligned(bs, qiov)) {
> +        type |= QEMU_AIO_MISALIGNED;
> +#ifdef CONFIG_LINUX_IO_URING
> +    } else if (s->use_linux_io_uring) {
> +        LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs));
> +        assert(qiov->size == bytes);
> +        return luring_co_submit(bs, aio, s->fd, offset, qiov, type);
> +#endif
>  #ifdef CONFIG_LINUX_AIO
> -        } else if (s->use_linux_aio) {
> -            LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
> -            assert(qiov->size == bytes);
> -            return laio_co_submit(bs, aio, s->fd, offset, qiov, type);
> +    } else if (s->use_linux_aio) {

This code block was executed if "s->needs_alignment" was true, now we don't
check it. Could this be a problem?

> +        LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
> +        assert(qiov->size == bytes);
> +        return laio_co_submit(bs, aio, s->fd, offset, qiov, type);
>  #endif
> -        }
>      }
>  
>      acb = (RawPosixAIOData) {
> @@ -1927,24 +1954,36 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset,
>  
>  static void raw_aio_plug(BlockDriverState *bs)
>  {
> +    BDRVRawState __attribute__((unused)) *s = bs->opaque;
>  #ifdef CONFIG_LINUX_AIO
> -    BDRVRawState *s = bs->opaque;
>      if (s->use_linux_aio) {
>          LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
>          laio_io_plug(bs, aio);
>      }
>  #endif
> +#ifdef CONFIG_LINUX_IO_URING
> +    if (s->use_linux_io_uring) {
> +        LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs));
> +        luring_io_plug(bs, aio);
> +    }
> +#endif
>  }
>  
>  static void raw_aio_unplug(BlockDriverState *bs)
>  {
> +    BDRVRawState __attribute__((unused)) *s = bs->opaque;
>  #ifdef CONFIG_LINUX_AIO
> -    BDRVRawState *s = bs->opaque;
>      if (s->use_linux_aio) {
>          LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
>          laio_io_unplug(bs, aio);
>      }
>  #endif
> +#ifdef CONFIG_LINUX_IO_URING
> +    if (s->use_linux_io_uring) {
> +        LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs));
> +        luring_io_unplug(bs, aio);
> +    }
> +#endif
>  }
>  
>  static int raw_co_flush_to_disk(BlockDriverState *bs)
> @@ -1964,14 +2003,20 @@ static int raw_co_flush_to_disk(BlockDriverState *bs)
>          .aio_type       = QEMU_AIO_FLUSH,
>      };
>  
> +#ifdef CONFIG_LINUX_IO_URING
> +    if (s->use_linux_io_uring) {
> +        LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs));
> +        return luring_co_submit(bs, aio, s->fd, 0, NULL, QEMU_AIO_FLUSH);
> +    }
> +#endif
>      return raw_thread_pool_submit(bs, handle_aiocb_flush, &acb);
>  }
>  
>  static void raw_aio_attach_aio_context(BlockDriverState *bs,
>                                         AioContext *new_context)
>  {
> +    BDRVRawState __attribute__((unused)) *s = bs->opaque;
>  #ifdef CONFIG_LINUX_AIO
> -    BDRVRawState *s = bs->opaque;
>      if (s->use_linux_aio) {
>          Error *local_err = NULL;
>          if (!aio_setup_linux_aio(new_context, &local_err)) {
> @@ -1981,6 +2026,16 @@ static void raw_aio_attach_aio_context(BlockDriverState *bs,
>          }
>      }
>  #endif
> +#ifdef CONFIG_LINUX_IO_URING
> +    if (s->use_linux_io_uring) {
> +        Error *local_err;
> +        if (!aio_setup_linux_io_uring(new_context, &local_err)) {
> +            error_reportf_err(local_err, "Unable to use linux io_uring, "
> +                                         "falling back to thread pool: ");
> +            s->use_linux_io_uring = false;
> +        }
> +    }
> +#endif
>  }
>  
>  static void raw_close(BlockDriverState *bs)
> -- 
> 2.23.0
> 
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine
  2019-12-18 16:32 [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine Stefan Hajnoczi
                   ` (15 preceding siblings ...)
  2020-01-10  9:55 ` [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine Stefan Hajnoczi
@ 2020-01-13 11:58 ` Stefano Garzarella
  16 siblings, 0 replies; 22+ messages in thread
From: Stefano Garzarella @ 2020-01-13 11:58 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, qemu-block, oleksandr, Julia Suvorova,
	qemu-devel, Markus Armbruster, Paolo Bonzini, Max Reitz,
	Aarushi Mehta

The series LGTM, just some comments on patches 4 and 8.

I succefully tried iotests on raw and qcow2 with io_uring, so

Acked-by: Stefano Garzarella <sgarzare@redhat.com>

Thanks,
Stefano

On Wed, Dec 18, 2019 at 04:32:13PM +0000, Stefan Hajnoczi wrote:
> v12:
>  * Reword BlockdevAioOptions QAPI schema commit description [Markus]
>  * Increase QAPI "Since: 4.2" to "Since: 5.0"
>  * Explain rationale for io_uring stubs in commit description [Kevin]
>  * Tried to use file.aio=io_uring instead of BDRV_O_IO_URING but it's really
>    hard to make qemu-iotests work.  Tests build blkdebug: and other graphs so
>    the syntax for io_uring is dependent on the test case.  I scrapped this
>    approach and went back to a global flag.
> 
> v11:
>  * Drop fd registration because it breaks QEMU's file locking and will need to
>    be resolved in a separate patch series
>  * Drop line-wrapping changes that accidentally broke several qemu-iotests
> 
> v10:
>  * Dropped kernel submission queue polling, it requires root and has additional
>    limitations.  It should be benchmarked and considered for inclusion later,
>    maybe even together with kernel side changes.
>  * Add io_uring_register_files() return value to trace_luring_fd_register()
>  * Fix indentation in luring_fd_unregister()
>  * Set s->fd_reg.fd_array to NULL after g_free() to avoid dangling pointers
>  * Simplify fd registration code
>  * Add luring_fd_unregister() and call it from file-posix.c to prevent
>    fd leaks
>  * Add trace_luring_fd_unregister() trace event
>  * Add missing space to qemu-img command-line documentation
>  * Update MAINTAINERS file [Julia]
>  * Rename MAX_EVENTS to MAX_ENTRIES [Julia]
>  * Define ioq_submit() before callers so the prototype isn't necessary [Julia]
>  * Declare variables at the beginning of the block in luring_init() [Julia]
> 
> This patch series is based on Aarushi Mehta's v9 patch series written for
> Google Summer of Code 2019:
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2019-08/msg00179.html
> 
> It adds a new AIO engine that uses the new Linux io_uring API.  This is the
> successor to Linux AIO with a number of improvements:
> 1. Both O_DIRECT and buffered I/O work
> 2. fdatasync(2) is supported (no need for a separate thread pool!)
> 3. True async behavior so the syscall doesn't block (Linux AIO got there to some degree...)
> 4. Advanced performance optimizations are available (file registration, memory
>    buffer registration, completion polling, submission polling).
> 
> Since Aarushi has been busy, I have taken up this patch series.  Booting a
> guest works with -drive aio=io_uring and -drive aio=io_uring,cache=none with a
> raw file on XFS.
> 
> I currently recommend using -drive aio=io_uring only with host block devices
> (like NVMe devices).  As of Linux v5.4-rc1 I still hit kernel bugs when using
> image files on ext4 or XFS.
> 
> Aarushi Mehta (15):
>   configure: permit use of io_uring
>   qapi/block-core: add option for io_uring
>   block/block: add BDRV flag for io_uring
>   block/io_uring: implements interfaces for io_uring
>   stubs: add stubs for io_uring interface
>   util/async: add aio interfaces for io_uring
>   blockdev: adds bdrv_parse_aio to use io_uring
>   block/file-posix.c: extend to use io_uring
>   block: add trace events for io_uring
>   block/io_uring: adds userspace completion polling
>   qemu-io: adds option to use aio engine
>   qemu-img: adds option to use aio engine for benchmarking
>   qemu-nbd: adds option for aio engines
>   tests/qemu-iotests: enable testing with aio options
>   tests/qemu-iotests: use AIOMODE with various tests
> 
>  MAINTAINERS                   |   9 +
>  block.c                       |  22 ++
>  block/Makefile.objs           |   3 +
>  block/file-posix.c            |  95 ++++++--
>  block/io_uring.c              | 433 ++++++++++++++++++++++++++++++++++
>  block/trace-events            |  12 +
>  blockdev.c                    |  12 +-
>  configure                     |  27 +++
>  include/block/aio.h           |  16 +-
>  include/block/block.h         |   2 +
>  include/block/raw-aio.h       |  12 +
>  qapi/block-core.json          |   4 +-
>  qemu-img-cmds.hx              |   4 +-
>  qemu-img.c                    |  11 +-
>  qemu-img.texi                 |   5 +-
>  qemu-io.c                     |  25 +-
>  qemu-nbd.c                    |  12 +-
>  qemu-nbd.texi                 |   4 +-
>  stubs/Makefile.objs           |   1 +
>  stubs/io_uring.c              |  32 +++
>  tests/qemu-iotests/028        |   2 +-
>  tests/qemu-iotests/058        |   2 +-
>  tests/qemu-iotests/089        |   4 +-
>  tests/qemu-iotests/091        |   4 +-
>  tests/qemu-iotests/109        |   2 +-
>  tests/qemu-iotests/147        |   5 +-
>  tests/qemu-iotests/181        |   8 +-
>  tests/qemu-iotests/183        |   4 +-
>  tests/qemu-iotests/185        |  10 +-
>  tests/qemu-iotests/200        |   2 +-
>  tests/qemu-iotests/201        |   8 +-
>  tests/qemu-iotests/check      |  15 +-
>  tests/qemu-iotests/common.rc  |  14 ++
>  tests/qemu-iotests/iotests.py |  12 +-
>  util/async.c                  |  36 +++
>  35 files changed, 793 insertions(+), 76 deletions(-)
>  create mode 100644 block/io_uring.c
>  create mode 100644 stubs/io_uring.c
> 
> -- 
> 2.23.0
> 
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 08/15] block/file-posix.c: extend to use io_uring
  2020-01-13 11:49   ` Stefano Garzarella
@ 2020-01-14 10:37     ` Stefan Hajnoczi
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2020-01-14 10:37 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Fam Zheng, Kevin Wolf, Maxim Levitsky, qemu-block, oleksandr,
	Julia Suvorova, qemu-devel, Markus Armbruster, Stefan Hajnoczi,
	Paolo Bonzini, Max Reitz, Aarushi Mehta

[-- Attachment #1: Type: text/plain, Size: 3364 bytes --]

On Mon, Jan 13, 2020 at 12:49:27PM +0100, Stefano Garzarella wrote:
> On Wed, Dec 18, 2019 at 04:32:21PM +0000, Stefan Hajnoczi wrote:
> > @@ -503,9 +504,11 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
> >          goto fail;
> >      }
> >  
> > -    aio_default = (bdrv_flags & BDRV_O_NATIVE_AIO)
> > -                  ? BLOCKDEV_AIO_OPTIONS_NATIVE
> > -                  : BLOCKDEV_AIO_OPTIONS_THREADS;
> > +    if (bdrv_flags & BDRV_O_NATIVE_AIO) {
> > +        aio_default = BLOCKDEV_AIO_OPTIONS_NATIVE;
> > +    } else {
> > +        aio_default = BLOCKDEV_AIO_OPTIONS_THREADS;
> > +    }
> 
> This is only a cosmetic change?
...
> > @@ -578,7 +585,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
> >      s->shared_perm = BLK_PERM_ALL;
> >  
> >  #ifdef CONFIG_LINUX_AIO
> > -     /* Currently Linux does AIO only for files opened with O_DIRECT */
> > +    /* Currently Linux does AIO only for files opened with O_DIRECT */
> 
> Also this is a not related fix, if you respin maybe we should split in a
> new patch or say something in the commit message.

Thanks, I'll remove whitespace changes and unrelated reformatting from
this patch.

> > @@ -1877,21 +1900,25 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
> >          return -EIO;
> >  
> >      /*
> > -     * Check if the underlying device requires requests to be aligned,
> > -     * and if the request we are trying to submit is aligned or not.
> > -     * If this is the case tell the low-level driver that it needs
> > -     * to copy the buffer.
> > +     * When using O_DIRECT, the request must be aligned to be able to use
> > +     * either libaio or io_uring interface. If not fail back to regular thread
> > +     * pool read/write code which emulates this for us if we
> > +     * set QEMU_AIO_MISALIGNED.
> >       */
> > -    if (s->needs_alignment) {
> > -        if (!bdrv_qiov_is_aligned(bs, qiov)) {
> > -            type |= QEMU_AIO_MISALIGNED;
> > +    if (s->needs_alignment && !bdrv_qiov_is_aligned(bs, qiov)) {
> > +        type |= QEMU_AIO_MISALIGNED;
> > +#ifdef CONFIG_LINUX_IO_URING
> > +    } else if (s->use_linux_io_uring) {
> > +        LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs));
> > +        assert(qiov->size == bytes);
> > +        return luring_co_submit(bs, aio, s->fd, offset, qiov, type);
> > +#endif
> >  #ifdef CONFIG_LINUX_AIO
> > -        } else if (s->use_linux_aio) {
> > -            LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
> > -            assert(qiov->size == bytes);
> > -            return laio_co_submit(bs, aio, s->fd, offset, qiov, type);
> > +    } else if (s->use_linux_aio) {
> 
> This code block was executed if "s->needs_alignment" was true, now we don't
> check it. Could this be a problem?

From raw_open_common():

  /* Currently Linux does AIO only for files opened with O_DIRECT */
  if (s->use_linux_aio) {
      if (!(s->open_flags & O_DIRECT)) {
          error_setg(errp, "aio=native was specified, but it requires "
                           "cache.direct=on, which was not specified.");
          ret = -EINVAL;
          goto fail;

There is no change in behavior since use_linux_aio is only true when
needs_alignment is set.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 04/15] block/io_uring: implements interfaces for io_uring
  2020-01-13 11:24   ` Stefano Garzarella
@ 2020-01-14 10:40     ` Stefan Hajnoczi
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2020-01-14 10:40 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Fam Zheng, Kevin Wolf, qemu-block, oleksandr, Julia Suvorova,
	qemu-devel, Markus Armbruster, Stefan Hajnoczi, Paolo Bonzini,
	Max Reitz, Aarushi Mehta

[-- Attachment #1: Type: text/plain, Size: 1450 bytes --]

On Mon, Jan 13, 2020 at 12:24:07PM +0100, Stefano Garzarella wrote:
> On Wed, Dec 18, 2019 at 04:32:17PM +0000, Stefan Hajnoczi wrote:
> > From: Aarushi Mehta <mehta.aaru20@gmail.com>
> > 
> > Aborts when sqe fails to be set as sqes cannot be returned to the
> > ring. Adds slow path for short reads for older kernels
> > 
> > Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > v11:
> >  * Fix git bisect compilation breakage: move trace_luring_init_state()
> >    into the tracing commit.
> > v10:
> >  * Update MAINTAINERS file [Julia]
> >  * Rename MAX_EVENTS to MAX_ENTRIES [Julia]
> >  * Define ioq_submit() before callers so the prototype isn't necessary [Julia]
> >  * Declare variables at the beginning of the block in luring_init() [Julia]
> > ---
> >  MAINTAINERS             |   8 +
> >  block/Makefile.objs     |   3 +
> >  block/io_uring.c        | 401 ++++++++++++++++++++++++++++++++++++++++
> >  include/block/aio.h     |  16 +-
> >  include/block/raw-aio.h |  12 ++
> >  5 files changed, 439 insertions(+), 1 deletion(-)
> >  create mode 100644 block/io_uring.c
> 
> There are few double spaces in several comment blocks, so if you need to
> respin maybe you can clean these, otherwise we can do later.

Double spaces are used throughout include/block/aio.h and generally in
QEMU.  Both a single space and double space are fine.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-01-14 10:42 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-18 16:32 [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine Stefan Hajnoczi
2019-12-18 16:32 ` [PATCH v3 01/15] configure: permit use of io_uring Stefan Hajnoczi
2019-12-18 16:32 ` [PATCH v3 02/15] qapi/block-core: add option for io_uring Stefan Hajnoczi
2019-12-18 16:32 ` [PATCH v3 03/15] block/block: add BDRV flag " Stefan Hajnoczi
2019-12-18 16:32 ` [PATCH v3 04/15] block/io_uring: implements interfaces " Stefan Hajnoczi
2020-01-13 11:24   ` Stefano Garzarella
2020-01-14 10:40     ` Stefan Hajnoczi
2019-12-18 16:32 ` [PATCH v3 05/15] stubs: add stubs for io_uring interface Stefan Hajnoczi
2019-12-18 16:32 ` [PATCH v3 06/15] util/async: add aio interfaces for io_uring Stefan Hajnoczi
2019-12-18 16:32 ` [PATCH v3 07/15] blockdev: adds bdrv_parse_aio to use io_uring Stefan Hajnoczi
2019-12-18 16:32 ` [PATCH v3 08/15] block/file-posix.c: extend " Stefan Hajnoczi
2020-01-13 11:49   ` Stefano Garzarella
2020-01-14 10:37     ` Stefan Hajnoczi
2019-12-18 16:32 ` [PATCH v3 09/15] block: add trace events for io_uring Stefan Hajnoczi
2019-12-18 16:32 ` [PATCH v3 10/15] block/io_uring: adds userspace completion polling Stefan Hajnoczi
2019-12-18 16:32 ` [PATCH v3 11/15] qemu-io: adds option to use aio engine Stefan Hajnoczi
2019-12-18 16:32 ` [PATCH v3 12/15] qemu-img: adds option to use aio engine for benchmarking Stefan Hajnoczi
2019-12-18 16:32 ` [PATCH v3 13/15] qemu-nbd: adds option for aio engines Stefan Hajnoczi
2019-12-18 16:32 ` [PATCH v3 14/15] tests/qemu-iotests: enable testing with aio options Stefan Hajnoczi
2019-12-18 16:32 ` [PATCH v3 15/15] tests/qemu-iotests: use AIOMODE with various tests Stefan Hajnoczi
2020-01-10  9:55 ` [PATCH v3 00/15] io_uring: add Linux io_uring AIO engine Stefan Hajnoczi
2020-01-13 11:58 ` Stefano Garzarella

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).