All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC PATCH 0/9] Add support for io_uring
@ 2019-05-21 23:52 Aarushi Mehta
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 1/9] qapi/block-core: add option " Aarushi Mehta
                   ` (10 more replies)
  0 siblings, 11 replies; 22+ messages in thread
From: Aarushi Mehta @ 2019-05-21 23:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: saket.sinha89, Julia Suvorova, qemu-block, Stefan Hajnoczi,
	Aarushi Mehta

This patch series adds supports for the newly developed io_uring Linux AIO interface. Testing it requires a host kernel with it and the liburing library. Use the option -drive aio=io_uring to enable it.

Aarushi Mehta (9):
  qapi/block-core: add option for io_uring
  block/block: add BDRV flag for io_uring
  include/block: declare interfaces for io_uring
  stubs: add aio interface stubs for io_uring
  util/asyn: add aio interfaces for io_uring
  block/io_uring: implements interfaces for io_uring
  blockdev: accept io_uring as option
  block/file-posix: extends to use with io_uring
  configure: permits use of io_uring with probe

 block/Makefile.objs     |   2 +
 block/file-posix.c      |  63 ++++++-
 block/io_uring.c        | 385 ++++++++++++++++++++++++++++++++++++++++
 blockdev.c              |   4 +-
 configure               |  27 +++
 include/block/aio.h     |  16 +-
 include/block/block.h   |   1 +
 include/block/raw-aio.h |  15 ++
 qapi/block-core.json    |   3 +-
 stubs/Makefile.objs     |   1 +
 stubs/io_uring.c        |  32 ++++
 util/async.c            |  32 ++++
 12 files changed, 573 insertions(+), 8 deletions(-)
 create mode 100644 block/io_uring.c
 create mode 100644 stubs/io_uring.c

--
2.17.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Qemu-devel] [RFC PATCH 1/9] qapi/block-core: add option for io_uring
  2019-05-21 23:52 [Qemu-devel] [RFC PATCH 0/9] Add support for io_uring Aarushi Mehta
@ 2019-05-21 23:52 ` Aarushi Mehta
  2019-05-22  0:39   ` Eric Blake
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 2/9] block/block: add BDRV flag " Aarushi Mehta
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Aarushi Mehta @ 2019-05-21 23:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: saket.sinha89, Julia Suvorova, qemu-block, Stefan Hajnoczi,
	Aarushi Mehta

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
---
 qapi/block-core.json | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 7ccbfff9d0..116995810a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2776,11 +2776,12 @@
 #
 # @threads:     Use qemu's thread pool
 # @native:      Use native AIO backend (only Linux and Windows)
+# @io_uring:    Use linux io_uring
 #
 # Since: 2.9
 ##
 { 'enum': 'BlockdevAioOptions',
-  'data': [ 'threads', 'native' ] }
+  'data': [ 'threads', 'native','io_uring' ] }

 ##
 # @BlockdevCacheOptions:
--
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [RFC PATCH 2/9] block/block: add BDRV flag for io_uring
  2019-05-21 23:52 [Qemu-devel] [RFC PATCH 0/9] Add support for io_uring Aarushi Mehta
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 1/9] qapi/block-core: add option " Aarushi Mehta
@ 2019-05-21 23:52 ` Aarushi Mehta
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 3/9] include/block: declare interfaces " Aarushi Mehta
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Aarushi Mehta @ 2019-05-21 23:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: saket.sinha89, Julia Suvorova, qemu-block, Stefan Hajnoczi,
	Aarushi Mehta

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
---
 include/block/block.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/block/block.h b/include/block/block.h
index 5e2b98b0ee..005f42b133 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -121,6 +121,7 @@ typedef struct HDGeometry {
                                       ignoring the format layer */
 #define BDRV_O_NO_IO       0x10000 /* don't initialize for I/O */
 #define BDRV_O_AUTO_RDONLY 0x20000 /* degrade to read-only if opening read-write fails */
+#define BDRV_O_IO_URING    0x40000 /* use io_uring instead of the thread pool */

 #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_NO_FLUSH)

--
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [RFC PATCH 3/9] include/block: declare interfaces for io_uring
  2019-05-21 23:52 [Qemu-devel] [RFC PATCH 0/9] Add support for io_uring Aarushi Mehta
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 1/9] qapi/block-core: add option " Aarushi Mehta
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 2/9] block/block: add BDRV flag " Aarushi Mehta
@ 2019-05-21 23:52 ` Aarushi Mehta
  2019-05-22 13:09   ` Stefan Hajnoczi
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 4/9] stubs: add aio interface stubs " Aarushi Mehta
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Aarushi Mehta @ 2019-05-21 23:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: saket.sinha89, Julia Suvorova, qemu-block, Stefan Hajnoczi,
	Aarushi Mehta

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
---
 include/block/aio.h     | 16 +++++++++++++++-
 include/block/raw-aio.h | 15 +++++++++++++++
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 0ca25dfec6..2f9acbcd7d 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -50,6 +50,7 @@ typedef void IOHandler(void *opaque);
 struct Coroutine;
 struct ThreadPool;
 struct LinuxAioState;
+struct LuringState;

 struct AioContext {
     GSource source;
@@ -118,11 +119,19 @@ struct AioContext {
     struct ThreadPool *thread_pool;

 #ifdef CONFIG_LINUX_AIO
-    /* State for native Linux AIO.  Uses aio_context_acquire/release for
+    /*
+     * State for native Linux AIO.  Uses aio_context_acquire/release for
      * locking.
      */
     struct LinuxAioState *linux_aio;
 #endif
+#ifdef CONFIG_LINUX_IO_URING
+    /*
+     * State for native Linux AIO.  Uses aio_context_acquire/release for
+     * locking.
+     */
+    struct LuringState *linux_io_uring;
+#endif

     /* TimerLists for calling timers - one per clock type.  Has its own
      * locking.
@@ -387,6 +396,11 @@ struct LinuxAioState *aio_setup_linux_aio(AioContext *ctx, Error **errp);
 /* Return the LinuxAioState bound to this AioContext */
 struct LinuxAioState *aio_get_linux_aio(AioContext *ctx);

+/* Setup the LuringState bound to this AioContext */
+struct LuringState *aio_setup_linux_io_uring(AioContext *ctx, Error **errp);
+
+/* Return the LuringState bound to this AioContext */
+struct LuringState *aio_get_linux_io_uring(AioContext *ctx);
 /**
  * aio_timer_new_with_attrs:
  * @ctx: the aio context
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index ba223dd1f1..2e413bbee0 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -58,6 +58,21 @@ void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context);
 void laio_io_plug(BlockDriverState *bs, LinuxAioState *s);
 void laio_io_unplug(BlockDriverState *bs, LinuxAioState *s);
 #endif
+/* io_uring.c - Linux io_uring implementation */
+#ifdef CONFIG_LINUX_IO_URING
+typedef struct LuringState LuringState;
+LuringState *luring_init(Error **errp);
+void luring_cleanup(LuringState *s);
+int coroutine_fn luring_co_submit(BlockDriverState *bs, LuringState *s, int fd,
+                                uint64_t offset, QEMUIOVector *qiov, int type);
+BlockAIOCB *luring_submit(BlockDriverState *bs, LuringState *s, int fd,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        BlockCompletionFunc *cb, void *opaque, int type);
+void luring_detach_aio_context(LuringState *s, AioContext *old_context);
+void luring_attach_aio_context(LuringState *s, AioContext *new_context);
+void luring_io_plug(BlockDriverState *bs, LuringState *s);
+void luring_io_unplug(BlockDriverState *bs, LuringState *s);
+#endif

 #ifdef _WIN32
 typedef struct QEMUWin32AIOState QEMUWin32AIOState;
--
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [RFC PATCH 4/9] stubs: add aio interface stubs for io_uring
  2019-05-21 23:52 [Qemu-devel] [RFC PATCH 0/9] Add support for io_uring Aarushi Mehta
                   ` (2 preceding siblings ...)
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 3/9] include/block: declare interfaces " Aarushi Mehta
@ 2019-05-21 23:52 ` Aarushi Mehta
  2019-05-22 13:10   ` Stefan Hajnoczi
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 5/9] util/asyn: add aio interfaces " Aarushi Mehta
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Aarushi Mehta @ 2019-05-21 23:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: saket.sinha89, Julia Suvorova, qemu-block, Stefan Hajnoczi,
	Aarushi Mehta

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
---
 stubs/Makefile.objs |  1 +
 stubs/io_uring.c    | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)
 create mode 100644 stubs/io_uring.c

diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 269dfa5832..86ceca5c4e 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -13,6 +13,7 @@ stub-obj-y += iothread.o
 stub-obj-y += iothread-lock.o
 stub-obj-y += is-daemonized.o
 stub-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
+stub-obj-$(CONFIG_LINUX_IO_URING) += io_uring.o
 stub-obj-y += machine-init-done.o
 stub-obj-y += migr-blocker.o
 stub-obj-y += change-state-handler.o
diff --git a/stubs/io_uring.c b/stubs/io_uring.c
new file mode 100644
index 0000000000..622d1e4648
--- /dev/null
+++ b/stubs/io_uring.c
@@ -0,0 +1,32 @@
+/*
+ * Linux io_uring support.
+ *
+ * Copyright (C) 2009 IBM, Corp.
+ * Copyright (C) 2009 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "block/aio.h"
+#include "block/raw-aio.h"
+
+void luring_detach_aio_context(LuringState *s, AioContext *old_context)
+{
+    abort();
+}
+
+void luring_attach_aio_context(LuringState *s, AioContext *new_context)
+{
+    abort();
+}
+
+LuringState *luring_init(Error **errp)
+{
+    abort();
+}
+
+void luring_cleanup(LuringState *s)
+{
+    abort();
+}
--
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [RFC PATCH 5/9] util/asyn: add aio interfaces for io_uring
  2019-05-21 23:52 [Qemu-devel] [RFC PATCH 0/9] Add support for io_uring Aarushi Mehta
                   ` (3 preceding siblings ...)
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 4/9] stubs: add aio interface stubs " Aarushi Mehta
@ 2019-05-21 23:52 ` Aarushi Mehta
  2019-05-22 13:16   ` Stefan Hajnoczi
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 6/9] block/io_uring: implements " Aarushi Mehta
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Aarushi Mehta @ 2019-05-21 23:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: saket.sinha89, Julia Suvorova, qemu-block, Stefan Hajnoczi,
	Aarushi Mehta

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
---
 util/async.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/util/async.c b/util/async.c
index c10642a385..dd45aff8f5 100644
--- a/util/async.c
+++ b/util/async.c
@@ -277,6 +277,14 @@ aio_ctx_finalize(GSource     *source)
     }
 #endif

+#ifdef CONFIG_LINUX_IO_URING
+    if (ctx->linux_io_uring) {
+        luring_detach_aio_context(ctx->linux_io_uring, ctx);
+        luring_cleanup(ctx->linux_io_uring);
+        ctx->linux_io_uring = NULL;
+    }
+#endif
+
     assert(QSLIST_EMPTY(&ctx->scheduled_coroutines));
     qemu_bh_delete(ctx->co_schedule_bh);

@@ -341,6 +349,25 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
 }
 #endif

+#ifdef CONFIG_LINUX_IO_URING
+LuringState *aio_setup_linux_io_uring(AioContext *ctx, Error **errp)
+{
+    if (!ctx->linux_io_uring) {
+        ctx->linux_io_uring = luring_init(errp);
+        if (ctx->linux_io_uring) {
+            luring_attach_aio_context(ctx->linux_io_uring, ctx);
+        }
+    }
+    return ctx->linux_io_uring;
+}
+
+LuringState *aio_get_linux_io_uring(AioContext *ctx)
+{
+    assert(ctx->linux_io_uring);
+    return ctx->linux_io_uring;
+}
+#endif
+
 void aio_notify(AioContext *ctx)
 {
     /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
@@ -432,6 +459,11 @@ AioContext *aio_context_new(Error **errp)
 #ifdef CONFIG_LINUX_AIO
     ctx->linux_aio = NULL;
 #endif
+
+#ifdef CONFIG_LINUX_IO_URING
+    ctx->linux_io_uring = NULL;
+#endif
+
     ctx->thread_pool = NULL;
     qemu_rec_mutex_init(&ctx->lock);
     timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
--
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [RFC PATCH 6/9] block/io_uring: implements interfaces for io_uring
  2019-05-21 23:52 [Qemu-devel] [RFC PATCH 0/9] Add support for io_uring Aarushi Mehta
                   ` (4 preceding siblings ...)
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 5/9] util/asyn: add aio interfaces " Aarushi Mehta
@ 2019-05-21 23:52 ` Aarushi Mehta
  2019-05-22 15:43   ` Stefan Hajnoczi
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 7/9] blockdev: accept io_uring as option Aarushi Mehta
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Aarushi Mehta @ 2019-05-21 23:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: saket.sinha89, Julia Suvorova, qemu-block, Stefan Hajnoczi,
	Aarushi Mehta

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
---
 block/Makefile.objs |   2 +
 block/io_uring.c    | 385 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 387 insertions(+)
 create mode 100644 block/io_uring.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 7a81892a52..262d413c6d 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -18,6 +18,7 @@ block-obj-y += block-backend.o snapshot.o qapi.o
 block-obj-$(CONFIG_WIN32) += file-win32.o win32-aio.o
 block-obj-$(CONFIG_POSIX) += file-posix.o
 block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
+block-obj-$(CONFIG_LINUX_IO_URING) += io_uring.o
 block-obj-y += null.o mirror.o commit.o io.o create.o
 block-obj-y += throttle-groups.o
 block-obj-$(CONFIG_LINUX) += nvme.o
@@ -61,5 +62,6 @@ block-obj-$(if $(CONFIG_LZFSE),m,n) += dmg-lzfse.o
 dmg-lzfse.o-libs   := $(LZFSE_LIBS)
 qcow.o-libs        := -lz
 linux-aio.o-libs   := -laio
+io_uring.o-libs    := -luring
 parallels.o-cflags := $(LIBXML2_CFLAGS)
 parallels.o-libs   := $(LIBXML2_LIBS)
diff --git a/block/io_uring.c b/block/io_uring.c
new file mode 100644
index 0000000000..f8b0df90d4
--- /dev/null
+++ b/block/io_uring.c
@@ -0,0 +1,385 @@
+/*
+ * Linux io_uring support.
+ *
+ * Copyright (C) 2009 IBM, Corp.
+ * Copyright (C) 2009 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "block/aio.h"
+#include "qemu/queue.h"
+#include "block/block.h"
+#include "block/raw-aio.h"
+#include "qemu/event_notifier.h"
+#include "qemu/coroutine.h"
+#include "qapi/error.h"
+
+#include <liburing.h>
+
+/*
+ * Queue size (per-device).
+ *
+ * XXX: eventually we need to communicate this to the guest and/or make it
+ *      tunable by the guest.  If we get more outstanding requests at a time
+ *      than this we will get EAGAIN from io_submit which is communicated to
+ *      the guest as an I/O error.
+ */
+#define MAX_EVENTS 128
+
+struct qemu_luringcb {
+    BlockAIOCB common;
+    Coroutine *co;
+    LuringState *ctx;
+    struct io_uring_sqe sqeq;
+    int ret;
+    size_t nbytes;
+    QEMUIOVector *qiov;
+    bool is_read;
+    QSIMPLEQ_ENTRY(qemu_luringcb) next;
+};
+
+typedef struct {
+    int plugged;
+    unsigned int in_queue;
+    unsigned int in_flight;
+    bool blocked;
+    QSIMPLEQ_HEAD(, qemu_luringcb) pending;
+} LuringQueue;
+
+struct LuringState {
+    AioContext *aio_context;
+
+    struct io_uring ring;
+    EventNotifier e;
+
+    /* io queue for submit at batch.  Protected by AioContext lock. */
+    LuringQueue io_q;
+
+    /* I/O completion processing.  Only runs in I/O thread.  */
+    QEMUBH *completion_bh;
+    int event_idx;
+    int event_max;
+};
+
+static void ioq_submit(LuringState *s);
+
+static inline int32_t io_cqe_ret(struct io_uring_cqe *cqe)
+{
+    return cqe->res;
+}
+
+/**
+ * io_getevents_peek:
+ * @ring: io_uring instance
+ * @cqes: Completion event array
+
+ * Returns the number of completed events and sets a pointer
+ * on events queue.  This function does not update the head and tail.
+ */
+static inline unsigned int io_getevents_peek(struct io_uring *ring,
+                                             struct io_uring_cqe **cqes)
+{
+    unsigned int nr;
+    read_barrier();
+    struct io_uring_cq *cq_ring = &ring->cq;
+    unsigned int head = *cq_ring->khead & *cq_ring->kring_mask;
+    unsigned int tail = *cq_ring->ktail & *cq_ring->kring_mask;
+    nr = tail - head;
+    *cqes = ring->cq.cqes;
+    return nr;
+}
+
+/**
+ * qemu_luring_process_completions:
+ * @s: AIO state
+ *
+ * Fetches completed I/O requests, consumes cqes and invokes their callbacks.
+ *
+ */
+static void qemu_luring_process_completions(LuringState *s)
+{
+    struct io_uring_cqe *cqes;
+    qemu_bh_schedule(s->completion_bh);
+
+    while ((s->event_max = io_getevents_peek(&s->ring, &cqes))) {
+        for (s->event_idx = 0; s->event_idx < s->event_max; ) {
+                io_uring_cqe_seen(&s->ring, cqes);
+
+            /* Change counters one-by-one because we can be nested. */
+            s->io_q.in_flight--;
+            s->event_idx++;
+        }
+    }
+
+    qemu_bh_cancel(s->completion_bh);
+
+    /*
+     *If we are nested we have to notify the level above that we are done
+     * by setting event_max to zero, upper level will then jump out of it's
+     * own `for` loop.  If we are the last all counters dropped to zero.
+     */
+    s->event_max = 0;
+    s->event_idx = 0;
+}
+
+static void qemu_luring_process_completions_and_submit(LuringState *s)
+{
+    aio_context_acquire(s->aio_context);
+    qemu_luring_process_completions(s);
+
+    if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
+        ioq_submit(s);
+    }
+    aio_context_release(s->aio_context);
+}
+
+static void qemu_luring_completion_bh(void *opaque)
+{
+    LuringState *s = opaque;
+
+    qemu_luring_process_completions_and_submit(s);
+}
+
+static void qemu_luring_completion_cb(EventNotifier *e)
+{
+    LuringState *s = container_of(e, LuringState, e);
+
+    if (event_notifier_test_and_clear(&s->e)) {
+        qemu_luring_process_completions_and_submit(s);
+    }
+}
+
+static bool qemu_luring_poll_cb(void *opaque)
+{
+    EventNotifier *e = opaque;
+    LuringState *s = container_of(e, LuringState, e);
+    struct io_uring_cqe *cqes;
+
+    if (!io_getevents_peek(&s->ring, &cqes)) {
+        return false;
+    }
+
+    qemu_luring_process_completions_and_submit(s);
+    return true;
+}
+
+static const AIOCBInfo luring_aiocb_info = {
+    .aiocb_size         = sizeof(struct qemu_luringcb),
+};
+
+
+static void ioq_init(LuringQueue *io_q)
+{
+    QSIMPLEQ_INIT(&io_q->pending);
+    io_q->plugged = 0;
+    io_q->in_queue = 0;
+    io_q->in_flight = 0;
+    io_q->blocked = false;
+}
+
+static void ioq_submit(LuringState *s)
+{
+    int ret, len;
+    struct qemu_luringcb *aiocb;
+    QSIMPLEQ_HEAD(, qemu_luringcb) completed;
+
+    do {
+        if (s->io_q.in_flight >= MAX_EVENTS) {
+            break;
+        }
+        len = 0;
+        QSIMPLEQ_FOREACH(aiocb, &s->io_q.pending, next) {
+            if (s->io_q.in_flight + len++ >= MAX_EVENTS) {
+                break;
+            }
+            struct io_uring_sqe *sqes = io_uring_get_sqe(&s->ring);
+            if (sqes)  { /* Prep sqe for subission */
+                memset(sqes, 0, sizeof(*sqes));
+                sqes->opcode = aiocb->sqeq.opcode;
+                sqes->fd = aiocb->sqeq.fd;
+                if (sqes->opcode == IORING_OP_FSYNC) {
+                    sqes->fsync_flags = aiocb->sqeq.fsync_flags;
+                }
+                else {
+                    sqes->off = aiocb->sqeq.off;
+                    sqes->addr = aiocb->sqeq.addr;
+                    sqes->len = aiocb->sqeq.len;
+                }
+                QSIMPLEQ_REMOVE_HEAD(&s->io_q.pending, next);
+            }
+        }
+
+        ret =  io_uring_submit(&s->ring);
+        if (ret == -EAGAIN) {
+            break;
+        }
+
+        s->io_q.in_flight += ret;
+        s->io_q.in_queue  -= ret;
+        QSIMPLEQ_SPLIT_AFTER(&s->io_q.pending, aiocb, next, &completed);
+    } while (!QSIMPLEQ_EMPTY(&s->io_q.pending));
+    s->io_q.blocked = (s->io_q.in_queue > 0);
+
+    if (s->io_q.in_flight) {
+        /*
+         * We can try to complete something just right away if there are
+         * still requests in-flight.
+         */
+        qemu_luring_process_completions(s);
+    }
+}
+
+void luring_io_plug(BlockDriverState *bs, LuringState *s)
+{
+    s->io_q.plugged++;
+}
+
+void luring_io_unplug(BlockDriverState *bs, LuringState *s)
+{
+    assert(s->io_q.plugged);
+    if (--s->io_q.plugged == 0 &&
+        !s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
+        ioq_submit(s);
+    }
+}
+
+static int luring_do_submit(int fd, struct qemu_luringcb *luringcb,
+                            off_t offset, int type)
+{
+    LuringState *s = luringcb->ctx;
+    struct io_uring_sqe *sqes = io_uring_get_sqe(&s->ring);
+    if (!sqes) {
+        sqes = &luringcb->sqeq;
+        QSIMPLEQ_INSERT_TAIL(&s->io_q.pending, luringcb, next);
+    }
+    QEMUIOVector *qiov = luringcb->qiov;
+
+    switch (type) {
+    case QEMU_AIO_WRITE:
+        io_uring_prep_writev(sqes, fd, qiov->iov, qiov->niov, offset);
+        break;
+    case QEMU_AIO_READ:
+        io_uring_prep_readv(sqes, fd, qiov->iov, qiov->niov, offset);
+        break;
+    case QEMU_AIO_FLUSH:
+        io_uring_prep_fsync(sqes, fd, 0);
+        break;
+    default:
+        fprintf(stderr, "%s: invalid AIO request type 0x%x.\n",
+                        __func__, type);
+        return -EIO;
+    }
+
+    s->io_q.in_queue++;
+    if (!s->io_q.blocked &&
+        (!s->io_q.plugged ||
+         s->io_q.in_flight + s->io_q.in_queue >= MAX_EVENTS)) {
+        ioq_submit(s);
+    }
+
+    return 0;
+}
+
+int coroutine_fn luring_co_submit(BlockDriverState *bs, LuringState *s, int fd,
+                                uint64_t offset, QEMUIOVector *qiov, int type)
+{
+    int ret;
+    struct qemu_luringcb luringcb = {
+        .co         = qemu_coroutine_self(),
+        .nbytes     = qiov->size,
+        .ctx        = s,
+        .ret        = -EINPROGRESS,
+        .is_read    = (type == QEMU_AIO_READ),
+        .qiov       = qiov,
+    };
+
+    ret = luring_do_submit(fd, &luringcb, offset, type);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (luringcb.ret == -EINPROGRESS) {
+        qemu_coroutine_yield();
+    }
+    return luringcb.ret;
+}
+
+BlockAIOCB *luring_submit(BlockDriverState *bs, LuringState *s, int fd,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        BlockCompletionFunc *cb, void *opaque, int type)
+{
+    struct qemu_luringcb *luringcb;
+    off_t offset = sector_num * BDRV_SECTOR_SIZE;
+    int ret;
+
+    luringcb = qemu_aio_get(&luring_aiocb_info, bs, cb, opaque);
+    luringcb->nbytes = nb_sectors * BDRV_SECTOR_SIZE;
+    luringcb->ctx = s;
+    luringcb->ret = -EINPROGRESS;
+    luringcb->is_read = (type == QEMU_AIO_READ);
+    luringcb->qiov = qiov;
+
+    ret = luring_do_submit(fd, luringcb, offset, type);
+    if (ret < 0) {
+        qemu_aio_unref(luringcb);
+        return NULL;
+    }
+
+    return &luringcb->common;
+}
+
+void luring_detach_aio_context(LuringState *s, AioContext *old_context)
+{
+    aio_set_event_notifier(old_context, &s->e, false, NULL, NULL);
+    qemu_bh_delete(s->completion_bh);
+    s->aio_context = NULL;
+}
+
+void luring_attach_aio_context(LuringState *s, AioContext *new_context)
+{
+    s->aio_context = new_context;
+    s->completion_bh = aio_bh_new(new_context, qemu_luring_completion_bh, s);
+    aio_set_event_notifier(new_context, &s->e, false,
+                           qemu_luring_completion_cb,
+                           qemu_luring_poll_cb);
+}
+
+LuringState *luring_init(Error **errp)
+{
+    int rc;
+    LuringState *s;
+    s = g_malloc0(sizeof(*s));
+    rc = event_notifier_init(&s->e, false);
+    if (rc < 0) {
+        error_setg_errno(errp, -rc, "failed to to initialize event notifier");
+        goto out_free_state;
+    }
+
+    struct io_uring *ring = &s->ring;
+    rc =  io_uring_queue_init(MAX_EVENTS, ring, 0);
+    if (rc < 0) {
+        error_setg_errno(errp, -rc, "failed to create linux io_uring queue");
+        goto out_close_efd;
+    }
+
+    ioq_init(&s->io_q);
+    aio_set_fd_handler(s->aio_context, ring->ring_fd, false,
+                       (IOHandler *)qemu_luring_completion_cb, NULL, NULL, &s);
+    return s;
+
+out_close_efd:
+    event_notifier_cleanup(&s->e);
+out_free_state:
+    g_free(s);
+    return NULL;
+}
+
+void luring_cleanup(LuringState *s)
+{
+    event_notifier_cleanup(&s->e);
+    io_uring_queue_exit(&s->ring);
+    g_free(s);
+}
--
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [RFC PATCH 7/9] blockdev: accept io_uring as option
  2019-05-21 23:52 [Qemu-devel] [RFC PATCH 0/9] Add support for io_uring Aarushi Mehta
                   ` (5 preceding siblings ...)
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 6/9] block/io_uring: implements " Aarushi Mehta
@ 2019-05-21 23:52 ` Aarushi Mehta
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 8/9] block/file-posix: extends to use with io_uring Aarushi Mehta
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Aarushi Mehta @ 2019-05-21 23:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: saket.sinha89, Julia Suvorova, qemu-block, Stefan Hajnoczi,
	Aarushi Mehta

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
---
 blockdev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 79fbac8450..b44b9d660d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -386,6 +386,8 @@ static void extract_common_blockdev_options(QemuOpts *opts, int *bdrv_flags,
         if ((aio = qemu_opt_get(opts, "aio")) != NULL) {
             if (!strcmp(aio, "native")) {
                 *bdrv_flags |= BDRV_O_NATIVE_AIO;
+            } else if (!strcmp(aio, "io_uring")) {
+                *bdrv_flags |= BDRV_O_IO_URING;
             } else if (!strcmp(aio, "threads")) {
                 /* this is the default */
             } else {
@@ -4547,7 +4549,7 @@ QemuOptsList qemu_common_drive_opts = {
         },{
             .name = "aio",
             .type = QEMU_OPT_STRING,
-            .help = "host AIO implementation (threads, native)",
+            .help = "host AIO implementation (threads, native, io_uring)",
         },{
             .name = BDRV_OPT_CACHE_WB,
             .type = QEMU_OPT_BOOL,
--
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [RFC PATCH 8/9] block/file-posix: extends to use with io_uring
  2019-05-21 23:52 [Qemu-devel] [RFC PATCH 0/9] Add support for io_uring Aarushi Mehta
                   ` (6 preceding siblings ...)
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 7/9] blockdev: accept io_uring as option Aarushi Mehta
@ 2019-05-21 23:52 ` Aarushi Mehta
  2019-05-22 15:45   ` Stefan Hajnoczi
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 9/9] configure: permits use of io_uring with probe Aarushi Mehta
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Aarushi Mehta @ 2019-05-21 23:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: saket.sinha89, Julia Suvorova, qemu-block, Stefan Hajnoczi,
	Aarushi Mehta

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
---
 block/file-posix.c | 63 ++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 58 insertions(+), 5 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 1cf4ee49eb..41952217a4 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -154,6 +154,7 @@ typedef struct BDRVRawState {
     bool has_write_zeroes:1;
     bool discard_zeroes:1;
     bool use_linux_aio:1;
+    bool use_linux_io_uring;
     bool page_cache_inconsistent:1;
     bool has_fallocate;
     bool needs_alignment;
@@ -423,7 +424,7 @@ static QemuOptsList raw_runtime_opts = {
         {
             .name = "aio",
             .type = QEMU_OPT_STRING,
-            .help = "host AIO implementation (threads, native)",
+            .help = "host AIO implementation (threads, native, io_uring)",
         },
         {
             .name = "locking",
@@ -494,6 +495,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
         goto fail;
     }
     s->use_linux_aio = (aio == BLOCKDEV_AIO_OPTIONS_NATIVE);
+    s->use_linux_io_uring = (aio == BLOCKDEV_AIO_OPTIONS_IO_URING);

     locking = qapi_enum_parse(&OnOffAuto_lookup,
                               qemu_opt_get(opts, "locking"),
@@ -557,7 +559,9 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
     s->shared_perm = BLK_PERM_ALL;

 #ifdef CONFIG_LINUX_AIO
-     /* Currently Linux does AIO only for files opened with O_DIRECT */
+     /*
+      * Currently Linux does AIO only for files opened with O_DIRECT
+      */
     if (s->use_linux_aio) {
         if (!(s->open_flags & O_DIRECT)) {
             error_setg(errp, "aio=native was specified, but it requires "
@@ -578,6 +582,21 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
         goto fail;
     }
 #endif /* !defined(CONFIG_LINUX_AIO) */
+#ifdef CONFIG_LINUX_IO_URING
+    if (s->use_linux_io_uring) {
+        if (!aio_setup_linux_io_uring(bdrv_get_aio_context(bs), errp)) {
+            error_prepend(errp, "Unable to use io_uring: ");
+            goto fail;
+        }
+    }
+#else
+    if (s->use_linux_io_uring) {
+        error_setg(errp, "aio=io_uring was specified, but is not supported "
+                         "in this build.");
+        ret = -EINVAL;
+        goto fail;
+    }
+#endif /* !defined(CONFIG_LINUX_IO_URING) */

     s->has_discard = true;
     s->has_write_zeroes = true;
@@ -1870,6 +1889,12 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
             LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
             assert(qiov->size == bytes);
             return laio_co_submit(bs, aio, s->fd, offset, qiov, type);
+#endif
+#ifdef CONFIG_LINUX_IO_URING
+        } else if (s->use_linux_io_uring) {
+            LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs));
+            assert(qiov->size == bytes);
+            return luring_co_submit(bs, aio, s->fd, offset, qiov, type);
 #endif
         }
     }
@@ -1907,24 +1932,40 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset,

 static void raw_aio_plug(BlockDriverState *bs)
 {
-#ifdef CONFIG_LINUX_AIO
+#if defined CONFIG_LINUX_AIO || defined CONFIG_LINUX_IO_URING
     BDRVRawState *s = bs->opaque;
+#endif
+#ifdef CONFIG_LINUX_AIO
     if (s->use_linux_aio) {
         LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
         laio_io_plug(bs, aio);
     }
 #endif
+#ifdef CONFIG_LINUX_IO_URING
+    if (s->use_linux_io_uring) {
+        LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs));
+        luring_io_plug(bs, aio);
+    }
+#endif
 }

 static void raw_aio_unplug(BlockDriverState *bs)
 {
-#ifdef CONFIG_LINUX_AIO
+#if defined CONFIG_LINUX_AIO || defined CONFIG_LINUX_IO_URING
     BDRVRawState *s = bs->opaque;
+#endif
+#ifdef CONFIG_LINUX_AIO
     if (s->use_linux_aio) {
         LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
         laio_io_unplug(bs, aio);
     }
 #endif
+#ifdef CONFIG_LINUX_IO_URING
+    if (s->use_linux_aio) {
+        LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs));
+        luring_io_unplug(bs, aio);
+    }
+#endif
 }

 static int raw_co_flush_to_disk(BlockDriverState *bs)
@@ -1950,8 +1991,10 @@ static int raw_co_flush_to_disk(BlockDriverState *bs)
 static void raw_aio_attach_aio_context(BlockDriverState *bs,
                                        AioContext *new_context)
 {
+#if defined CONFIG_LINUX_AIO || defined CONFIG_LINUX_IO_URING
+        BDRVRawState *s = bs->opaque;
+#endif
 #ifdef CONFIG_LINUX_AIO
-    BDRVRawState *s = bs->opaque;
     if (s->use_linux_aio) {
         Error *local_err;
         if (!aio_setup_linux_aio(new_context, &local_err)) {
@@ -1961,6 +2004,16 @@ static void raw_aio_attach_aio_context(BlockDriverState *bs,
         }
     }
 #endif
+#ifdef CONFIG_LINUX_IO_URING
+    if (s->use_linux_io_uring) {
+        Error *local_err;
+        if (!aio_setup_linux_io_uring(new_context, &local_err)) {
+            error_reportf_err(local_err, "Unable to use linux io_uring, "
+                                         "falling back to thread pool: ");
+            s->use_linux_io_uring = false;
+        }
+    }
+#endif
 }

 static void raw_close(BlockDriverState *bs)
--
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [RFC PATCH 9/9] configure: permits use of io_uring with probe
  2019-05-21 23:52 [Qemu-devel] [RFC PATCH 0/9] Add support for io_uring Aarushi Mehta
                   ` (7 preceding siblings ...)
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 8/9] block/file-posix: extends to use with io_uring Aarushi Mehta
@ 2019-05-21 23:52 ` Aarushi Mehta
  2019-05-22  0:09 ` [Qemu-devel] [RFC PATCH 0/9] Add support for io_uring no-reply
  2019-05-22 15:46 ` Stefan Hajnoczi
  10 siblings, 0 replies; 22+ messages in thread
From: Aarushi Mehta @ 2019-05-21 23:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: saket.sinha89, Julia Suvorova, qemu-block, Stefan Hajnoczi,
	Aarushi Mehta

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
---
 configure | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/configure b/configure
index d2fc346302..25eb150cb5 100755
--- a/configure
+++ b/configure
@@ -365,6 +365,7 @@ xen=""
 xen_ctrl_version=""
 xen_pci_passthrough=""
 linux_aio=""
+linux_io_uring=""
 cap_ng=""
 attr=""
 libattr=""
@@ -1255,6 +1256,10 @@ for opt do
   ;;
   --enable-linux-aio) linux_aio="yes"
   ;;
+  --disable-linux-io-uring) linux_io_uring="no"
+  ;;
+  --enable-linux-io-uring) linux_io_uring="yes"
+  ;;
   --disable-attr) attr="no"
   ;;
   --enable-attr) attr="yes"
@@ -1773,6 +1778,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   vde             support for vde network
   netmap          support for netmap network
   linux-aio       Linux AIO support
+  linux-io-uring  Linux io_uring support
   cap-ng          libcap-ng support
   attr            attr and xattr support
   vhost-net       vhost-net kernel acceleration support
@@ -3949,6 +3955,23 @@ EOF
     linux_aio=no
   fi
 fi
+##########################################
+# linux-io-uring probe
+
+if test "$linux_io_uring" != "no" ; then
+  cat > $TMPC <<EOF
+#include <liburing.h>
+int main(void) { io_uring_queue_init(0, NULL, 0); io_uring_submit(NULL); return 0; }
+EOF
+  if compile_prog "" "-luring" ; then
+    linux_io_uring=yes
+  else
+    if test "$linux_io_uring" = "yes" ; then
+      feature_not_found "linux io_uring" "Install liburing"
+    fi
+    linux_io_uring=no
+  fi
+fi

 ##########################################
 # TPM emulation is only on POSIX
@@ -6351,6 +6374,7 @@ echo "PIE               $pie"
 echo "vde support       $vde"
 echo "netmap support    $netmap"
 echo "Linux AIO support $linux_aio"
+echo "Linux io_uring support $linux_io_uring"
 echo "ATTR/XATTR support $attr"
 echo "Install blobs     $blobs"
 echo "KVM support       $kvm"
@@ -6831,6 +6855,9 @@ fi
 if test "$linux_aio" = "yes" ; then
   echo "CONFIG_LINUX_AIO=y" >> $config_host_mak
 fi
+if test "$linux_io_uring" = "yes" ; then
+  echo "CONFIG_LINUX_IO_URING=y" >> $config_host_mak
+fi
 if test "$attr" = "yes" ; then
   echo "CONFIG_ATTR=y" >> $config_host_mak
 fi
--
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/9] Add support for io_uring
  2019-05-21 23:52 [Qemu-devel] [RFC PATCH 0/9] Add support for io_uring Aarushi Mehta
                   ` (8 preceding siblings ...)
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 9/9] configure: permits use of io_uring with probe Aarushi Mehta
@ 2019-05-22  0:09 ` no-reply
  2019-05-22 15:46 ` Stefan Hajnoczi
  10 siblings, 0 replies; 22+ messages in thread
From: no-reply @ 2019-05-22  0:09 UTC (permalink / raw)
  To: mehta.aaru20
  Cc: fam, qemu-block, qemu-devel, saket.sinha89, stefanha, jusual,
	mehta.aaru20

Patchew URL: https://patchew.org/QEMU/20190521235215.31341-1-mehta.aaru20@gmail.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [RFC PATCH 0/9] Add support for io_uring
Message-id: 20190521235215.31341-1-mehta.aaru20@gmail.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

From https://github.com/patchew-project/qemu
 * [new tag]               patchew/20190521235215.31341-1-mehta.aaru20@gmail.com -> patchew/20190521235215.31341-1-mehta.aaru20@gmail.com
Switched to a new branch 'test'
678f12be88 configure: permits use of io_uring with probe
e766842048 block/file-posix: extends to use with io_uring
03f080b205 blockdev: accept io_uring as option
7208a09b56 block/io_uring: implements interfaces for io_uring
8d681f052d util/asyn: add aio interfaces for io_uring
27f7b0c951 stubs: add aio interface stubs for io_uring
9c2064f2f5 include/block: declare interfaces for io_uring
54e3c5397c block/block: add BDRV flag for io_uring
25162d6c1f qapi/block-core: add option for io_uring

=== OUTPUT BEGIN ===
1/9 Checking commit 25162d6c1fc9 (qapi/block-core: add option for io_uring)
2/9 Checking commit 54e3c5397cb4 (block/block: add BDRV flag for io_uring)
3/9 Checking commit 9c2064f2f5db (include/block: declare interfaces for io_uring)
4/9 Checking commit 27f7b0c95162 (stubs: add aio interface stubs for io_uring)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#23: 
new file mode 100644

total: 0 errors, 1 warnings, 39 lines checked

Patch 4/9 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
5/9 Checking commit 8d681f052d02 (util/asyn: add aio interfaces for io_uring)
6/9 Checking commit 7208a09b5645 (block/io_uring: implements interfaces for io_uring)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#30: 
new file mode 100644

ERROR: else should follow close brace '}'
#240: FILE: block/io_uring.c:206:
+                }
+                else {

total: 1 errors, 1 warnings, 398 lines checked

Patch 6/9 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

7/9 Checking commit 03f080b2050b (blockdev: accept io_uring as option)
8/9 Checking commit e766842048eb (block/file-posix: extends to use with io_uring)
9/9 Checking commit 678f12be88a3 (configure: permits use of io_uring with probe)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20190521235215.31341-1-mehta.aaru20@gmail.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 1/9] qapi/block-core: add option for io_uring
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 1/9] qapi/block-core: add option " Aarushi Mehta
@ 2019-05-22  0:39   ` Eric Blake
  2019-05-22  0:51     ` Aarushi Mehta
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Blake @ 2019-05-22  0:39 UTC (permalink / raw)
  To: Aarushi Mehta, qemu-devel
  Cc: saket.sinha89, Julia Suvorova, Stefan Hajnoczi, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1274 bytes --]

On 5/21/19 6:52 PM, Aarushi Mehta wrote:
> Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>

Sparse on the details. The subject line says what, but without a 'why'
for how io_uring is different from existing aio options, it's hard to
see why I'd want to use it. Do you have any benchmark numbers?

> ---
>  qapi/block-core.json | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 7ccbfff9d0..116995810a 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2776,11 +2776,12 @@
>  #
>  # @threads:     Use qemu's thread pool
>  # @native:      Use native AIO backend (only Linux and Windows)
> +# @io_uring:    Use linux io_uring

Missing a '(since 4.1)' tag.

>  #
>  # Since: 2.9
>  ##
>  { 'enum': 'BlockdevAioOptions',
> -  'data': [ 'threads', 'native' ] }
> +  'data': [ 'threads', 'native','io_uring' ] }

Missing space after ',' (not essential, but matching style is nice).
Should the new element be defined conditionally, so that introspection
only sees the new enum member when compiled for Linux?



-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 1/9] qapi/block-core: add option for io_uring
  2019-05-22  0:39   ` Eric Blake
@ 2019-05-22  0:51     ` Aarushi Mehta
  2019-05-22  1:01       ` Eric Blake
                         ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Aarushi Mehta @ 2019-05-22  0:51 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: saket.sinha89, Julia Suvorova, Stefan Hajnoczi, qemu-block

On Tue, 2019-05-21 at 19:39 -0500, Eric Blake wrote:
> On 5/21/19 6:52 PM, Aarushi Mehta wrote:
> > Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
> 
> Sparse on the details. The subject line says what, but without a
> 'why'
> for how io_uring is different from existing aio options, it's hard to
> see why I'd want to use it. Do you have any benchmark numbers?

For peak performance, io_uring helps us get to 1.7M 4k IOPS with
polling. aio reaches a performance cliff much lower than that, at 608K.
If we disable polling, io_uring is able to drive about 1.2M IOPS for
the (otherwise) same test case.

More details, and the source for the above is at
http://kernel.dk/io_uring.pdf

> > ---
> >  qapi/block-core.json | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index 7ccbfff9d0..116995810a 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -2776,11 +2776,12 @@
> >  #
> >  # @threads:     Use qemu's thread pool
> >  # @native:      Use native AIO backend (only Linux and Windows)
> > +# @io_uring:    Use linux io_uring
> 
> Missing a '(since 4.1)' tag.
> 
> >  #
> >  # Since: 2.9
> >  ##
> >  { 'enum': 'BlockdevAioOptions',
> > -  'data': [ 'threads', 'native' ] }
> > +  'data': [ 'threads', 'native','io_uring' ] }
> 
> Missing space after ',' (not essential, but matching style is nice).
> Should the new element be defined conditionally, so that
> introspection
> only sees the new enum member when compiled for Linux?
> 
I'm not sure what would be the benefits of that? We already check for
Linux at configure, and this would reduce readability. We aren't doing
this for native.
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 1/9] qapi/block-core: add option for io_uring
  2019-05-22  0:51     ` Aarushi Mehta
@ 2019-05-22  1:01       ` Eric Blake
  2019-05-22  9:16       ` [Qemu-devel] [Qemu-block] " Kevin Wolf
  2019-05-22 13:07       ` [Qemu-devel] " Stefan Hajnoczi
  2 siblings, 0 replies; 22+ messages in thread
From: Eric Blake @ 2019-05-22  1:01 UTC (permalink / raw)
  To: Aarushi Mehta, qemu-devel
  Cc: saket.sinha89, Julia Suvorova, Stefan Hajnoczi, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1078 bytes --]

On 5/21/19 7:51 PM, Aarushi Mehta wrote:

>>> +# @io_uring:    Use linux io_uring
>>
>> Missing a '(since 4.1)' tag.
>>
>>>  #
>>>  # Since: 2.9
>>>  ##
>>>  { 'enum': 'BlockdevAioOptions',
>>> -  'data': [ 'threads', 'native' ] }
>>> +  'data': [ 'threads', 'native','io_uring' ] }
>>
>> Missing space after ',' (not essential, but matching style is nice).
>> Should the new element be defined conditionally, so that
>> introspection
>> only sees the new enum member when compiled for Linux?
>>
> I'm not sure what would be the benefits of that? We already check for
> Linux at configure, and this would reduce readability. We aren't doing
> this for native.

Look at BlockdevOptionsFile in qapi/block-core.qapi. Telling the QAPI
generator that something is only available on Linux means that it will
be obvious to introspection (the QMP command query-qmp-schema) whether
the feature is present in a particular binary.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [RFC PATCH 1/9] qapi/block-core: add option for io_uring
  2019-05-22  0:51     ` Aarushi Mehta
  2019-05-22  1:01       ` Eric Blake
@ 2019-05-22  9:16       ` Kevin Wolf
  2019-05-22 13:07       ` [Qemu-devel] " Stefan Hajnoczi
  2 siblings, 0 replies; 22+ messages in thread
From: Kevin Wolf @ 2019-05-22  9:16 UTC (permalink / raw)
  To: Aarushi Mehta
  Cc: qemu-block, qemu-devel, saket.sinha89, Stefan Hajnoczi, Julia Suvorova

Am 22.05.2019 um 02:51 hat Aarushi Mehta geschrieben:
> On Tue, 2019-05-21 at 19:39 -0500, Eric Blake wrote:
> > On 5/21/19 6:52 PM, Aarushi Mehta wrote:
> > > Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
> > 
> > Sparse on the details. The subject line says what, but without a
> > 'why'
> > for how io_uring is different from existing aio options, it's hard to
> > see why I'd want to use it. Do you have any benchmark numbers?
> 
> For peak performance, io_uring helps us get to 1.7M 4k IOPS with
> polling. aio reaches a performance cliff much lower than that, at 608K.
> If we disable polling, io_uring is able to drive about 1.2M IOPS for
> the (otherwise) same test case.
> 
> More details, and the source for the above is at
> http://kernel.dk/io_uring.pdf
> 
> > > ---
> > >  qapi/block-core.json | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > > index 7ccbfff9d0..116995810a 100644
> > > --- a/qapi/block-core.json
> > > +++ b/qapi/block-core.json
> > > @@ -2776,11 +2776,12 @@
> > >  #
> > >  # @threads:     Use qemu's thread pool
> > >  # @native:      Use native AIO backend (only Linux and Windows)
> > > +# @io_uring:    Use linux io_uring
> > 
> > Missing a '(since 4.1)' tag.
> > 
> > >  #
> > >  # Since: 2.9
> > >  ##
> > >  { 'enum': 'BlockdevAioOptions',
> > > -  'data': [ 'threads', 'native' ] }
> > > +  'data': [ 'threads', 'native','io_uring' ] }
> > 
> > Missing space after ',' (not essential, but matching style is nice).
> > Should the new element be defined conditionally, so that
> > introspection
> > only sees the new enum member when compiled for Linux?
> > 
> I'm not sure what would be the benefits of that? We already check for
> Linux at configure, and this would reduce readability. We aren't doing
> this for native.

BlockdevAioOptions is used in BlockdevOptionsFile, which contains the
options for two different drivers: file-posix and file-win32. Both of
them support both 'threads' and 'native'. However, I don't think you'll
add the new mode to the Windows driver. So I think making it conditional
on CONFIG_POSIX at least is necessary.

Kevin


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 1/9] qapi/block-core: add option for io_uring
  2019-05-22  0:51     ` Aarushi Mehta
  2019-05-22  1:01       ` Eric Blake
  2019-05-22  9:16       ` [Qemu-devel] [Qemu-block] " Kevin Wolf
@ 2019-05-22 13:07       ` Stefan Hajnoczi
  2 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-05-22 13:07 UTC (permalink / raw)
  To: Aarushi Mehta
  Cc: qemu-block, qemu-devel, saket.sinha89, Stefan Hajnoczi, Julia Suvorova

[-- Attachment #1: Type: text/plain, Size: 1033 bytes --]

On Wed, May 22, 2019 at 06:21:51AM +0530, Aarushi Mehta wrote:
> On Tue, 2019-05-21 at 19:39 -0500, Eric Blake wrote:
> > On 5/21/19 6:52 PM, Aarushi Mehta wrote:
> > > Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
> > 
> > Sparse on the details. The subject line says what, but without a
> > 'why'
> > for how io_uring is different from existing aio options, it's hard to
> > see why I'd want to use it. Do you have any benchmark numbers?
> 
> For peak performance, io_uring helps us get to 1.7M 4k IOPS with
> polling. aio reaches a performance cliff much lower than that, at 608K.
> If we disable polling, io_uring is able to drive about 1.2M IOPS for
> the (otherwise) same test case.
> 
> More details, and the source for the above is at
> http://kernel.dk/io_uring.pdf

So that Aarushi's email isn't accidentally misquoted later on:

These numbers are not via QEMU.  QEMU is likely to show different
performance results and they are expected to be lower due to
virtualization overhead.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 3/9] include/block: declare interfaces for io_uring
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 3/9] include/block: declare interfaces " Aarushi Mehta
@ 2019-05-22 13:09   ` Stefan Hajnoczi
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-05-22 13:09 UTC (permalink / raw)
  To: Aarushi Mehta
  Cc: saket.sinha89, Stefan Hajnoczi, Julia Suvorova, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 570 bytes --]

On Wed, May 22, 2019 at 05:22:09AM +0530, Aarushi Mehta wrote:
>  #ifdef CONFIG_LINUX_AIO
> -    /* State for native Linux AIO.  Uses aio_context_acquire/release for
> +    /*
> +     * State for native Linux AIO.  Uses aio_context_acquire/release for
>       * locking.
>       */
>      struct LinuxAioState *linux_aio;
>  #endif
> +#ifdef CONFIG_LINUX_IO_URING
> +    /*
> +     * State for native Linux AIO.  Uses aio_context_acquire/release for

s/native Linux AIO/Linux io_uring/

> +     * locking.
> +     */
> +    struct LuringState *linux_io_uring;
> +#endif

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 4/9] stubs: add aio interface stubs for io_uring
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 4/9] stubs: add aio interface stubs " Aarushi Mehta
@ 2019-05-22 13:10   ` Stefan Hajnoczi
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-05-22 13:10 UTC (permalink / raw)
  To: Aarushi Mehta
  Cc: saket.sinha89, Stefan Hajnoczi, Julia Suvorova, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 344 bytes --]

On Wed, May 22, 2019 at 05:22:10AM +0530, Aarushi Mehta wrote:
> Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
> ---
>  stubs/Makefile.objs |  1 +
>  stubs/io_uring.c    | 32 ++++++++++++++++++++++++++++++++
>  2 files changed, 33 insertions(+)
>  create mode 100644 stubs/io_uring.c

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 5/9] util/asyn: add aio interfaces for io_uring
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 5/9] util/asyn: add aio interfaces " Aarushi Mehta
@ 2019-05-22 13:16   ` Stefan Hajnoczi
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-05-22 13:16 UTC (permalink / raw)
  To: Aarushi Mehta
  Cc: saket.sinha89, Stefan Hajnoczi, Julia Suvorova, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1151 bytes --]

On Wed, May 22, 2019 at 05:22:11AM +0530, Aarushi Mehta wrote:

s/asyn/async/ in the Subject line.

> @@ -341,6 +349,25 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
>  }
>  #endif
> 
> +#ifdef CONFIG_LINUX_IO_URING
> +LuringState *aio_setup_linux_io_uring(AioContext *ctx, Error **errp)
> +{
> +    if (!ctx->linux_io_uring) {
> +        ctx->linux_io_uring = luring_init(errp);
> +        if (ctx->linux_io_uring) {
> +            luring_attach_aio_context(ctx->linux_io_uring, ctx);
> +        }
> +    }
> +    return ctx->linux_io_uring;
> +}

Personally I find functions that are written to have a single return
statement hard to understand due to the nesting and multiple code paths.
I prefer a straight-line approach that returns as soon as there is an
error.

This is a style thing.  You can do it either way.  Here is the same
without nesting:

  if (ctx->linux_io_uring) {
      return ctx->linux_io_uring;
  }

  ctx->linux_io_uring = luring_init(errp);
  if (!ctx->linux_io_uring) {
      return NULL;
  }

  luring_attach_aio_context(ctx->linux_io_uring, ctx);
  return ctx->linux_io_uring;

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 6/9] block/io_uring: implements interfaces for io_uring
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 6/9] block/io_uring: implements " Aarushi Mehta
@ 2019-05-22 15:43   ` Stefan Hajnoczi
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-05-22 15:43 UTC (permalink / raw)
  To: Aarushi Mehta
  Cc: saket.sinha89, Stefan Hajnoczi, Julia Suvorova, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 16097 bytes --]

On Wed, May 22, 2019 at 05:22:12AM +0530, Aarushi Mehta wrote:
> Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
> ---
>  block/Makefile.objs |   2 +
>  block/io_uring.c    | 385 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 387 insertions(+)
>  create mode 100644 block/io_uring.c
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 7a81892a52..262d413c6d 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -18,6 +18,7 @@ block-obj-y += block-backend.o snapshot.o qapi.o
>  block-obj-$(CONFIG_WIN32) += file-win32.o win32-aio.o
>  block-obj-$(CONFIG_POSIX) += file-posix.o
>  block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
> +block-obj-$(CONFIG_LINUX_IO_URING) += io_uring.o
>  block-obj-y += null.o mirror.o commit.o io.o create.o
>  block-obj-y += throttle-groups.o
>  block-obj-$(CONFIG_LINUX) += nvme.o
> @@ -61,5 +62,6 @@ block-obj-$(if $(CONFIG_LZFSE),m,n) += dmg-lzfse.o
>  dmg-lzfse.o-libs   := $(LZFSE_LIBS)
>  qcow.o-libs        := -lz
>  linux-aio.o-libs   := -laio
> +io_uring.o-libs    := -luring
>  parallels.o-cflags := $(LIBXML2_CFLAGS)
>  parallels.o-libs   := $(LIBXML2_LIBS)
> diff --git a/block/io_uring.c b/block/io_uring.c
> new file mode 100644
> index 0000000000..f8b0df90d4
> --- /dev/null
> +++ b/block/io_uring.c
> @@ -0,0 +1,385 @@
> +/*
> + * Linux io_uring support.
> + *
> + * Copyright (C) 2009 IBM, Corp.
> + * Copyright (C) 2009 Red Hat, Inc.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +#include "block/aio.h"
> +#include "qemu/queue.h"
> +#include "block/block.h"
> +#include "block/raw-aio.h"
> +#include "qemu/event_notifier.h"
> +#include "qemu/coroutine.h"
> +#include "qapi/error.h"
> +
> +#include <liburing.h>

Please include system headers (<>) after "qemu/osdep.h" and before other
QEMU headers.  See the ./HACKING file "1.2. Include directives" for more
info.

> +
> +/*
> + * Queue size (per-device).
> + *
> + * XXX: eventually we need to communicate this to the guest and/or make it
> + *      tunable by the guest.  If we get more outstanding requests at a time
> + *      than this we will get EAGAIN from io_submit which is communicated to
> + *      the guest as an I/O error.
> + */

This comment mentions Linux AIO io_submit() and needs to be updated.
The issue mentioned shouldn't apply since io_q.pending holds requests
that cannot be issued to the kernel while the submission queue is
exhausted.

> +#define MAX_EVENTS 128
> +
> +struct qemu_luringcb {

Please follow QEMU coding style (using typedef and naming structs
LikeThis):

  typedef struct LuringAIOCB {
      ...
  } LuringAIOCB;

> +    BlockAIOCB common;
> +    Coroutine *co;
> +    LuringState *ctx;

Please remove this field and pass it as an argument to
luring_do_submit() instead.  ctx isn't used after request submission.

> +    struct io_uring_sqe sqeq;
> +    int ret;

This field is not written in the completion callback.  This means the
luring_co_submit() is currently broken!

> +    size_t nbytes;

This field is written but never read.  Please remove it.

> +    QEMUIOVector *qiov;

Please remove this field and pass it as an argument to
luring_do_submit() instead.  qiov isn't used after request submission.

> +    bool is_read;

This field is written but never read.  Please remove it.

> +    QSIMPLEQ_ENTRY(qemu_luringcb) next;
> +};
> +
> +typedef struct {
> +    int plugged;
> +    unsigned int in_queue;
> +    unsigned int in_flight;
> +    bool blocked;
> +    QSIMPLEQ_HEAD(, qemu_luringcb) pending;
> +} LuringQueue;
> +
> +struct LuringState {
> +    AioContext *aio_context;
> +
> +    struct io_uring ring;
> +    EventNotifier e;

Where is this EventNotifier used?  I thought ring->ring_fd is the file
descriptor that we monitor.

> +
> +    /* io queue for submit at batch.  Protected by AioContext lock. */
> +    LuringQueue io_q;
> +
> +    /* I/O completion processing.  Only runs in I/O thread.  */
> +    QEMUBH *completion_bh;
> +    int event_idx;
> +    int event_max;
> +};
> +
> +static void ioq_submit(LuringState *s);
> +
> +static inline int32_t io_cqe_ret(struct io_uring_cqe *cqe)
> +{
> +    return cqe->res;
> +}
> +
> +/**
> + * io_getevents_peek:
> + * @ring: io_uring instance
> + * @cqes: Completion event array
> +
> + * Returns the number of completed events and sets a pointer
> + * on events queue.  This function does not update the head and tail.
> + */
> +static inline unsigned int io_getevents_peek(struct io_uring *ring,
> +                                             struct io_uring_cqe **cqes)
> +{
> +    unsigned int nr;
> +    read_barrier();
> +    struct io_uring_cq *cq_ring = &ring->cq;
> +    unsigned int head = *cq_ring->khead & *cq_ring->kring_mask;
> +    unsigned int tail = *cq_ring->ktail & *cq_ring->kring_mask;
> +    nr = tail - head;
> +    *cqes = ring->cq.cqes;
> +    return nr;
> +}

Is it possible to use liburing's io_uring_peek_cqe() instead?  I don't
see a need to know the number of available completed requests.  We can
just collect completed requests one-by-one.

> +
> +/**
> + * qemu_luring_process_completions:
> + * @s: AIO state
> + *
> + * Fetches completed I/O requests, consumes cqes and invokes their callbacks.
> + *
> + */
> +static void qemu_luring_process_completions(LuringState *s)
> +{
> +    struct io_uring_cqe *cqes;
> +    qemu_bh_schedule(s->completion_bh);
> +
> +    while ((s->event_max = io_getevents_peek(&s->ring, &cqes))) {
> +        for (s->event_idx = 0; s->event_idx < s->event_max; ) {
> +                io_uring_cqe_seen(&s->ring, cqes);
> +
> +            /* Change counters one-by-one because we can be nested. */
> +            s->io_q.in_flight--;
> +            s->event_idx++;
> +        }
> +    }

Wait, where does this function invoke the aiocb callback?  Something is
missing here.  I guess later patches do it but this patch is incomplete
without it...

> +
> +    qemu_bh_cancel(s->completion_bh);
> +
> +    /*
> +     *If we are nested we have to notify the level above that we are done
> +     * by setting event_max to zero, upper level will then jump out of it's
> +     * own `for` loop.  If we are the last all counters dropped to zero.
> +     */
> +    s->event_max = 0;
> +    s->event_idx = 0;
> +}
> +
> +static void qemu_luring_process_completions_and_submit(LuringState *s)
> +{
> +    aio_context_acquire(s->aio_context);
> +    qemu_luring_process_completions(s);
> +
> +    if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {

It's safer to use io_q.in_queue > 0 instead of
QSIMPLEQ_EMPTY(&s->io_q.pending).  If io_uring_submit() consumed fewer
sqes than available, then io_q.pending may be empty but we still need to
call io_uring_submit() again to get those remaining sqes submitted.

> +        ioq_submit(s);
> +    }
> +    aio_context_release(s->aio_context);
> +}
> +
> +static void qemu_luring_completion_bh(void *opaque)
> +{
> +    LuringState *s = opaque;
> +
> +    qemu_luring_process_completions_and_submit(s);
> +}
> +
> +static void qemu_luring_completion_cb(EventNotifier *e)
> +{
> +    LuringState *s = container_of(e, LuringState, e);
> +
> +    if (event_notifier_test_and_clear(&s->e)) {

The EventNotifier isn't used so this call can be removed.  We need to
call qemu_luring_process_completions_and_submit() directly.

> +        qemu_luring_process_completions_and_submit(s);
> +    }
> +}
> +
> +static bool qemu_luring_poll_cb(void *opaque)
> +{
> +    EventNotifier *e = opaque;
> +    LuringState *s = container_of(e, LuringState, e);
> +    struct io_uring_cqe *cqes;
> +
> +    if (!io_getevents_peek(&s->ring, &cqes)) {
> +        return false;
> +    }
> +
> +    qemu_luring_process_completions_and_submit(s);
> +    return true;
> +}

This is unused and can be removed for now.  Only the unused
EventNotifier references this.

> +
> +static const AIOCBInfo luring_aiocb_info = {
> +    .aiocb_size         = sizeof(struct qemu_luringcb),
> +};
> +
> +
> +static void ioq_init(LuringQueue *io_q)
> +{
> +    QSIMPLEQ_INIT(&io_q->pending);
> +    io_q->plugged = 0;
> +    io_q->in_queue = 0;
> +    io_q->in_flight = 0;
> +    io_q->blocked = false;
> +}
> +
> +static void ioq_submit(LuringState *s)
> +{
> +    int ret, len;
> +    struct qemu_luringcb *aiocb;
> +    QSIMPLEQ_HEAD(, qemu_luringcb) completed;
> +
> +    do {
> +        if (s->io_q.in_flight >= MAX_EVENTS) {
> +            break;
> +        }

This do { if (...) } while (...) loop can be written more concisely as:

  while (!QSIMPLEQ_EMPTY(&s->io_q.pending) &&
         s->io_q.in_flight >= MAX_EVENTS) {

> +        len = 0;
> +        QSIMPLEQ_FOREACH(aiocb, &s->io_q.pending, next) {
> +            if (s->io_q.in_flight + len++ >= MAX_EVENTS) {
> +                break;
> +            }
> +            struct io_uring_sqe *sqes = io_uring_get_sqe(&s->ring);
> +            if (sqes)  { /* Prep sqe for subission */

If the ring is exhausted we should stop trying to add more requests:

  if (!sqes) {
      break;
  }

> +                memset(sqes, 0, sizeof(*sqes));
> +                sqes->opcode = aiocb->sqeq.opcode;
> +                sqes->fd = aiocb->sqeq.fd;
> +                if (sqes->opcode == IORING_OP_FSYNC) {
> +                    sqes->fsync_flags = aiocb->sqeq.fsync_flags;
> +                }
> +                else {
> +                    sqes->off = aiocb->sqeq.off;
> +                    sqes->addr = aiocb->sqeq.addr;
> +                    sqes->len = aiocb->sqeq.len;
> +                }

struct io_uring_sqe can safely be copied since it does not contain
pointers to itself.  The preceding lines of code can be simplified to:

  *sqes = aiocb->sqeq;

> +                QSIMPLEQ_REMOVE_HEAD(&s->io_q.pending, next);
> +            }
> +        }
> +
> +        ret =  io_uring_submit(&s->ring);
> +        if (ret == -EAGAIN) {
> +            break;
> +        }
> +
> +        s->io_q.in_flight += ret;
> +        s->io_q.in_queue  -= ret;
> +        QSIMPLEQ_SPLIT_AFTER(&s->io_q.pending, aiocb, next, &completed);
> +    } while (!QSIMPLEQ_EMPTY(&s->io_q.pending));
> +    s->io_q.blocked = (s->io_q.in_queue > 0);
> +
> +    if (s->io_q.in_flight) {
> +        /*
> +         * We can try to complete something just right away if there are
> +         * still requests in-flight.
> +         */
> +        qemu_luring_process_completions(s);
> +    }
> +}
> +
> +void luring_io_plug(BlockDriverState *bs, LuringState *s)
> +{
> +    s->io_q.plugged++;
> +}
> +
> +void luring_io_unplug(BlockDriverState *bs, LuringState *s)
> +{
> +    assert(s->io_q.plugged);
> +    if (--s->io_q.plugged == 0 &&
> +        !s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
> +        ioq_submit(s);
> +    }
> +}
> +
> +static int luring_do_submit(int fd, struct qemu_luringcb *luringcb,
> +                            off_t offset, int type)
> +{
> +    LuringState *s = luringcb->ctx;
> +    struct io_uring_sqe *sqes = io_uring_get_sqe(&s->ring);
> +    if (!sqes) {
> +        sqes = &luringcb->sqeq;
> +        QSIMPLEQ_INSERT_TAIL(&s->io_q.pending, luringcb, next);
> +    }
> +    QEMUIOVector *qiov = luringcb->qiov;
> +
> +    switch (type) {
> +    case QEMU_AIO_WRITE:
> +        io_uring_prep_writev(sqes, fd, qiov->iov, qiov->niov, offset);
> +        break;
> +    case QEMU_AIO_READ:
> +        io_uring_prep_readv(sqes, fd, qiov->iov, qiov->niov, offset);
> +        break;
> +    case QEMU_AIO_FLUSH:
> +        io_uring_prep_fsync(sqes, fd, 0);
> +        break;
> +    default:
> +        fprintf(stderr, "%s: invalid AIO request type 0x%x.\n",
> +                        __func__, type);

sqes is leaked if io_uring_get_sqe() succeeded.  Looking at the io_uring
API, there doesn't seem to be a way to return an unused sqe to the ring
:(.

Perhaps just make return -EIO an abort() call instead.  This case should
never happen.  If it does happen, fail loudly.

> +        return -EIO;
> +    }
> +
> +    s->io_q.in_queue++;
> +    if (!s->io_q.blocked &&
> +        (!s->io_q.plugged ||
> +         s->io_q.in_flight + s->io_q.in_queue >= MAX_EVENTS)) {
> +        ioq_submit(s);
> +    }
> +
> +    return 0;
> +}
> +
> +int coroutine_fn luring_co_submit(BlockDriverState *bs, LuringState *s, int fd,
> +                                uint64_t offset, QEMUIOVector *qiov, int type)
> +{
> +    int ret;
> +    struct qemu_luringcb luringcb = {
> +        .co         = qemu_coroutine_self(),
> +        .nbytes     = qiov->size,
> +        .ctx        = s,
> +        .ret        = -EINPROGRESS,
> +        .is_read    = (type == QEMU_AIO_READ),
> +        .qiov       = qiov,
> +    };
> +
> +    ret = luring_do_submit(fd, &luringcb, offset, type);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    if (luringcb.ret == -EINPROGRESS) {
> +        qemu_coroutine_yield();
> +    }
> +    return luringcb.ret;
> +}
> +
> +BlockAIOCB *luring_submit(BlockDriverState *bs, LuringState *s, int fd,
> +        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
> +        BlockCompletionFunc *cb, void *opaque, int type)
> +{
> +    struct qemu_luringcb *luringcb;
> +    off_t offset = sector_num * BDRV_SECTOR_SIZE;
> +    int ret;
> +
> +    luringcb = qemu_aio_get(&luring_aiocb_info, bs, cb, opaque);
> +    luringcb->nbytes = nb_sectors * BDRV_SECTOR_SIZE;
> +    luringcb->ctx = s;
> +    luringcb->ret = -EINPROGRESS;
> +    luringcb->is_read = (type == QEMU_AIO_READ);
> +    luringcb->qiov = qiov;
> +
> +    ret = luring_do_submit(fd, luringcb, offset, type);
> +    if (ret < 0) {
> +        qemu_aio_unref(luringcb);
> +        return NULL;
> +    }
> +
> +    return &luringcb->common;
> +}
> +
> +void luring_detach_aio_context(LuringState *s, AioContext *old_context)
> +{
> +    aio_set_event_notifier(old_context, &s->e, false, NULL, NULL);
> +    qemu_bh_delete(s->completion_bh);
> +    s->aio_context = NULL;
> +}
> +
> +void luring_attach_aio_context(LuringState *s, AioContext *new_context)
> +{
> +    s->aio_context = new_context;
> +    s->completion_bh = aio_bh_new(new_context, qemu_luring_completion_bh, s);
> +    aio_set_event_notifier(new_context, &s->e, false,
> +                           qemu_luring_completion_cb,
> +                           qemu_luring_poll_cb);
> +}
> +
> +LuringState *luring_init(Error **errp)
> +{
> +    int rc;
> +    LuringState *s;
> +    s = g_malloc0(sizeof(*s));
> +    rc = event_notifier_init(&s->e, false);
> +    if (rc < 0) {
> +        error_setg_errno(errp, -rc, "failed to to initialize event notifier");
> +        goto out_free_state;
> +    }
> +
> +    struct io_uring *ring = &s->ring;
> +    rc =  io_uring_queue_init(MAX_EVENTS, ring, 0);
> +    if (rc < 0) {
> +        error_setg_errno(errp, -rc, "failed to create linux io_uring queue");

liburing documents the error return as -1.  The error can be found in
errno:

s/-rc/errno/

> +        goto out_close_efd;
> +    }
> +
> +    ioq_init(&s->io_q);
> +    aio_set_fd_handler(s->aio_context, ring->ring_fd, false,
> +                       (IOHandler *)qemu_luring_completion_cb, NULL, NULL, &s);

Please do this in attach/detach_aio_context().

> +    return s;
> +
> +out_close_efd:
> +    event_notifier_cleanup(&s->e);
> +out_free_state:
> +    g_free(s);
> +    return NULL;
> +}
> +
> +void luring_cleanup(LuringState *s)
> +{
> +    event_notifier_cleanup(&s->e);
> +    io_uring_queue_exit(&s->ring);
> +    g_free(s);
> +}
> --
> 2.17.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 8/9] block/file-posix: extends to use with io_uring
  2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 8/9] block/file-posix: extends to use with io_uring Aarushi Mehta
@ 2019-05-22 15:45   ` Stefan Hajnoczi
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-05-22 15:45 UTC (permalink / raw)
  To: Aarushi Mehta
  Cc: saket.sinha89, Stefan Hajnoczi, Julia Suvorova, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 647 bytes --]

On Wed, May 22, 2019 at 05:22:14AM +0530, Aarushi Mehta wrote:
> Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
> ---
>  block/file-posix.c | 63 ++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 58 insertions(+), 5 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 1cf4ee49eb..41952217a4 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -154,6 +154,7 @@ typedef struct BDRVRawState {
>      bool has_write_zeroes:1;
>      bool discard_zeroes:1;
>      bool use_linux_aio:1;
> +    bool use_linux_io_uring;

:1 for consistency with the surrounding bool fields.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/9] Add support for io_uring
  2019-05-21 23:52 [Qemu-devel] [RFC PATCH 0/9] Add support for io_uring Aarushi Mehta
                   ` (9 preceding siblings ...)
  2019-05-22  0:09 ` [Qemu-devel] [RFC PATCH 0/9] Add support for io_uring no-reply
@ 2019-05-22 15:46 ` Stefan Hajnoczi
  10 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2019-05-22 15:46 UTC (permalink / raw)
  To: Aarushi Mehta
  Cc: saket.sinha89, Stefan Hajnoczi, Julia Suvorova, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1448 bytes --]

On Wed, May 22, 2019 at 05:22:06AM +0530, Aarushi Mehta wrote:
> This patch series adds supports for the newly developed io_uring Linux AIO interface. Testing it requires a host kernel with it and the liburing library. Use the option -drive aio=io_uring to enable it.
> 
> Aarushi Mehta (9):
>   qapi/block-core: add option for io_uring
>   block/block: add BDRV flag for io_uring
>   include/block: declare interfaces for io_uring
>   stubs: add aio interface stubs for io_uring
>   util/asyn: add aio interfaces for io_uring
>   block/io_uring: implements interfaces for io_uring
>   blockdev: accept io_uring as option
>   block/file-posix: extends to use with io_uring
>   configure: permits use of io_uring with probe
> 
>  block/Makefile.objs     |   2 +
>  block/file-posix.c      |  63 ++++++-
>  block/io_uring.c        | 385 ++++++++++++++++++++++++++++++++++++++++
>  blockdev.c              |   4 +-
>  configure               |  27 +++
>  include/block/aio.h     |  16 +-
>  include/block/block.h   |   1 +
>  include/block/raw-aio.h |  15 ++
>  qapi/block-core.json    |   3 +-
>  stubs/Makefile.objs     |   1 +
>  stubs/io_uring.c        |  32 ++++
>  util/async.c            |  32 ++++
>  12 files changed, 573 insertions(+), 8 deletions(-)
>  create mode 100644 block/io_uring.c
>  create mode 100644 stubs/io_uring.c

Thanks, I've posted my review feedback!  Looking forward to the next
revision.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-05-22 15:56 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-21 23:52 [Qemu-devel] [RFC PATCH 0/9] Add support for io_uring Aarushi Mehta
2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 1/9] qapi/block-core: add option " Aarushi Mehta
2019-05-22  0:39   ` Eric Blake
2019-05-22  0:51     ` Aarushi Mehta
2019-05-22  1:01       ` Eric Blake
2019-05-22  9:16       ` [Qemu-devel] [Qemu-block] " Kevin Wolf
2019-05-22 13:07       ` [Qemu-devel] " Stefan Hajnoczi
2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 2/9] block/block: add BDRV flag " Aarushi Mehta
2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 3/9] include/block: declare interfaces " Aarushi Mehta
2019-05-22 13:09   ` Stefan Hajnoczi
2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 4/9] stubs: add aio interface stubs " Aarushi Mehta
2019-05-22 13:10   ` Stefan Hajnoczi
2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 5/9] util/asyn: add aio interfaces " Aarushi Mehta
2019-05-22 13:16   ` Stefan Hajnoczi
2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 6/9] block/io_uring: implements " Aarushi Mehta
2019-05-22 15:43   ` Stefan Hajnoczi
2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 7/9] blockdev: accept io_uring as option Aarushi Mehta
2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 8/9] block/file-posix: extends to use with io_uring Aarushi Mehta
2019-05-22 15:45   ` Stefan Hajnoczi
2019-05-21 23:52 ` [Qemu-devel] [RFC PATCH 9/9] configure: permits use of io_uring with probe Aarushi Mehta
2019-05-22  0:09 ` [Qemu-devel] [RFC PATCH 0/9] Add support for io_uring no-reply
2019-05-22 15:46 ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.