* [Qemu-devel] [PATCH 0/3] linux-aio: reduce completion latency
@ 2016-07-19 10:25 Roman Pen
  2016-07-19 10:25 ` [Qemu-devel] [PATCH 1/3] linux-aio: consume events in userspace instead of calling io_getevents Roman Pen
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Roman Pen @ 2016-07-19 10:25 UTC
  Cc: Roman Pen, Stefan Hajnoczi, Paolo Bonzini, qemu-devel

This series is intended to reduce completion latency through two changes:

1. QEMU does not use any timeout value for harvesting completed AIO
   requests from the ring buffer, thus io_getevents() can be implemented
   in userspace (first patch; see the sketch below).

2. In order to reduce completion latency it makes sense to harvest completed
   requests as soon as possible.  A very fast backend device can complete
   requests just after submission, so it is worth checking the ring buffer
   for completed requests directly after io_submit() has been called
   (third patch).
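
In essence, the userspace harvesting of the first patch boils down to the
following (a simplified sketch; process() is a placeholder, and the real
code in the first patch additionally takes care of memory ordering and
uses peek/commit helpers):

    struct aio_ring *ring = (struct aio_ring *)ctx;  /* io_context_t
                                                      * points at the ring */
    unsigned int head = ring->head, tail = ring->tail;

    while (head != tail) {
        process(&ring->io_events[head]);    /* complete one request */
        head = (head + 1) % ring->nr;       /* wrap around the ring */
    }
    ring->head = head;                      /* publish the new head */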

Indeed, the series reduces completion latencies and increases the
overall throughput; e.g. the following are the percentiles of the number
of requests completed at once:

        1st 10th  20th  30th  40th  50th  60th  70th  80th  90th  99.99th
Before    2    4    42   112   128   128   128   128   128   128    128
 After    1    1     4    14    33    45    47    48    50    51    108

That means that before the third patch is applied, the ring buffer is
observed as full (128 requests consumed at once) in at least 60% of the calls.

After the third patch is applied, the distribution of the number of
completed requests is "smoother" and the queue (requests in flight) is
almost never full.

The fio read results are as follows (write results are almost the same
and are not shown here):

  Before
  ------
job: (groupid=0, jobs=8): err= 0: pid=2227: Tue Jul 19 11:29:50 2016
  Description  : [Emulation of Storage Server Access Pattern]
  read : io=54681MB, bw=1822.7MB/s, iops=179779, runt= 30001msec
    slat (usec): min=172, max=16883, avg=338.35, stdev=109.66
    clat (usec): min=1, max=21977, avg=1051.45, stdev=299.29
     lat (usec): min=317, max=22521, avg=1389.83, stdev=300.73
    clat percentiles (usec):
     |  1.00th=[  346],  5.00th=[  596], 10.00th=[  708], 20.00th=[  852],
     | 30.00th=[  932], 40.00th=[  996], 50.00th=[ 1048], 60.00th=[ 1112],
     | 70.00th=[ 1176], 80.00th=[ 1256], 90.00th=[ 1384], 95.00th=[ 1496],
     | 99.00th=[ 1800], 99.50th=[ 1928], 99.90th=[ 2320], 99.95th=[ 2672],
     | 99.99th=[ 4704]
    bw (KB  /s): min=205229, max=553181, per=12.50%, avg=233278.26, stdev=18383.51

  After
  -----
job: (groupid=0, jobs=8): err= 0: pid=2220: Tue Jul 19 11:31:51 2016
  Description  : [Emulation of Storage Server Access Pattern]
  read : io=57637MB, bw=1921.2MB/s, iops=189529, runt= 30002msec
    slat (usec): min=169, max=20636, avg=329.61, stdev=124.18
    clat (usec): min=2, max=19592, avg=988.78, stdev=251.04
     lat (usec): min=381, max=21067, avg=1318.42, stdev=243.58
    clat percentiles (usec):
     |  1.00th=[  310],  5.00th=[  580], 10.00th=[  748], 20.00th=[  876],
     | 30.00th=[  908], 40.00th=[  948], 50.00th=[ 1012], 60.00th=[ 1064],
     | 70.00th=[ 1080], 80.00th=[ 1128], 90.00th=[ 1224], 95.00th=[ 1288],
     | 99.00th=[ 1496], 99.50th=[ 1608], 99.90th=[ 1960], 99.95th=[ 2256],
     | 99.99th=[ 5408]
    bw (KB  /s): min=212149, max=390160, per=12.49%, avg=245746.04, stdev=11606.75

Throughput increased from 1822MB/s to 1921MB/s, and the average completion
latency decreased from 1051us to 988us.

Roman Pen (3):
  linux-aio: consume events in userspace instead of calling io_getevents
  linux-aio: split processing events function
  linux-aio: process completions from ioq_submit()

 block/linux-aio.c | 175 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 137 insertions(+), 38 deletions(-)

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org
-- 
2.8.2

* [Qemu-devel] [PATCH 1/3] linux-aio: consume events in userspace instead of calling io_getevents
  2016-07-19 10:25 [Qemu-devel] [PATCH 0/3] linux-aio: reduce completion latency Roman Pen
@ 2016-07-19 10:25 ` Roman Pen
  2016-07-19 10:25 ` [Qemu-devel] [PATCH 2/3] linux-aio: split processing events function Roman Pen
  2016-07-19 10:25 ` [Qemu-devel] [PATCH 3/3] linux-aio: process completions from ioq_submit() Roman Pen
  2 siblings, 0 replies; 10+ messages in thread
From: Roman Pen @ 2016-07-19 10:25 UTC
  Cc: Roman Pen, Stefan Hajnoczi, Paolo Bonzini, qemu-devel

An AIO context is exposed to userspace as a simple ring buffer that can
be consumed directly without entering the kernel, which obviously brings
some performance gain.  QEMU does not use a timeout value when waiting
for event completions, so we can consume all events from userspace.
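
The new helpers are meant to be used in a peek/process/commit pattern,
roughly as follows (an illustrative sketch, not the exact loop from the
patch):

    struct io_event *events;
    unsigned int i, nr;

    nr = io_getevents_peek(ctx, &events);   /* read head/tail, no syscall */
    for (i = 0; i < nr; i++) {
        /* ... handle events[i] ... */
    }
    io_getevents_commit(ctx, nr);           /* advance head by nr */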

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org
---
 block/linux-aio.c | 125 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 99 insertions(+), 26 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 78f4524..3e29f1d 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -57,7 +57,6 @@ struct LinuxAioState {
 
     /* I/O completion processing */
     QEMUBH *completion_bh;
-    struct io_event events[MAX_EVENTS];
     int event_idx;
     int event_max;
 };
@@ -100,6 +99,85 @@ static void qemu_laio_process_completion(struct qemu_laiocb *laiocb)
     }
 }
 
+/**
+ * The aio_ring buffer, which is shared between userspace and the kernel.
+ *
+ * This is copied from linux/fs/aio.c; a common header does not exist,
+ * but AIO has existed for ages, so we assume the ABI is stable.
+ */
+struct aio_ring {
+    unsigned    id;    /* kernel internal index number */
+    unsigned    nr;    /* number of io_events */
+    unsigned    head;  /* Written to by userland or by kernel. */
+    unsigned    tail;
+
+    unsigned    magic;
+    unsigned    compat_features;
+    unsigned    incompat_features;
+    unsigned    header_length;  /* size of aio_ring */
+
+    struct io_event io_events[0];
+};
+
+/**
+ * io_getevents_peek:
+ * @ctx: AIO context
+ * @events: pointer to the events array (output value)
+ *
+ * Returns the number of completed events and sets a pointer
+ * to the events array.  This function does not update the internal
+ * ring buffer; it only reads head and tail.  When @events has been
+ * processed, io_getevents_commit() must be called.
+ */
+static inline unsigned int io_getevents_peek(io_context_t ctx,
+                                             struct io_event **events)
+{
+    struct aio_ring *ring = (struct aio_ring *)ctx;
+    unsigned int head = ring->head, tail = ring->tail;
+    unsigned int nr;
+
+    nr = tail >= head ? tail - head : ring->nr - head;
+    *events = ring->io_events + head;
+    /* To avoid speculative loads of s->events[i] before observing tail.
+       Paired with smp_wmb() inside linux/fs/aio.c: aio_complete(). */
+    smp_rmb();
+
+    return nr;
+}
+
+/**
+ * io_getevents_commit:
+ * @ctx: AIO context
+ * @nr: the number of events on which head should be advanced
+ *
+ * Advances head of a ring buffer.
+ */
+static inline void io_getevents_commit(io_context_t ctx, unsigned int nr)
+{
+    struct aio_ring *ring = (struct aio_ring *)ctx;
+
+    if (nr) {
+        ring->head = (ring->head + nr) % ring->nr;
+    }
+}
+
+/**
+ * io_getevents_advance_and_peek:
+ * @ctx: AIO context
+ * @events: pointer to the events array (output value)
+ * @nr: the number of events on which head should be advanced
+ *
+ * Advances head of a ring buffer and returns number of elements left.
+ */
+static inline unsigned int
+io_getevents_advance_and_peek(io_context_t ctx,
+                              struct io_event **events,
+                              unsigned int nr)
+{
+    io_getevents_commit(ctx, nr);
+    return io_getevents_peek(ctx, events);
+}
+
 /* The completion BH fetches completed I/O requests and invokes their
  * callbacks.
  *
@@ -114,43 +192,38 @@ static void qemu_laio_process_completion(struct qemu_laiocb *laiocb)
 static void qemu_laio_completion_bh(void *opaque)
 {
     LinuxAioState *s = opaque;
-
-    /* Fetch more completion events when empty */
-    if (s->event_idx == s->event_max) {
-        do {
-            struct timespec ts = { 0 };
-            s->event_max = io_getevents(s->ctx, MAX_EVENTS, MAX_EVENTS,
-                                        s->events, &ts);
-        } while (s->event_max == -EINTR);
-
-        s->event_idx = 0;
-        if (s->event_max <= 0) {
-            s->event_max = 0;
-            return; /* no more events */
-        }
-        s->io_q.in_flight -= s->event_max;
-    }
+    struct io_event *events;
 
     /* Reschedule so nested event loops see currently pending completions */
     qemu_bh_schedule(s->completion_bh);
 
-    /* Process completion events */
-    while (s->event_idx < s->event_max) {
-        struct iocb *iocb = s->events[s->event_idx].obj;
-        struct qemu_laiocb *laiocb =
+    while ((s->event_max = io_getevents_advance_and_peek(s->ctx, &events,
+                                                         s->event_idx))) {
+        for (s->event_idx = 0; s->event_idx < s->event_max; ) {
+            struct iocb *iocb = events[s->event_idx].obj;
+            struct qemu_laiocb *laiocb =
                 container_of(iocb, struct qemu_laiocb, iocb);
 
-        laiocb->ret = io_event_ret(&s->events[s->event_idx]);
-        s->event_idx++;
+            laiocb->ret = io_event_ret(&events[s->event_idx]);
 
-        qemu_laio_process_completion(laiocb);
+            /* Change counters one-by-one because we can be nested. */
+            s->io_q.in_flight--;
+            s->event_idx++;
+            qemu_laio_process_completion(laiocb);
+        }
     }
 
+    qemu_bh_cancel(s->completion_bh);
+
+    /* If we are nested we have to notify the level above that we are done
+     * by setting event_max to zero; the upper level will then jump out of its
+     * own `for` loop.  If we are the last level, all counters drop to zero. */
+    s->event_max = 0;
+    s->event_idx = 0;
+
     if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
         ioq_submit(s);
     }
-
-    qemu_bh_cancel(s->completion_bh);
 }
 
 static void qemu_laio_completion_cb(EventNotifier *e)
-- 
2.8.2

* [Qemu-devel] [PATCH 2/3] linux-aio: split processing events function
  2016-07-19 10:25 [Qemu-devel] [PATCH 0/3] linux-aio: reduce completion latency Roman Pen
  2016-07-19 10:25 ` [Qemu-devel] [PATCH 1/3] linux-aio: consume events in userspace instead of calling io_getevents Roman Pen
@ 2016-07-19 10:25 ` Roman Pen
  2016-07-19 10:25 ` [Qemu-devel] [PATCH 3/3] linux-aio: process completions from ioq_submit() Roman Pen
  2 siblings, 0 replies; 10+ messages in thread
From: Roman Pen @ 2016-07-19 10:25 UTC
  Cc: Roman Pen, Stefan Hajnoczi, Paolo Bonzini, qemu-devel

Prepare the events processing function to be called from ioq_submit()
by splitting it into two parts: the first harvests completed I/O
requests, the second submits pending requests.
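
After the split the call structure is the following (shown for
orientation; all names are from the patch below):

    qemu_laio_completion_cb()                /* eventfd handler */
    qemu_laio_completion_bh()                /* completion BH */
        -> qemu_laio_process_completions_and_submit()
               -> qemu_laio_process_completions()   /* harvest events */
               -> ioq_submit()                      /* submit pending */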

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org
---
 block/linux-aio.c | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 3e29f1d..fae5c82 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -178,20 +178,20 @@ io_getevents_advance_and_peek(io_context_t ctx,
     return io_getevents_peek(ctx, events);
 }
 
-/* The completion BH fetches completed I/O requests and invokes their
- * callbacks.
+/**
+ * qemu_laio_process_completions:
+ * @s: AIO state
+ *
+ * Fetches completed I/O requests and invokes their callbacks.
  *
  * The function is somewhat tricky because it supports nested event loops, for
  * example when a request callback invokes aio_poll().  In order to do this,
- * the completion events array and index are kept in LinuxAioState.  The BH
- * reschedules itself as long as there are completions pending so it will
- * either be called again in a nested event loop or will be called after all
- * events have been completed.  When there are no events left to complete, the
- * BH returns without rescheduling.
+ * the event indices are kept in LinuxAioState.  The function schedules the
+ * completion BH so it can be called again in a nested event loop.  When there
+ * are no events left to complete, the BH is canceled.
  */
-static void qemu_laio_completion_bh(void *opaque)
+static void qemu_laio_process_completions(LinuxAioState *s)
 {
-    LinuxAioState *s = opaque;
     struct io_event *events;
 
     /* Reschedule so nested event loops see currently pending completions */
@@ -220,18 +220,29 @@ static void qemu_laio_completion_bh(void *opaque)
      * own `for` loop.  If we are the last level, all counters drop to zero. */
     s->event_max = 0;
     s->event_idx = 0;
+}
 
+static void qemu_laio_process_completions_and_submit(LinuxAioState *s)
+{
+    qemu_laio_process_completions(s);
     if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
         ioq_submit(s);
     }
 }
 
+static void qemu_laio_completion_bh(void *opaque)
+{
+    LinuxAioState *s = opaque;
+
+    qemu_laio_process_completions_and_submit(s);
+}
+
 static void qemu_laio_completion_cb(EventNotifier *e)
 {
     LinuxAioState *s = container_of(e, LinuxAioState, e);
 
     if (event_notifier_test_and_clear(&s->e)) {
-        qemu_laio_completion_bh(s);
+        qemu_laio_process_completions_and_submit(s);
     }
 }
 
-- 
2.8.2

* [Qemu-devel] [PATCH 3/3] linux-aio: process completions from ioq_submit()
  2016-07-19 10:25 [Qemu-devel] [PATCH 0/3] linux-aio: reduce completion latency Roman Pen
  2016-07-19 10:25 ` [Qemu-devel] [PATCH 1/3] linux-aio: consume events in userspace instead of calling io_getevents Roman Pen
  2016-07-19 10:25 ` [Qemu-devel] [PATCH 2/3] linux-aio: split processing events function Roman Pen
@ 2016-07-19 10:25 ` Roman Pen
  2016-07-19 10:36   ` Paolo Bonzini
  2 siblings, 1 reply; 10+ messages in thread
From: Roman Pen @ 2016-07-19 10:25 UTC
  Cc: Roman Pen, Stefan Hajnoczi, Paolo Bonzini, qemu-devel

In order to reduce completion latency it makes sense to harvest completed
requests as soon as possible.  A very fast backend device can complete
requests just after submission, so it is worth checking the ring buffer
for completed requests directly after io_submit() has been called.

Indeed, this patch reduces completion latencies and increases the
overall throughput; e.g. the following are the percentiles of the number
of requests completed at once:

        1st 10th  20th  30th  40th  50th  60th  70th  80th  90th  99.99th
Before    2    4    42   112   128   128   128   128   128   128    128
 After    1    1     4    14    33    45    47    48    50    51    108

That means that before this patch is applied, the ring buffer is observed
as full (128 requests consumed at once) in at least 60% of the calls.

After the patch is applied, the distribution of the number of completed
requests is "smoother" and the queue (requests in flight) is almost never full.

The fio read results are as follows (write results are almost the same
and are not shown here):

  Before
  ------
job: (groupid=0, jobs=8): err= 0: pid=2227: Tue Jul 19 11:29:50 2016
  Description  : [Emulation of Storage Server Access Pattern]
  read : io=54681MB, bw=1822.7MB/s, iops=179779, runt= 30001msec
    slat (usec): min=172, max=16883, avg=338.35, stdev=109.66
    clat (usec): min=1, max=21977, avg=1051.45, stdev=299.29
     lat (usec): min=317, max=22521, avg=1389.83, stdev=300.73
    clat percentiles (usec):
     |  1.00th=[  346],  5.00th=[  596], 10.00th=[  708], 20.00th=[  852],
     | 30.00th=[  932], 40.00th=[  996], 50.00th=[ 1048], 60.00th=[ 1112],
     | 70.00th=[ 1176], 80.00th=[ 1256], 90.00th=[ 1384], 95.00th=[ 1496],
     | 99.00th=[ 1800], 99.50th=[ 1928], 99.90th=[ 2320], 99.95th=[ 2672],
     | 99.99th=[ 4704]
    bw (KB  /s): min=205229, max=553181, per=12.50%, avg=233278.26, stdev=18383.51

  After
  -----
job: (groupid=0, jobs=8): err= 0: pid=2220: Tue Jul 19 11:31:51 2016
  Description  : [Emulation of Storage Server Access Pattern]
  read : io=57637MB, bw=1921.2MB/s, iops=189529, runt= 30002msec
    slat (usec): min=169, max=20636, avg=329.61, stdev=124.18
    clat (usec): min=2, max=19592, avg=988.78, stdev=251.04
     lat (usec): min=381, max=21067, avg=1318.42, stdev=243.58
    clat percentiles (usec):
     |  1.00th=[  310],  5.00th=[  580], 10.00th=[  748], 20.00th=[  876],
     | 30.00th=[  908], 40.00th=[  948], 50.00th=[ 1012], 60.00th=[ 1064],
     | 70.00th=[ 1080], 80.00th=[ 1128], 90.00th=[ 1224], 95.00th=[ 1288],
     | 99.00th=[ 1496], 99.50th=[ 1608], 99.90th=[ 1960], 99.95th=[ 2256],
     | 99.99th=[ 5408]
    bw (KB  /s): min=212149, max=390160, per=12.49%, avg=245746.04, stdev=11606.75

Throughput increased from 1822MB/s to 1921MB/s, and the average completion
latency decreased from 1051us to 988us.

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org
---
 block/linux-aio.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index fae5c82..a02e692 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -34,6 +34,7 @@ struct qemu_laiocb {
     LinuxAioState *ctx;
     struct iocb iocb;
     ssize_t ret;
+    bool self_completed;
     size_t nbytes;
     QEMUIOVector *qiov;
     bool is_read;
@@ -92,7 +93,11 @@ static void qemu_laio_process_completion(struct qemu_laiocb *laiocb)
 
     laiocb->ret = ret;
     if (laiocb->co) {
-        qemu_coroutine_enter(laiocb->co, NULL);
+        if (laiocb->co == qemu_coroutine_self()) {
+            laiocb->self_completed = true;
+        } else {
+            qemu_coroutine_enter(laiocb->co, NULL);
+        }
     } else {
         laiocb->common.cb(laiocb->common.opaque, ret);
         qemu_aio_unref(laiocb);
@@ -312,6 +317,12 @@ static void ioq_submit(LinuxAioState *s)
         QSIMPLEQ_SPLIT_AFTER(&s->io_q.pending, aiocb, next, &completed);
     } while (ret == len && !QSIMPLEQ_EMPTY(&s->io_q.pending));
     s->io_q.blocked = (s->io_q.in_queue > 0);
+
+    if (s->io_q.in_flight) {
+        /* We can try to complete something right away if there are
+         * still requests in flight. */
+        qemu_laio_process_completions(s);
+    }
 }
 
 void laio_io_plug(BlockDriverState *bs, LinuxAioState *s)
@@ -379,7 +390,9 @@ int coroutine_fn laio_co_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
         return ret;
     }
 
-    qemu_coroutine_yield();
+    if (!laiocb.self_completed) {
+        qemu_coroutine_yield();
+    }
     return laiocb.ret;
 }
 
-- 
2.8.2

* Re: [Qemu-devel] [PATCH 3/3] linux-aio: process completions from ioq_submit()
  2016-07-19 10:25 ` [Qemu-devel] [PATCH 3/3] linux-aio: process completions from ioq_submit() Roman Pen
@ 2016-07-19 10:36   ` Paolo Bonzini
  2016-07-19 11:18     ` Roman Penyaev
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2016-07-19 10:36 UTC
  To: Roman Pen; +Cc: Stefan Hajnoczi, qemu-devel



On 19/07/2016 12:25, Roman Pen wrote:
>      if (laiocb->co) {
> -        qemu_coroutine_enter(laiocb->co, NULL);
> +        if (laiocb->co == qemu_coroutine_self()) {
> +            laiocb->self_completed = true;

No need for this new field.  You can just do nothing here and check
laiocb.ret == -EINPROGRESS in laio_co_submit.
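
Something like this (an untested sketch; it assumes laio_co_submit
initializes laiocb.ret to -EINPROGRESS before submitting):

    struct qemu_laiocb laiocb = {
        .co  = qemu_coroutine_self(),
        .ret = -EINPROGRESS,    /* sentinel: not completed yet */
        /* ... */
    };

    /* ... submit, then instead of the unconditional yield: ... */

    if (laiocb.ret == -EINPROGRESS) {
        qemu_coroutine_yield();
    }
    return laiocb.ret;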

> +        } else {
> +            qemu_coroutine_enter(laiocb->co, NULL);
> +        }
>      } else {
>          laiocb->common.cb(laiocb->common.opaque, ret);
>          qemu_aio_unref(laiocb);
> @@ -312,6 +317,12 @@ static void ioq_submit(LinuxAioState *s)
>          QSIMPLEQ_SPLIT_AFTER(&s->io_q.pending, aiocb, next, &completed);
>      } while (ret == len && !QSIMPLEQ_EMPTY(&s->io_q.pending));
>      s->io_q.blocked = (s->io_q.in_queue > 0);
> +
> +    if (s->io_q.in_flight) {
> +        /* We can try to complete something right away if there are
> +         * still requests in flight. */
> +        qemu_laio_process_completions(s);
> +    }

Can this leave I/O stuck if in_queue > 0 && in_flight == 0 after the
return from qemu_laio_process_completions?  I think you need to goto the
beginning of the function to submit more I/O requests in that case.
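
I.e. roughly the following (a rough, untested sketch; the `again' label
is hypothetical):

    static void ioq_submit(LinuxAioState *s)
    {
    again:
        /* ... the existing io_submit() loop ... */

        if (s->io_q.in_flight) {
            qemu_laio_process_completions(s);
            if (s->io_q.in_queue > 0 && !s->io_q.in_flight) {
                goto again;    /* completions freed slots, submit more */
            }
        }
    }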

In fact, perhaps it's useful to always do so if any I/Os were completed.
Should qemu_laio_process_completions return the number of completed
requests?

Paolo

>  }
>  
>  void laio_io_plug(BlockDriverState *bs, LinuxAioState *s)

* Re: [Qemu-devel] [PATCH 3/3] linux-aio: process completions from ioq_submit()
  2016-07-19 10:36   ` Paolo Bonzini
@ 2016-07-19 11:18     ` Roman Penyaev
  2016-07-19 11:30       ` Paolo Bonzini
  2016-07-19 11:44       ` Roman Penyaev
  0 siblings, 2 replies; 10+ messages in thread
From: Roman Penyaev @ 2016-07-19 11:18 UTC
  To: Paolo Bonzini; +Cc: Stefan Hajnoczi, qemu-devel

On Tue, Jul 19, 2016 at 12:36 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 19/07/2016 12:25, Roman Pen wrote:
>>      if (laiocb->co) {
>> -        qemu_coroutine_enter(laiocb->co, NULL);
>> +        if (laiocb->co == qemu_coroutine_self()) {
>> +            laiocb->self_completed = true;
>
> No need for this new field.  You can just do nothing here and check
> laiocb.ret == -EINPROGRESS in laio_co_submit.

I thought about that but did not like it, because we would depend on the
value which the kernel writes there.  If the kernel by chance writes
-EINPROGRESS (whatever that would mean; a bug in some low-level driver?)
we are screwed.  But probably that is my paranoia.

>
>> +        } else {
>> +            qemu_coroutine_enter(laiocb->co, NULL);
>> +        }
>>      } else {
>>          laiocb->common.cb(laiocb->common.opaque, ret);
>>          qemu_aio_unref(laiocb);
>> @@ -312,6 +317,12 @@ static void ioq_submit(LinuxAioState *s)
>>          QSIMPLEQ_SPLIT_AFTER(&s->io_q.pending, aiocb, next, &completed);
>>      } while (ret == len && !QSIMPLEQ_EMPTY(&s->io_q.pending));
>>      s->io_q.blocked = (s->io_q.in_queue > 0);
>> +
>> +    if (s->io_q.in_flight) {
>> +        /* We can try to complete something right away if there are
>> +         * still requests in flight. */
>> +        qemu_laio_process_completions(s);
>> +    }
>
> Can this leave I/O stuck if in_queue > 0 && in_flight == 0 after the
> return from qemu_laio_process_completions?  I think you need to goto the
> beginning of the function to submit more I/O requests in that case.

Indeed that can happen.  Will resend.

>
> In fact, perhaps it's useful to always do so if any I/Os were completed.
>  Should qemu_laio_process_completions return the number of completed
> requests?

I would always check the in_flight counter, since we are interested in what
is really left.  Also, the number of requests completed by _this_
qemu_laio_process_completions() call does not mean anything, since we can
be nested and a bunch of completions can happen below us.  We can rely on
true or false, but not on an exact value.

Also, I hope (I do not know how to reproduce this, virtio_blk does not nest)
that we are allowed to nest (calling aio_poll() and all this machinery) from
a coroutine.  Are we?  This can lead to deep nesting in the following sequence:
ioq_submit() -> complete() -> aio_poll() -> ioq_submit() -> complete() ...
but of course that can happen even without this patch.  I would say this
nesting is clumsy stuff.

Locally I have another series, which allows completions from ioq_submit() only
if the upper block device does not nest (like virtio_blk), but I decided to send
the current changes and not overcomplicate things.

--
Roman

* Re: [Qemu-devel] [PATCH 3/3] linux-aio: process completions from ioq_submit()
  2016-07-19 11:18     ` Roman Penyaev
@ 2016-07-19 11:30       ` Paolo Bonzini
  2016-07-19 11:44       ` Roman Penyaev
  1 sibling, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2016-07-19 11:30 UTC
  To: Roman Penyaev; +Cc: Stefan Hajnoczi, qemu-devel



On 19/07/2016 13:18, Roman Penyaev wrote:
>> > No need for this new field.  You can just do nothing here and check
>> > laiocb.ret == -EINPROGRESS in laio_co_submit.
> 
> I thought about that but did not like it, because we would depend on the
> value which the kernel writes there.

(The kernel actually writes to ev->res).

> If the kernel by chance writes -EINPROGRESS
> (whatever that would mean; a bug in some low-level driver?) we are screwed.
> But probably that is my paranoia.

Understood.  However, QEMU relies elsewhere on EINPROGRESS not being
returned for file I/O, so I think it's safe.

> Also, I hope (I do not know how to reproduce this, virtio_blk does not nest)
> that we are allowed to nest (calling aio_poll() and all this machinery) from
> a coroutine.

Hmm, good question.  The nesting scenario originally happened exactly
from a coroutine (commit_run), but I suspect that it cannot happen
anymore since we've introduced block_job_defer_to_main_loop and
bdrv_co_drain.  In any case your patch wouldn't change that.

Paolo

> Are we?  This can lead to deep nesting in the following sequence:
> ioq_submit() -> complete() -> aio_poll() -> ioq_submit() -> complete() ...
> but of course that can happen even without this patch.  I would say this
> nesting is clumsy stuff.


* Re: [Qemu-devel] [PATCH 3/3] linux-aio: process completions from ioq_submit()
  2016-07-19 11:18     ` Roman Penyaev
  2016-07-19 11:30       ` Paolo Bonzini
@ 2016-07-19 11:44       ` Roman Penyaev
  2016-07-19 11:47         ` Paolo Bonzini
  1 sibling, 1 reply; 10+ messages in thread
From: Roman Penyaev @ 2016-07-19 11:44 UTC
  To: Paolo Bonzini; +Cc: Stefan Hajnoczi, qemu-devel

On Tue, Jul 19, 2016 at 1:18 PM, Roman Penyaev
<roman.penyaev@profitbricks.com> wrote:
> On Tue, Jul 19, 2016 at 12:36 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>>
>> On 19/07/2016 12:25, Roman Pen wrote:
>>>      if (laiocb->co) {
>>> -        qemu_coroutine_enter(laiocb->co, NULL);
>>> +        if (laiocb->co == qemu_coroutine_self()) {
>>> +            laiocb->self_completed = true;
>>
>> No need for this new field.  You can just do nothing here and check
>> laiocb.ret == -EINPROGRESS in laio_co_submit.
>
> I thought about that but did not like it, because we would depend on the
> value which the kernel writes there.  If the kernel by chance writes
> -EINPROGRESS (whatever that would mean; a bug in some low-level driver?)
> we are screwed.  But probably that is my paranoia.
>
>>
>>> +        } else {
>>> +            qemu_coroutine_enter(laiocb->co, NULL);
>>> +        }
>>>      } else {
>>>          laiocb->common.cb(laiocb->common.opaque, ret);
>>>          qemu_aio_unref(laiocb);
>>> @@ -312,6 +317,12 @@ static void ioq_submit(LinuxAioState *s)
>>>          QSIMPLEQ_SPLIT_AFTER(&s->io_q.pending, aiocb, next, &completed);
>>>      } while (ret == len && !QSIMPLEQ_EMPTY(&s->io_q.pending));
>>>      s->io_q.blocked = (s->io_q.in_queue > 0);
>>> +
>>> +    if (s->io_q.in_flight) {
>>> +        /* We can try to complete something right away if there are
>>> +         * still requests in flight. */
>>> +        qemu_laio_process_completions(s);
>>> +    }
>>
>> Can this leave I/O stuck if in_queue > 0 && in_flight == 0 after the
>> return from qemu_laio_process_completions?  I think you need to goto the
>> beginning of the function to submit more I/O requests in that case.

Not quite.  I still leave '&s->e' signalled, so we will return to the generic
qemu_laio_completion_cb(), do 'event_notifier_test_and_clear(&s->e)'
and process events again with normal submission.

IMHO this is a better variant than spinning inside ioq_submit() doing a goto.
We do not monopolize the whole event processing for our I/O needs and give
other stuff a chance to complete.  But of course this is a gut feeling
and I do not have any serious numbers.
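
In other words, the flow is roughly the following (names as in this
series):

    ioq_submit()
        -> qemu_laio_process_completions()   /* opportunistic harvest */

    /* '&s->e' is still signalled, so the event loop will soon call: */

    qemu_laio_completion_cb()
        -> event_notifier_test_and_clear(&s->e)
        -> qemu_laio_process_completions_and_submit()   /* picks up the rest */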

--
Roman

>
> Indeed that can happen.  Will resend.
>
>>
>> In fact, perhaps it's useful to always do so if any I/Os were completed.
>>  Should qemu_laio_process_completions return the number of completed
>> requests?
>
> I would always check the in_flight counter, since we are interested in what
> is really left.  Also, the number of requests completed by _this_
> qemu_laio_process_completions() call does not mean anything, since we can
> be nested and a bunch of completions can happen below us.  We can rely on
> true or false, but not on an exact value.
>
> Also, I hope (I do not know how to reproduce this, virtio_blk does not nest)
> that we are allowed to nest (calling aio_poll() and all this machinery) from
> a coroutine.  Are we?  This can lead to deep nesting in the following sequence:
> ioq_submit() -> complete() -> aio_poll() -> ioq_submit() -> complete() ...
> but of course that can happen even without this patch.  I would say this
> nesting is clumsy stuff.
>
> Locally I have another series, which allows completions from ioq_submit() only
> if the upper block device does not nest (like virtio_blk), but I decided to send
> the current changes and not overcomplicate things.
>
> --
> Roman

* Re: [Qemu-devel] [PATCH 3/3] linux-aio: process completions from ioq_submit()
  2016-07-19 11:44       ` Roman Penyaev
@ 2016-07-19 11:47         ` Paolo Bonzini
  2016-07-19 11:52           ` Roman Penyaev
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2016-07-19 11:47 UTC
  To: Roman Penyaev; +Cc: Stefan Hajnoczi, qemu-devel



On 19/07/2016 13:44, Roman Penyaev wrote:
>>> >>
>>> >> Can this leave I/O stuck if in_queue > 0 && in_flight == 0 after the
>>> >> return from qemu_laio_process_completions?  I think you need to goto the
>>> >> beginning of the function to submit more I/O requests in that case.
> Not quite.  I still leave '&s->e' signalled, so we will return to the generic
> qemu_laio_completion_cb(), do 'event_notifier_test_and_clear(&s->e)'
> and process events again with normal submission.
> 
> IMHO this is a better variant than spinning inside ioq_submit() doing a goto.
> We do not monopolize the whole event processing for our I/O needs and give
> other stuff a chance to complete.  But of course this is a gut feeling
> and I do not have any serious numbers.

That's fine as long as it doesn't hang. :)

Paolo

* Re: [Qemu-devel] [PATCH 3/3] linux-aio: process completions from ioq_submit()
  2016-07-19 11:47         ` Paolo Bonzini
@ 2016-07-19 11:52           ` Roman Penyaev
  0 siblings, 0 replies; 10+ messages in thread
From: Roman Penyaev @ 2016-07-19 11:52 UTC
  To: Paolo Bonzini; +Cc: Stefan Hajnoczi, qemu-devel

On Tue, Jul 19, 2016 at 1:47 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 19/07/2016 13:44, Roman Penyaev wrote:
>>>> >>
>>>> >> Can this leave I/O stuck if in_queue > 0 && in_flight == 0 after the
>>>> >> return from qemu_laio_process_completions?  I think you need to goto the
>>>> >> beginning of the function to submit more I/O requests in that case.
>> Not quite.  I still leave '&s->e' signalled, so we will return to the generic
>> qemu_laio_completion_cb(), do 'event_notifier_test_and_clear(&s->e)'
>> and process events again with normal submission.
>>
>> IMHO this is a better variant than spinning inside ioq_submit() doing a goto.
>> We do not monopolize the whole event processing for our I/O needs and give
>> other stuff a chance to complete.  But of course this is a gut feeling
>> and I do not have any serious numbers.
>
> That's fine as long as it doesn't hang. :)

Yeah, but of course it would also be nice to have it as fast as possible.
I would say "not to hang, as fast as possible" :)

I will write a big comment there explaining why we leave everything as-is
and do not touch the eventfd.  That should be fine.

Will send shortly.

--
Roman
