* [Qemu-devel] [PATCH for-2.11 0/4] Fix qemu-iotests failures
@ 2017-11-28 15:43 Kevin Wolf
  2017-11-28 15:43 ` [Qemu-devel] [PATCH for-2.11 1/4] Revert "coroutine: abort if we try to schedule or enter a pending coroutine" Kevin Wolf
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Kevin Wolf @ 2017-11-28 15:43 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, stefanha, jcody, pbonzini, famz, qemu-devel

This reverts the recent fix for the case of qemu-iotests 200 (block job
completion races when using a separate I/O thread) that only fixed a
symptom and caused many regressions, and replaces it with a new
solution that hopefully attacks the right root cause.

The series also contains a bonus fix for a bug that was uncovered by the
different timing with the patches that are reverted here.

Kevin Wolf (4):
  Revert "coroutine: abort if we try to schedule or enter a pending
    coroutine"
  Revert "blockjob: do not allow coroutine double entry or
    entry-after-completion"
  coroutine: Cancel aio_co_schedule() on direct entry
  block: Expect graph changes in bdrv_parent_drained_begin/end

 include/block/blockjob_int.h |  3 +--
 include/qemu/coroutine_int.h | 14 ++++----------
 block/io.c                   |  8 ++++----
 blockjob.c                   |  7 ++-----
 util/async.c                 | 26 +++++++++++++-------------
 util/qemu-coroutine-sleep.c  | 12 ------------
 util/qemu-coroutine.c        | 31 +++++++++++++++++--------------
 7 files changed, 41 insertions(+), 60 deletions(-)

-- 
2.13.6


* [Qemu-devel] [PATCH for-2.11 1/4] Revert "coroutine: abort if we try to schedule or enter a pending coroutine"
  2017-11-28 15:43 [Qemu-devel] [PATCH for-2.11 0/4] Fix qemu-iotests failures Kevin Wolf
@ 2017-11-28 15:43 ` Kevin Wolf
  2017-11-28 16:00   ` Jeff Cody
  2017-11-28 16:18   ` Paolo Bonzini
  2017-11-28 15:43 ` [Qemu-devel] [PATCH for-2.11 2/4] Revert "blockjob: do not allow coroutine double entry or entry-after-completion" Kevin Wolf
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: Kevin Wolf @ 2017-11-28 15:43 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, stefanha, jcody, pbonzini, famz, qemu-devel

This reverts commit 6133b39f3c36623425a6ede9e89d93175fde15cd.

The commit checked conditions that would expose a bug, but there is no
real reason to forbid them apart from the bug, which we'll fix in a
minute.

In particular, reentering a coroutine during co_aio_sleep_ns() is fine;
the function is explicitly written to allow this.
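
As a rough illustration (a sketch only, not code from this series; more_work_to_do()
and do_some_work() are made-up helpers), this is the pattern co_aio_sleep_ns() is
written to support:

    /* The coroutine arms a timer via co_aio_sleep_ns() and yields; it can be
     * resumed either by the timer callback or by an external caller that
     * enters it early.  An early wakeup simply shortens the sleep and the
     * loop carries on. */
    static void coroutine_fn sleepy_co(void *opaque)
    {
        AioContext *ctx = opaque;

        while (more_work_to_do()) {
            do_some_work();
            co_aio_sleep_ns(ctx, QEMU_CLOCK_REALTIME, 100000000); /* 100 ms */
        }
    }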

aio_co_schedule() can indeed conflict with direct coroutine invocations,
but this is exactky what we want to fix, so remove that check again,
too.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/qemu/coroutine_int.h | 13 +++----------
 util/async.c                 | 13 -------------
 util/qemu-coroutine-sleep.c  | 12 ------------
 util/qemu-coroutine.c        | 14 --------------
 4 files changed, 3 insertions(+), 49 deletions(-)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index 59e8406398..cb98892bba 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -46,21 +46,14 @@ struct Coroutine {
 
     size_t locks_held;
 
-    /* Only used when the coroutine has yielded.  */
-    AioContext *ctx;
-
-    /* Used to catch and abort on illegal co-routine entry.
-     * Will contain the name of the function that had first
-     * scheduled the coroutine. */
-    const char *scheduled;
-
-    QSIMPLEQ_ENTRY(Coroutine) co_queue_next;
-
     /* Coroutines that should be woken up when we yield or terminate.
      * Only used when the coroutine is running.
      */
     QSIMPLEQ_HEAD(, Coroutine) co_queue_wakeup;
 
+    /* Only used when the coroutine has yielded.  */
+    AioContext *ctx;
+    QSIMPLEQ_ENTRY(Coroutine) co_queue_next;
     QSLIST_ENTRY(Coroutine) co_scheduled_next;
 };
 
diff --git a/util/async.c b/util/async.c
index 4dd9d95a9e..0e1bd8780a 100644
--- a/util/async.c
+++ b/util/async.c
@@ -388,9 +388,6 @@ static void co_schedule_bh_cb(void *opaque)
         QSLIST_REMOVE_HEAD(&straight, co_scheduled_next);
         trace_aio_co_schedule_bh_cb(ctx, co);
         aio_context_acquire(ctx);
-
-        /* Protected by write barrier in qemu_aio_coroutine_enter */
-        atomic_set(&co->scheduled, NULL);
         qemu_coroutine_enter(co);
         aio_context_release(ctx);
     }
@@ -441,16 +438,6 @@ fail:
 void aio_co_schedule(AioContext *ctx, Coroutine *co)
 {
     trace_aio_co_schedule(ctx, co);
-    const char *scheduled = atomic_cmpxchg(&co->scheduled, NULL,
-                                           __func__);
-
-    if (scheduled) {
-        fprintf(stderr,
-                "%s: Co-routine was already scheduled in '%s'\n",
-                __func__, scheduled);
-        abort();
-    }
-
     QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
                               co, co_scheduled_next);
     qemu_bh_schedule(ctx->co_schedule_bh);
diff --git a/util/qemu-coroutine-sleep.c b/util/qemu-coroutine-sleep.c
index 254349cdbb..9c5655041b 100644
--- a/util/qemu-coroutine-sleep.c
+++ b/util/qemu-coroutine-sleep.c
@@ -13,7 +13,6 @@
 
 #include "qemu/osdep.h"
 #include "qemu/coroutine.h"
-#include "qemu/coroutine_int.h"
 #include "qemu/timer.h"
 #include "block/aio.h"
 
@@ -26,8 +25,6 @@ static void co_sleep_cb(void *opaque)
 {
     CoSleepCB *sleep_cb = opaque;
 
-    /* Write of schedule protected by barrier write in aio_co_schedule */
-    atomic_set(&sleep_cb->co->scheduled, NULL);
     aio_co_wake(sleep_cb->co);
 }
 
@@ -37,15 +34,6 @@ void coroutine_fn co_aio_sleep_ns(AioContext *ctx, QEMUClockType type,
     CoSleepCB sleep_cb = {
         .co = qemu_coroutine_self(),
     };
-
-    const char *scheduled = atomic_cmpxchg(&sleep_cb.co->scheduled, NULL,
-                                           __func__);
-    if (scheduled) {
-        fprintf(stderr,
-                "%s: Co-routine was already scheduled in '%s'\n",
-                __func__, scheduled);
-        abort();
-    }
     sleep_cb.ts = aio_timer_new(ctx, type, SCALE_NS, co_sleep_cb, &sleep_cb);
     timer_mod(sleep_cb.ts, qemu_clock_get_ns(type) + ns);
     qemu_coroutine_yield();
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index 9eff7fd450..d6095c1d5a 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -107,22 +107,8 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co)
     Coroutine *self = qemu_coroutine_self();
     CoroutineAction ret;
 
-    /* Cannot rely on the read barrier for co in aio_co_wake(), as there are
-     * callers outside of aio_co_wake() */
-    const char *scheduled = atomic_mb_read(&co->scheduled);
-
     trace_qemu_aio_coroutine_enter(ctx, self, co, co->entry_arg);
 
-    /* if the Coroutine has already been scheduled, entering it again will
-     * cause us to enter it twice, potentially even after the coroutine has
-     * been deleted */
-    if (scheduled) {
-        fprintf(stderr,
-                "%s: Co-routine was already scheduled in '%s'\n",
-                __func__, scheduled);
-        abort();
-    }
-
     if (co->caller) {
         fprintf(stderr, "Co-routine re-entered recursively\n");
         abort();
-- 
2.13.6


* [Qemu-devel] [PATCH for-2.11 2/4] Revert "blockjob: do not allow coroutine double entry or entry-after-completion"
  2017-11-28 15:43 [Qemu-devel] [PATCH for-2.11 0/4] Fix qemu-iotests failures Kevin Wolf
  2017-11-28 15:43 ` [Qemu-devel] [PATCH for-2.11 1/4] Revert "coroutine: abort if we try to schedule or enter a pending coroutine" Kevin Wolf
@ 2017-11-28 15:43 ` Kevin Wolf
  2017-11-28 16:00   ` Jeff Cody
  2017-11-28 15:43 ` [Qemu-devel] [PATCH for-2.11 3/4] coroutine: Cancel aio_co_schedule() on direct entry Kevin Wolf
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Kevin Wolf @ 2017-11-28 15:43 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, stefanha, jcody, pbonzini, famz, qemu-devel

This reverts commit 4afeffc8572f40d8844b946a30c00b10da4442b1.

This fixed the symptom of a bug rather than the root cause. Waking up a
sleeping coroutine is generally fine, we just need to make it work
correctly across AioContexts, which we'll do in a minute.
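
For context, a minimal sketch of the kind of job main loop that relies on these
semantics (illustrative only; do_one_chunk() is a made-up stand-in for the job's
real work):

    static void coroutine_fn example_job_run(void *opaque)
    {
        BlockJob *job = opaque;

        while (!block_job_is_cancelled(job)) {
            do_one_chunk(job);
            /* Sleep between iterations; with this revert, cancelling the job
             * may wake the coroutine before the delay has fully elapsed. */
            block_job_sleep_ns(job, QEMU_CLOCK_REALTIME, 100000);
        }
    }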

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/blockjob_int.h | 3 +--
 blockjob.c                   | 7 ++-----
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index 43f3be2965..f13ad05c0d 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -143,8 +143,7 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
  * @ns: How many nanoseconds to stop for.
  *
  * Put the job to sleep (assuming that it wasn't canceled) for @ns
- * nanoseconds.  Canceling the job will not interrupt the wait, so the
- * cancel will not process until the coroutine wakes up.
+ * nanoseconds.  Canceling the job will interrupt the wait immediately.
  */
 void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);
 
diff --git a/blockjob.c b/blockjob.c
index ff9a614531..3a0c49137e 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -797,14 +797,11 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
         return;
     }
 
-    /* We need to leave job->busy set here, because when we have
-     * put a coroutine to 'sleep', we have scheduled it to run in
-     * the future.  We cannot enter that same coroutine again before
-     * it wakes and runs, otherwise we risk double-entry or entry after
-     * completion. */
+    job->busy = false;
     if (!block_job_should_pause(job)) {
         co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
     }
+    job->busy = true;
 
     block_job_pause_point(job);
 }
-- 
2.13.6


* [Qemu-devel] [PATCH for-2.11 3/4] coroutine: Cancel aio_co_schedule() on direct entry
  2017-11-28 15:43 [Qemu-devel] [PATCH for-2.11 0/4] Fix qemu-iotests failures Kevin Wolf
  2017-11-28 15:43 ` [Qemu-devel] [PATCH for-2.11 1/4] Revert "coroutine: abort if we try to schedule or enter a pending coroutine" Kevin Wolf
  2017-11-28 15:43 ` [Qemu-devel] [PATCH for-2.11 2/4] Revert "blockjob: do not allow coroutine double entry or entry-after-completion" Kevin Wolf
@ 2017-11-28 15:43 ` Kevin Wolf
  2017-11-28 16:09   ` Jeff Cody
                     ` (3 more replies)
  2017-11-28 15:43 ` [Qemu-devel] [PATCH for-2.11 4/4] block: Expect graph changes in bdrv_parent_drained_begin/end Kevin Wolf
                   ` (2 subsequent siblings)
  5 siblings, 4 replies; 27+ messages in thread
From: Kevin Wolf @ 2017-11-28 15:43 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, stefanha, jcody, pbonzini, famz, qemu-devel

If a coroutine can be reentered from multiple possible sources, we need
to be careful in the case that two sources try to reenter it at the same
time.

There are two different cases where this can happen:

1. A coroutine spawns multiple asynchronous jobs and waits for all of
   them to complete. In this case, multiple reentries are expected and
   the coroutine is probably looping around a yield, so entering it
   twice is generally fine (but entering it just once after all jobs
   have completed would be enough, too).

   Exception: When the loop condition becomes false and the first
   reenter already leaves the loop, we must not do a second reenter.

2. A coroutine yields and provides multiple alternative ways to be
   reentered. In this case, we must always only process the first
   reenter.
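
A rough sketch of case 1 (illustrative only; NUM_REQUESTS, 'pending' and
start_async_request() are made up):

    /* The coroutine kicks off several asynchronous requests whose completion
     * callbacks decrement 'pending' and reenter it.  Because it loops around
     * the yield, an extra reentry while pending > 0 is harmless; only a
     * reentry after the loop has been left would be a problem. */
    static void coroutine_fn multiwait_co(void *opaque)
    {
        int pending = NUM_REQUESTS;
        int i;

        for (i = 0; i < NUM_REQUESTS; i++) {
            start_async_request(i, &pending);  /* completion wakes us up */
        }
        while (pending > 0) {
            qemu_coroutine_yield();
        }
    }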

Direct entry is not a problem. It requires that the AioContext locks for
the coroutine context are held, which means that the coroutine has
enough time to set some state that simply prevents the second caller
from reentering the coroutine, too.

Things get more interesting with aio_co_schedule() because the coroutine
may be scheduled before it is (directly) entered from a second place.
Then changing some state inside the coroutine doesn't help because it is
already scheduled and the state won't be checked again.

For this case, we'll cancel any pending aio_co_schedule() when the
coroutine is actually entered. Reentering it once is enough for all
cases explained above, and a requirement for part of them.
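
A hedged sketch of how a direct caller looks under this scheme (wake_job_now()
is a hypothetical helper, not part of this series):

    /* Enter the coroutine directly while holding the AioContext lock.  If a
     * timer or another thread has already scheduled it with
     * aio_co_schedule(), that pending entry is cancelled instead of running
     * the coroutine a second time. */
    static void wake_job_now(AioContext *ctx, Coroutine *co)
    {
        aio_context_acquire(ctx);
        qemu_aio_coroutine_enter(ctx, co);
        aio_context_release(ctx);
    }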

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/qemu/coroutine_int.h |  1 +
 util/async.c                 | 15 ++++++++++++++-
 util/qemu-coroutine.c        | 17 +++++++++++++++++
 3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index cb98892bba..73fc35e54b 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -40,6 +40,7 @@ struct Coroutine {
     CoroutineEntry *entry;
     void *entry_arg;
     Coroutine *caller;
+    AioContext *scheduled;
 
     /* Only used when the coroutine has terminated.  */
     QSLIST_ENTRY(Coroutine) pool_next;
diff --git a/util/async.c b/util/async.c
index 0e1bd8780a..dc249e9404 100644
--- a/util/async.c
+++ b/util/async.c
@@ -372,6 +372,7 @@ static bool event_notifier_poll(void *opaque)
 static void co_schedule_bh_cb(void *opaque)
 {
     AioContext *ctx = opaque;
+    AioContext *scheduled_ctx;
     QSLIST_HEAD(, Coroutine) straight, reversed;
 
     QSLIST_MOVE_ATOMIC(&reversed, &ctx->scheduled_coroutines);
@@ -388,7 +389,16 @@ static void co_schedule_bh_cb(void *opaque)
         QSLIST_REMOVE_HEAD(&straight, co_scheduled_next);
         trace_aio_co_schedule_bh_cb(ctx, co);
         aio_context_acquire(ctx);
-        qemu_coroutine_enter(co);
+        scheduled_ctx = atomic_mb_read(&co->scheduled);
+        if (scheduled_ctx == ctx) {
+            qemu_coroutine_enter(co);
+        } else {
+            /* This must be a cancelled aio_co_schedule() because the coroutine
+             * was already entered before this BH had a chance to run. If a
+             * coroutine is scheduled for two different AioContexts, something
+             * is very wrong. */
+            assert(scheduled_ctx == NULL);
+        }
         aio_context_release(ctx);
     }
 }
@@ -438,6 +448,9 @@ fail:
 void aio_co_schedule(AioContext *ctx, Coroutine *co)
 {
     trace_aio_co_schedule(ctx, co);
+
+    /* Set co->scheduled before the coroutine becomes visible in the list */
+    atomic_mb_set(&co->scheduled, ctx);
     QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
                               co, co_scheduled_next);
     qemu_bh_schedule(ctx->co_schedule_bh);
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index d6095c1d5a..c515b3cb4a 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -122,6 +122,23 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co)
      */
     smp_wmb();
 
+    /* Make sure that a coroutine that can alternatively reentered from two
+     * different sources isn't reentered more than once when the first caller
+     * uses aio_co_schedule() and the other one enters to coroutine directly.
+     * This is achieved by cancelling the pending aio_co_schedule().
+     *
+     * The other way round, if aio_co_schedule() would be called after this
+     * point, this would be a problem, too, but in practice it doesn't happen
+     * because we're holding the AioContext lock here and aio_co_schedule()
+     * callers must do the same. This means that the coroutine just needs to
+     * prevent other callers from calling aio_co_schedule() before it yields
+     * (e.g. block job coroutines by setting job->busy = true).
+     *
+     * We still want to ensure that the second case doesn't happen, so reset
+     * co->scheduled only after setting co->caller to make the above check
+     * effective for the co_schedule_bh_cb() case. */
+    atomic_set(&co->scheduled, NULL);
+
     ret = qemu_coroutine_switch(self, co, COROUTINE_ENTER);
 
     qemu_co_queue_run_restart(co);
-- 
2.13.6


* [Qemu-devel] [PATCH for-2.11 4/4] block: Expect graph changes in bdrv_parent_drained_begin/end
  2017-11-28 15:43 [Qemu-devel] [PATCH for-2.11 0/4] Fix qemu-iotests failures Kevin Wolf
                   ` (2 preceding siblings ...)
  2017-11-28 15:43 ` [Qemu-devel] [PATCH for-2.11 3/4] coroutine: Cancel aio_co_schedule() on direct entry Kevin Wolf
@ 2017-11-28 15:43 ` Kevin Wolf
  2017-11-28 16:10   ` Jeff Cody
  2017-11-28 15:56 ` [Qemu-devel] [PATCH for-2.11 0/4] Fix qemu-iotests failures Jeff Cody
  2017-11-28 16:00 ` Stefan Hajnoczi
  5 siblings, 1 reply; 27+ messages in thread
From: Kevin Wolf @ 2017-11-28 15:43 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, stefanha, jcody, pbonzini, famz, qemu-devel

The .drained_begin/end callbacks can (directly or indirectly via
aio_poll()) cause block nodes to be removed or the current BdrvChild to
point to a different child node.

Use QLIST_FOREACH_SAFE() to make sure we don't access invalid
BlockDriverStates or accidentally continue iterating the parents of the
new child node instead of the node we actually came from.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/io.c b/block/io.c
index 4fdf93a014..6773926fc1 100644
--- a/block/io.c
+++ b/block/io.c
@@ -42,9 +42,9 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
 
 void bdrv_parent_drained_begin(BlockDriverState *bs)
 {
-    BdrvChild *c;
+    BdrvChild *c, *next;
 
-    QLIST_FOREACH(c, &bs->parents, next_parent) {
+    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
         if (c->role->drained_begin) {
             c->role->drained_begin(c);
         }
@@ -53,9 +53,9 @@ void bdrv_parent_drained_begin(BlockDriverState *bs)
 
 void bdrv_parent_drained_end(BlockDriverState *bs)
 {
-    BdrvChild *c;
+    BdrvChild *c, *next;
 
-    QLIST_FOREACH(c, &bs->parents, next_parent) {
+    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
         if (c->role->drained_end) {
             c->role->drained_end(c);
         }
-- 
2.13.6


* Re: [Qemu-devel] [PATCH for-2.11 0/4] Fix qemu-iotests failures
  2017-11-28 15:43 [Qemu-devel] [PATCH for-2.11 0/4] Fix qemu-iotests failures Kevin Wolf
                   ` (3 preceding siblings ...)
  2017-11-28 15:43 ` [Qemu-devel] [PATCH for-2.11 4/4] block: Expect graph changes in bdrv_parent_drained_begin/end Kevin Wolf
@ 2017-11-28 15:56 ` Jeff Cody
  2017-11-28 16:00 ` Stefan Hajnoczi
  5 siblings, 0 replies; 27+ messages in thread
From: Jeff Cody @ 2017-11-28 15:56 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, stefanha, pbonzini, famz, qemu-devel

On Tue, Nov 28, 2017 at 04:43:46PM +0100, Kevin Wolf wrote:
> This reverts the recent fix for the case of qemu-iotests 200 (block job
> completion races when using a separate I/O thread) that only fixed a
> symptom and caused many regressions, and replaces it with a new
> solution that hopefully attacks the right root cause.
> 
> The series also contains a bonus fix for a bug that was uncovered by the
> different timing with the patches that are reverted here.
> 
> Kevin Wolf (4):
>   Revert "coroutine: abort if we try to schedule or enter a pending
>     coroutine"
>   Revert "blockjob: do not allow coroutine double entry or
>     entry-after-completion"
>   coroutine: Cancel aio_co_schedule() on direct entry
>   block: Expect graph changes in bdrv_parent_drained_begin/end
> 
>  include/block/blockjob_int.h |  3 +--
>  include/qemu/coroutine_int.h | 14 ++++----------
>  block/io.c                   |  8 ++++----
>  blockjob.c                   |  7 ++-----
>  util/async.c                 | 26 +++++++++++++-------------
>  util/qemu-coroutine-sleep.c  | 12 ------------
>  util/qemu-coroutine.c        | 31 +++++++++++++++++--------------
>  7 files changed, 41 insertions(+), 60 deletions(-)
> 
> -- 
> 2.13.6
>

Tested-by: Jeff Cody <jcody@redhat.com>


* Re: [Qemu-devel] [PATCH for-2.11 1/4] Revert "coroutine: abort if we try to schedule or enter a pending coroutine"
  2017-11-28 15:43 ` [Qemu-devel] [PATCH for-2.11 1/4] Revert "coroutine: abort if we try to schedule or enter a pending coroutine" Kevin Wolf
@ 2017-11-28 16:00   ` Jeff Cody
  2017-11-28 16:18   ` Paolo Bonzini
  1 sibling, 0 replies; 27+ messages in thread
From: Jeff Cody @ 2017-11-28 16:00 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, stefanha, pbonzini, famz, qemu-devel

On Tue, Nov 28, 2017 at 04:43:47PM +0100, Kevin Wolf wrote:
> This reverts commit 6133b39f3c36623425a6ede9e89d93175fde15cd.
> 
> The commit checked conditions that would expose a bug, but there is no
> real reason to forbid them apart from the bug, which we'll fix in a
> minute.
> 
> In particular, reentering a coroutine during co_aio_sleep_ns() is fine;
> the function is explicitly written to allow this.
> 
> aio_co_schedule() can indeed conflict with direct coroutine invocations,
> but this is exactky what we want to fix, so remove that check again,

s/exactky/exactly/


Reviewed-by: Jeff Cody <jcody@redhat.com>

> too.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/qemu/coroutine_int.h | 13 +++----------
>  util/async.c                 | 13 -------------
>  util/qemu-coroutine-sleep.c  | 12 ------------
>  util/qemu-coroutine.c        | 14 --------------
>  4 files changed, 3 insertions(+), 49 deletions(-)
> 
> diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
> index 59e8406398..cb98892bba 100644
> --- a/include/qemu/coroutine_int.h
> +++ b/include/qemu/coroutine_int.h
> @@ -46,21 +46,14 @@ struct Coroutine {
>  
>      size_t locks_held;
>  
> -    /* Only used when the coroutine has yielded.  */
> -    AioContext *ctx;
> -
> -    /* Used to catch and abort on illegal co-routine entry.
> -     * Will contain the name of the function that had first
> -     * scheduled the coroutine. */
> -    const char *scheduled;
> -
> -    QSIMPLEQ_ENTRY(Coroutine) co_queue_next;
> -
>      /* Coroutines that should be woken up when we yield or terminate.
>       * Only used when the coroutine is running.
>       */
>      QSIMPLEQ_HEAD(, Coroutine) co_queue_wakeup;
>  
> +    /* Only used when the coroutine has yielded.  */
> +    AioContext *ctx;
> +    QSIMPLEQ_ENTRY(Coroutine) co_queue_next;
>      QSLIST_ENTRY(Coroutine) co_scheduled_next;
>  };
>  
> diff --git a/util/async.c b/util/async.c
> index 4dd9d95a9e..0e1bd8780a 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -388,9 +388,6 @@ static void co_schedule_bh_cb(void *opaque)
>          QSLIST_REMOVE_HEAD(&straight, co_scheduled_next);
>          trace_aio_co_schedule_bh_cb(ctx, co);
>          aio_context_acquire(ctx);
> -
> -        /* Protected by write barrier in qemu_aio_coroutine_enter */
> -        atomic_set(&co->scheduled, NULL);
>          qemu_coroutine_enter(co);
>          aio_context_release(ctx);
>      }
> @@ -441,16 +438,6 @@ fail:
>  void aio_co_schedule(AioContext *ctx, Coroutine *co)
>  {
>      trace_aio_co_schedule(ctx, co);
> -    const char *scheduled = atomic_cmpxchg(&co->scheduled, NULL,
> -                                           __func__);
> -
> -    if (scheduled) {
> -        fprintf(stderr,
> -                "%s: Co-routine was already scheduled in '%s'\n",
> -                __func__, scheduled);
> -        abort();
> -    }
> -
>      QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
>                                co, co_scheduled_next);
>      qemu_bh_schedule(ctx->co_schedule_bh);
> diff --git a/util/qemu-coroutine-sleep.c b/util/qemu-coroutine-sleep.c
> index 254349cdbb..9c5655041b 100644
> --- a/util/qemu-coroutine-sleep.c
> +++ b/util/qemu-coroutine-sleep.c
> @@ -13,7 +13,6 @@
>  
>  #include "qemu/osdep.h"
>  #include "qemu/coroutine.h"
> -#include "qemu/coroutine_int.h"
>  #include "qemu/timer.h"
>  #include "block/aio.h"
>  
> @@ -26,8 +25,6 @@ static void co_sleep_cb(void *opaque)
>  {
>      CoSleepCB *sleep_cb = opaque;
>  
> -    /* Write of schedule protected by barrier write in aio_co_schedule */
> -    atomic_set(&sleep_cb->co->scheduled, NULL);
>      aio_co_wake(sleep_cb->co);
>  }
>  
> @@ -37,15 +34,6 @@ void coroutine_fn co_aio_sleep_ns(AioContext *ctx, QEMUClockType type,
>      CoSleepCB sleep_cb = {
>          .co = qemu_coroutine_self(),
>      };
> -
> -    const char *scheduled = atomic_cmpxchg(&sleep_cb.co->scheduled, NULL,
> -                                           __func__);
> -    if (scheduled) {
> -        fprintf(stderr,
> -                "%s: Co-routine was already scheduled in '%s'\n",
> -                __func__, scheduled);
> -        abort();
> -    }
>      sleep_cb.ts = aio_timer_new(ctx, type, SCALE_NS, co_sleep_cb, &sleep_cb);
>      timer_mod(sleep_cb.ts, qemu_clock_get_ns(type) + ns);
>      qemu_coroutine_yield();
> diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
> index 9eff7fd450..d6095c1d5a 100644
> --- a/util/qemu-coroutine.c
> +++ b/util/qemu-coroutine.c
> @@ -107,22 +107,8 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co)
>      Coroutine *self = qemu_coroutine_self();
>      CoroutineAction ret;
>  
> -    /* Cannot rely on the read barrier for co in aio_co_wake(), as there are
> -     * callers outside of aio_co_wake() */
> -    const char *scheduled = atomic_mb_read(&co->scheduled);
> -
>      trace_qemu_aio_coroutine_enter(ctx, self, co, co->entry_arg);
>  
> -    /* if the Coroutine has already been scheduled, entering it again will
> -     * cause us to enter it twice, potentially even after the coroutine has
> -     * been deleted */
> -    if (scheduled) {
> -        fprintf(stderr,
> -                "%s: Co-routine was already scheduled in '%s'\n",
> -                __func__, scheduled);
> -        abort();
> -    }
> -
>      if (co->caller) {
>          fprintf(stderr, "Co-routine re-entered recursively\n");
>          abort();
> -- 
> 2.13.6
> 


* Re: [Qemu-devel] [PATCH for-2.11 0/4] Fix qemu-iotests failures
  2017-11-28 15:43 [Qemu-devel] [PATCH for-2.11 0/4] Fix qemu-iotests failures Kevin Wolf
                   ` (4 preceding siblings ...)
  2017-11-28 15:56 ` [Qemu-devel] [PATCH for-2.11 0/4] Fix qemu-iotests failures Jeff Cody
@ 2017-11-28 16:00 ` Stefan Hajnoczi
  5 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2017-11-28 16:00 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, jcody, pbonzini, famz, qemu-devel

On Tue, Nov 28, 2017 at 04:43:46PM +0100, Kevin Wolf wrote:
> This reverts the recent fix for the case of qemu-iotests 200 (block job
> completion races when using a separate I/O thread) that only fixed a
> symptom and caused many regressions, and replaces it with a new
> solution that hopefully attacks the right root cause.
> 
> The series also contains a bonus fix for a bug that was uncovered by the
> different timing with the patches that are reverted here.
> 
> Kevin Wolf (4):
>   Revert "coroutine: abort if we try to schedule or enter a pending
>     coroutine"
>   Revert "blockjob: do not allow coroutine double entry or
>     entry-after-completion"
>   coroutine: Cancel aio_co_schedule() on direct entry
>   block: Expect graph changes in bdrv_parent_drained_begin/end
> 
>  include/block/blockjob_int.h |  3 +--
>  include/qemu/coroutine_int.h | 14 ++++----------
>  block/io.c                   |  8 ++++----
>  blockjob.c                   |  7 ++-----
>  util/async.c                 | 26 +++++++++++++-------------
>  util/qemu-coroutine-sleep.c  | 12 ------------
>  util/qemu-coroutine.c        | 31 +++++++++++++++++--------------
>  7 files changed, 41 insertions(+), 60 deletions(-)
> 
> -- 
> 2.13.6
> 

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [Qemu-devel] [PATCH for-2.11 2/4] Revert "blockjob: do not allow coroutine double entry or entry-after-completion"
  2017-11-28 15:43 ` [Qemu-devel] [PATCH for-2.11 2/4] Revert "blockjob: do not allow coroutine double entry or entry-after-completion" Kevin Wolf
@ 2017-11-28 16:00   ` Jeff Cody
  0 siblings, 0 replies; 27+ messages in thread
From: Jeff Cody @ 2017-11-28 16:00 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, stefanha, pbonzini, famz, qemu-devel

On Tue, Nov 28, 2017 at 04:43:48PM +0100, Kevin Wolf wrote:
> This reverts commit 4afeffc8572f40d8844b946a30c00b10da4442b1.
> 
> This fixed the symptom of a bug rather than the root cause. Waking up a
> sleeping coroutine is generally fine, we just need to make it work
> correctly across AioContexts, which we'll do in a minute.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Jeff Cody <jcody@redhat.com>

> ---
>  include/block/blockjob_int.h | 3 +--
>  blockjob.c                   | 7 ++-----
>  2 files changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
> index 43f3be2965..f13ad05c0d 100644
> --- a/include/block/blockjob_int.h
> +++ b/include/block/blockjob_int.h
> @@ -143,8 +143,7 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
>   * @ns: How many nanoseconds to stop for.
>   *
>   * Put the job to sleep (assuming that it wasn't canceled) for @ns
> - * nanoseconds.  Canceling the job will not interrupt the wait, so the
> - * cancel will not process until the coroutine wakes up.
> + * nanoseconds.  Canceling the job will interrupt the wait immediately.
>   */
>  void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);
>  
> diff --git a/blockjob.c b/blockjob.c
> index ff9a614531..3a0c49137e 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -797,14 +797,11 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
>          return;
>      }
>  
> -    /* We need to leave job->busy set here, because when we have
> -     * put a coroutine to 'sleep', we have scheduled it to run in
> -     * the future.  We cannot enter that same coroutine again before
> -     * it wakes and runs, otherwise we risk double-entry or entry after
> -     * completion. */
> +    job->busy = false;
>      if (!block_job_should_pause(job)) {
>          co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
>      }
> +    job->busy = true;
>  
>      block_job_pause_point(job);
>  }
> -- 
> 2.13.6
> 


* Re: [Qemu-devel] [PATCH for-2.11 3/4] coroutine: Cancel aio_co_schedule() on direct entry
  2017-11-28 15:43 ` [Qemu-devel] [PATCH for-2.11 3/4] coroutine: Cancel aio_co_schedule() on direct entry Kevin Wolf
@ 2017-11-28 16:09   ` Jeff Cody
  2017-11-28 16:14   ` Paolo Bonzini
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 27+ messages in thread
From: Jeff Cody @ 2017-11-28 16:09 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, stefanha, pbonzini, famz, qemu-devel

On Tue, Nov 28, 2017 at 04:43:49PM +0100, Kevin Wolf wrote:
> If a coroutine can be reentered from multiple possible sources, we need
> to be careful in the case that two sources try to reenter it at the same
> time.
> 
> There are two different cases where this can happen:
> 
> 1. A coroutine spawns multiple asynchronous jobs and waits for all of
>    them to complete. In this case, multiple reentries are expected and
>    the coroutine is probably looping around a yield, so entering it
>    twice is generally fine (but entering it just once after all jobs
>    have completed would be enough, too).
> 
>    Exception: When the loop condition becomes false and the first
>    reenter already leaves the loop, we must not do a second reenter.
> 
> 2. A coroutine yields and provides multiple alternative ways to be
>    reentered. In this case, we must always only process the first
>    reenter.
> 
> Direct entry is not a problem. It requires that the AioContext locks for
> the coroutine context are held, which means that the coroutine has
> enough time to set some state that simply prevents the second caller
> from reentering the coroutine, too.
> 
> Things get more interesting with aio_co_schedule() because the coroutine
> may be scheduled before it is (directly) entered from a second place.
> Then changing some state inside the coroutine doesn't help because it is
> already scheduled and the state won't be checked again.
> 
> For this case, we'll cancel any pending aio_co_schedule() when the
> coroutine is actually entered. Reentering it once is enough for all
> cases explained above, and a requirement for part of them.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/qemu/coroutine_int.h |  1 +
>  util/async.c                 | 15 ++++++++++++++-
>  util/qemu-coroutine.c        | 17 +++++++++++++++++
>  3 files changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
> index cb98892bba..73fc35e54b 100644
> --- a/include/qemu/coroutine_int.h
> +++ b/include/qemu/coroutine_int.h
> @@ -40,6 +40,7 @@ struct Coroutine {
>      CoroutineEntry *entry;
>      void *entry_arg;
>      Coroutine *caller;
> +    AioContext *scheduled;
>  
>      /* Only used when the coroutine has terminated.  */
>      QSLIST_ENTRY(Coroutine) pool_next;
> diff --git a/util/async.c b/util/async.c
> index 0e1bd8780a..dc249e9404 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -372,6 +372,7 @@ static bool event_notifier_poll(void *opaque)
>  static void co_schedule_bh_cb(void *opaque)
>  {
>      AioContext *ctx = opaque;
> +    AioContext *scheduled_ctx;
>      QSLIST_HEAD(, Coroutine) straight, reversed;
>  
>      QSLIST_MOVE_ATOMIC(&reversed, &ctx->scheduled_coroutines);
> @@ -388,7 +389,16 @@ static void co_schedule_bh_cb(void *opaque)
>          QSLIST_REMOVE_HEAD(&straight, co_scheduled_next);
>          trace_aio_co_schedule_bh_cb(ctx, co);
>          aio_context_acquire(ctx);
> -        qemu_coroutine_enter(co);
> +        scheduled_ctx = atomic_mb_read(&co->scheduled);
> +        if (scheduled_ctx == ctx) {
> +            qemu_coroutine_enter(co);
> +        } else {
> +            /* This must be a cancelled aio_co_schedule() because the coroutine
> +             * was already entered before this BH had a chance to run. If a
> +             * coroutine is scheduled for two different AioContexts, something
> +             * is very wrong. */
> +            assert(scheduled_ctx == NULL);
> +        }
>          aio_context_release(ctx);
>      }
>  }
> @@ -438,6 +448,9 @@ fail:
>  void aio_co_schedule(AioContext *ctx, Coroutine *co)
>  {
>      trace_aio_co_schedule(ctx, co);
> +

Do we want to assert(!co->scheduled || co->scheduled == ctx) here?


> +    /* Set co->scheduled before the coroutine becomes visible in the list */
> +    atomic_mb_set(&co->scheduled, ctx);


>      QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
>                                co, co_scheduled_next);
>      qemu_bh_schedule(ctx->co_schedule_bh);
> diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
> index d6095c1d5a..c515b3cb4a 100644
> --- a/util/qemu-coroutine.c
> +++ b/util/qemu-coroutine.c
> @@ -122,6 +122,23 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co)
>       */
>      smp_wmb();
>  
> +    /* Make sure that a coroutine that can alternatively reentered from two
> +     * different sources isn't reentered more than once when the first caller
> +     * uses aio_co_schedule() and the other one enters to coroutine directly.
> +     * This is achieved by cancelling the pending aio_co_schedule().
> +     *
> +     * The other way round, if aio_co_schedule() would be called after this
> +     * point, this would be a problem, too, but in practice it doesn't happen
> +     * because we're holding the AioContext lock here and aio_co_schedule()
> +     * callers must do the same. This means that the coroutine just needs to
> +     * prevent other callers from calling aio_co_schedule() before it yields
> +     * (e.g. block job coroutines by setting job->busy = true).
> +     *
> +     * We still want to ensure that the second case doesn't happen, so reset
> +     * co->scheduled only after setting co->caller to make the above check
> +     * effective for the co_schedule_bh_cb() case. */

Nice comment

> +    atomic_set(&co->scheduled, NULL);
> +
>      ret = qemu_coroutine_switch(self, co, COROUTINE_ENTER);
>  
>      qemu_co_queue_run_restart(co);
> -- 
> 2.13.6
> 


* Re: [Qemu-devel] [PATCH for-2.11 4/4] block: Expect graph changes in bdrv_parent_drained_begin/end
  2017-11-28 15:43 ` [Qemu-devel] [PATCH for-2.11 4/4] block: Expect graph changes in bdrv_parent_drained_begin/end Kevin Wolf
@ 2017-11-28 16:10   ` Jeff Cody
  0 siblings, 0 replies; 27+ messages in thread
From: Jeff Cody @ 2017-11-28 16:10 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, stefanha, pbonzini, famz, qemu-devel

On Tue, Nov 28, 2017 at 04:43:50PM +0100, Kevin Wolf wrote:
> The .drained_begin/end callbacks can (directly or indirectly via
> aio_poll()) cause block nodes to be removed or the current BdrvChild to
> point to a different child node.
> 
> Use QLIST_FOREACH_SAFE() to make sure we don't access invalid
> BlockDriverStates or accidentally continue iterating the parents of the
> new child node instead of the node we actually came from.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/io.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index 4fdf93a014..6773926fc1 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -42,9 +42,9 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>  
>  void bdrv_parent_drained_begin(BlockDriverState *bs)
>  {
> -    BdrvChild *c;
> +    BdrvChild *c, *next;
>  
> -    QLIST_FOREACH(c, &bs->parents, next_parent) {
> +    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
>          if (c->role->drained_begin) {
>              c->role->drained_begin(c);
>          }
> @@ -53,9 +53,9 @@ void bdrv_parent_drained_begin(BlockDriverState *bs)
>  
>  void bdrv_parent_drained_end(BlockDriverState *bs)
>  {
> -    BdrvChild *c;
> +    BdrvChild *c, *next;
>  
> -    QLIST_FOREACH(c, &bs->parents, next_parent) {
> +    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
>          if (c->role->drained_end) {
>              c->role->drained_end(c);
>          }
> -- 
> 2.13.6
> 
Reviewed-by: Jeff Cody <jcody@redhat.com>


* Re: [Qemu-devel] [PATCH for-2.11 3/4] coroutine: Cancel aio_co_schedule() on direct entry
  2017-11-28 15:43 ` [Qemu-devel] [PATCH for-2.11 3/4] coroutine: Cancel aio_co_schedule() on direct entry Kevin Wolf
  2017-11-28 16:09   ` Jeff Cody
@ 2017-11-28 16:14   ` Paolo Bonzini
  2017-11-28 16:28     ` Kevin Wolf
  2017-11-28 16:30   ` Fam Zheng
  2017-11-28 16:46   ` Eric Blake
  3 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2017-11-28 16:14 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, stefanha, jcody, famz, qemu-devel

On 28/11/2017 16:43, Kevin Wolf wrote:
> +    /* Make sure that a coroutine that can alternatively reentered from two
> +     * different sources isn't reentered more than once when the first caller
> +     * uses aio_co_schedule() and the other one enters to coroutine directly.
> +     * This is achieved by cancelling the pending aio_co_schedule().
> +     *
> +     * The other way round, if aio_co_schedule() would be called after this
> +     * point, this would be a problem, too, but in practice it doesn't happen
> +     * because we're holding the AioContext lock here and aio_co_schedule()
> +     * callers must do the same.

No, this is not true.  aio_co_schedule is thread-safe.

> This means that the coroutine just needs to
> +     * prevent other callers from calling aio_co_schedule() before it yields
> +     * (e.g. block job coroutines by setting job->busy = true).
> +     *
> +     * We still want to ensure that the second case doesn't happen, so reset
> +     * co->scheduled only after setting co->caller to make the above check
> +     * effective for the co_schedule_bh_cb() case. */
> +    atomic_set(&co->scheduled, NULL);

This doesn't work.  The coroutine is still in the list, and if someone
calls aio_co_schedule again now, any coroutines linked from "co" via
co_scheduled_next are lost.

(Also, the AioContext lock is by design not protecting any state in
AioContext itself; the AioContext lock is only protecting things that
run in an AioContext but do not have their own lock).

Paolo

> +
>      ret = qemu_coroutine_switch(self, co, COROUTINE_ENTER);
>  
>      qemu_co_queue_run_restart(co);
> 


* Re: [Qemu-devel] [PATCH for-2.11 1/4] Revert "coroutine: abort if we try to schedule or enter a pending coroutine"
  2017-11-28 15:43 ` [Qemu-devel] [PATCH for-2.11 1/4] Revert "coroutine: abort if we try to schedule or enter a pending coroutine" Kevin Wolf
  2017-11-28 16:00   ` Jeff Cody
@ 2017-11-28 16:18   ` Paolo Bonzini
  2017-11-28 16:37     ` Kevin Wolf
  1 sibling, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2017-11-28 16:18 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, stefanha, jcody, famz, qemu-devel

On 28/11/2017 16:43, Kevin Wolf wrote:
> This reverts commit 6133b39f3c36623425a6ede9e89d93175fde15cd.
> 
> The commit checked conditions that would expose a bug, but there is no
> real reason to forbid them apart from the bug, which we'll fix in a
> minute.
> 
> In particular, reentering a coroutine during co_aio_sleep_ns() is fine;
> the function is explicitly written to allow this.

This is true.

> aio_co_schedule() can indeed conflict with direct coroutine invocations,
> but this is exactky what we want to fix, so remove that check again,
> too.

I'm not sure this is a good idea, as I answered in patch 3.

It can also conflict badly with another aio_co_schedule().  Your patch
here removes the assertion in this case, and patch 3 makes it easier to
get into the situation where two aio_co_schedule()s conflict with each
other.

For example, say you have a coroutine that calls aio_co_schedule on
itself, like

	while (true) {
		aio_co_schedule(qemu_get_current_aio_context(),
				qemu_coroutine_self());
	}

If somebody else calls qemu_coroutine_enter on this coroutine, *that* is
the bug.  These patches would just cause some random corruption or
(perhaps worse) hang.

Paolo


* Re: [Qemu-devel] [PATCH for-2.11 3/4] coroutine: Cancel aio_co_schedule() on direct entry
  2017-11-28 16:14   ` Paolo Bonzini
@ 2017-11-28 16:28     ` Kevin Wolf
  2017-11-28 16:42       ` Jeff Cody
  2017-11-28 16:45       ` Paolo Bonzini
  0 siblings, 2 replies; 27+ messages in thread
From: Kevin Wolf @ 2017-11-28 16:28 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-block, mreitz, stefanha, jcody, famz, qemu-devel

On 28.11.2017 at 17:14, Paolo Bonzini wrote:
> On 28/11/2017 16:43, Kevin Wolf wrote:
> > +    /* Make sure that a coroutine that can alternatively reentered from two
> > +     * different sources isn't reentered more than once when the first caller
> > +     * uses aio_co_schedule() and the other one enters to coroutine directly.
> > +     * This is achieved by cancelling the pending aio_co_schedule().
> > +     *
> > +     * The other way round, if aio_co_schedule() would be called after this
> > +     * point, this would be a problem, too, but in practice it doesn't happen
> > +     * because we're holding the AioContext lock here and aio_co_schedule()
> > +     * callers must do the same.
> 
> No, this is not true.  aio_co_schedule is thread-safe.

Hm... With the reproducer we were specifically looking at
qmp_block_job_cancel(), which does take the AioContext locks. But it
might not be as universal as I thought.

To be honest, I just wasn't sure what to do with this case anyway. It
means that the coroutine is already running when someone else schedules
it. We don't really know whether we have to enter it a second time or
not.

So if it can indeed happen in practice, we need to think a bit more
about this.

> > This means that the coroutine just needs to
> > +     * prevent other callers from calling aio_co_schedule() before it yields
> > +     * (e.g. block job coroutines by setting job->busy = true).
> > +     *
> > +     * We still want to ensure that the second case doesn't happen, so reset
> > +     * co->scheduled only after setting co->caller to make the above check
> > +     * effective for the co_schedule_bh_cb() case. */
> > +    atomic_set(&co->scheduled, NULL);
> 
> This doesn't work.  The coroutine is still in the list, and if someone
> calls aio_co_schedule again now, any coroutines linked from "co" via
> co_scheduled_next are lost.

Why would they? We still iterate the whole list in co_schedule_bh_cb(),
we just skip the single qemu_coroutine_enter().

> (Also, the AioContext lock is by design not protecting any state in
> AioContext itself; the AioContext lock is only protecting things that
> run in an AioContext but do not have their own lock).

Such as the coroutine we want to enter, no?

Kevin


* Re: [Qemu-devel] [PATCH for-2.11 3/4] coroutine: Cancel aio_co_schedule() on direct entry
  2017-11-28 15:43 ` [Qemu-devel] [PATCH for-2.11 3/4] coroutine: Cancel aio_co_schedule() on direct entry Kevin Wolf
  2017-11-28 16:09   ` Jeff Cody
  2017-11-28 16:14   ` Paolo Bonzini
@ 2017-11-28 16:30   ` Fam Zheng
  2017-11-28 16:46   ` Eric Blake
  3 siblings, 0 replies; 27+ messages in thread
From: Fam Zheng @ 2017-11-28 16:30 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, stefanha, jcody, pbonzini, qemu-devel

On Tue, 11/28 16:43, Kevin Wolf wrote:
> If a coroutine can be reentered from multiple possible sources, we need
> to be careful in the case that two sources try to reenter it at the same
> time.
> 
> There are two different cases where this can happen:
> 
> 1. A coroutine spawns multiple asynchronous jobs and waits for all of
>    them to complete. In this case, multiple reentries are expected and
>    the coroutine is probably looping around a yield, so entering it
>    twice is generally fine (but entering it just once after all jobs
>    have completed would be enough, too).
> 
>    Exception: When the loop condition becomes false and the first
>    reenter already leaves the loop, we must not do a second reenter.
> 
> 2. A coroutine yields and provides multiple alternative ways to be
>    reentered. In this case, we must always only process the first
>    reenter.
> 
> Direct entry is not a problem. It requires that the AioContext locks for
> the coroutine context are held, which means that the coroutine has
> enough time to set some state that simply prevents the second caller
> from reentering the coroutine, too.
> 
> Things get more interesting with aio_co_schedule() because the coroutine
> may be scheduled before it is (directly) entered from a second place.
> Then changing some state inside the coroutine doesn't help because it is
> already scheduled and the state won't be checked again.
> 
> For this case, we'll cancel any pending aio_co_schedule() when the
> coroutine is actually entered. Reentering it once is enough for all
> cases explained above, and a requirement for part of them.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/qemu/coroutine_int.h |  1 +
>  util/async.c                 | 15 ++++++++++++++-
>  util/qemu-coroutine.c        | 17 +++++++++++++++++
>  3 files changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
> index cb98892bba..73fc35e54b 100644
> --- a/include/qemu/coroutine_int.h
> +++ b/include/qemu/coroutine_int.h
> @@ -40,6 +40,7 @@ struct Coroutine {
>      CoroutineEntry *entry;
>      void *entry_arg;
>      Coroutine *caller;
> +    AioContext *scheduled;
>  
>      /* Only used when the coroutine has terminated.  */
>      QSLIST_ENTRY(Coroutine) pool_next;
> diff --git a/util/async.c b/util/async.c
> index 0e1bd8780a..dc249e9404 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -372,6 +372,7 @@ static bool event_notifier_poll(void *opaque)
>  static void co_schedule_bh_cb(void *opaque)
>  {
>      AioContext *ctx = opaque;
> +    AioContext *scheduled_ctx;
>      QSLIST_HEAD(, Coroutine) straight, reversed;
>  
>      QSLIST_MOVE_ATOMIC(&reversed, &ctx->scheduled_coroutines);
> @@ -388,7 +389,16 @@ static void co_schedule_bh_cb(void *opaque)
>          QSLIST_REMOVE_HEAD(&straight, co_scheduled_next);
>          trace_aio_co_schedule_bh_cb(ctx, co);
>          aio_context_acquire(ctx);
> -        qemu_coroutine_enter(co);
> +        scheduled_ctx = atomic_mb_read(&co->scheduled);
> +        if (scheduled_ctx == ctx) {
> +            qemu_coroutine_enter(co);
> +        } else {
> +            /* This must be a cancelled aio_co_schedule() because the coroutine
> +             * was already entered before this BH had a chance to run. If a
> +             * coroutine is scheduled for two different AioContexts, something
> +             * is very wrong. */
> +            assert(scheduled_ctx == NULL);
> +        }
>          aio_context_release(ctx);
>      }
>  }
> @@ -438,6 +448,9 @@ fail:
>  void aio_co_schedule(AioContext *ctx, Coroutine *co)
>  {
>      trace_aio_co_schedule(ctx, co);
> +
> +    /* Set co->scheduled before the coroutine becomes visible in the list */
> +    atomic_mb_set(&co->scheduled, ctx);
>      QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
>                                co, co_scheduled_next);
>      qemu_bh_schedule(ctx->co_schedule_bh);
> diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
> index d6095c1d5a..c515b3cb4a 100644
> --- a/util/qemu-coroutine.c
> +++ b/util/qemu-coroutine.c
> @@ -122,6 +122,23 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co)
>       */
>      smp_wmb();
>  
> +    /* Make sure that a coroutine that can alternatively reentered from two

"that can be"?

> +     * different sources isn't reentered more than once when the first caller
> +     * uses aio_co_schedule() and the other one enters to coroutine directly.
> +     * This is achieved by cancelling the pending aio_co_schedule().
> +     *
> +     * The other way round, if aio_co_schedule() would be called after this
> +     * point, this would be a problem, too, but in practice it doesn't happen
> +     * because we're holding the AioContext lock here and aio_co_schedule()
> +     * callers must do the same. This means that the coroutine just needs to
> +     * prevent other callers from calling aio_co_schedule() before it yields
> +     * (e.g. block job coroutines by setting job->busy = true).
> +     *
> +     * We still want to ensure that the second case doesn't happen, so reset
> +     * co->scheduled only after setting co->caller to make the above check
> +     * effective for the co_schedule_bh_cb() case. */
> +    atomic_set(&co->scheduled, NULL);

Looks like accesses to co->scheduled are fully protected by AioContext lock, why
are atomic ops necessary? To catch bugs?

> +
>      ret = qemu_coroutine_switch(self, co, COROUTINE_ENTER);
>  
>      qemu_co_queue_run_restart(co);
> -- 
> 2.13.6
> 

Fam


* Re: [Qemu-devel] [PATCH for-2.11 1/4] Revert "coroutine: abort if we try to schedule or enter a pending coroutine"
  2017-11-28 16:18   ` Paolo Bonzini
@ 2017-11-28 16:37     ` Kevin Wolf
  2017-11-28 17:01       ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Kevin Wolf @ 2017-11-28 16:37 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-block, mreitz, stefanha, jcody, famz, qemu-devel

On 28.11.2017 at 17:18, Paolo Bonzini wrote:
> On 28/11/2017 16:43, Kevin Wolf wrote:
> > This reverts commit 6133b39f3c36623425a6ede9e89d93175fde15cd.
> > 
> > The commit checked conditions that would expose a bug, but there is no
> > real reason to forbid them apart from the bug, which we'll fix in a
> > minute.
> > 
> > In particular, reentering a coroutine during co_aio_sleep_ns() is fine;
> > the function is explicitly written to allow this.
> 
> This is true.
> 
> > aio_co_schedule() can indeed conflict with direct coroutine invocations,
> > but this is exactky what we want to fix, so remove that check again,
> > too.
> 
> I'm not sure this is a good idea, as I answered in patch 3.
> 
> It can also conflict badly with another aio_co_schedule().  Your patch
> here removes the assertion in this case, and patch 3 makes it easier to
> get into the situation where two aio_co_schedule()s conflict with each
> other.

I don't see how they conflict. If the second aio_co_schedule() comes
before the coroutine is actually entered, they are effectively simply
merged into a single one. Which is exactly what was intended.

> For example, say you have a coroutine that calls aio_co_schedule on
> itself, like
> 
> 	while (true) {
> 		aio_co_schedule(qemu_get_current_aio_context(),
> 				qemu_coroutine_self());
> 	}
> 
> If somebody else calls qemu_coroutine_enter on this coroutine, *that* is
> the bug.  These patches would just cause some random corruption or
> (perhaps worse) hang.

Obviously not every coroutine is made to be reentered from multiple
places, so for some cases it just might not make a whole lot of sense.
Coroutines that are made for it generally are one of the types I
explained in the commit message of patch 3.

But anyway, how would this cause corruption or a hang (apart from the
fact that this example doesn't have any state that could even be
corrupted)? The external qemu_coroutine_enter() would just replace the
scheduled coroutine call, so the coroutine wouldn't even notice that it
was called from qemu_coroutine_enter() rather than its own scheduled
call.

Kevin


* Re: [Qemu-devel] [PATCH for-2.11 3/4] coroutine: Cancel aio_co_schedule() on direct entry
  2017-11-28 16:28     ` Kevin Wolf
@ 2017-11-28 16:42       ` Jeff Cody
  2017-11-28 16:51         ` Paolo Bonzini
  2017-11-28 17:03         ` Kevin Wolf
  2017-11-28 16:45       ` Paolo Bonzini
  1 sibling, 2 replies; 27+ messages in thread
From: Jeff Cody @ 2017-11-28 16:42 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Paolo Bonzini, qemu-block, mreitz, stefanha, famz, qemu-devel

On Tue, Nov 28, 2017 at 05:28:50PM +0100, Kevin Wolf wrote:
> On 28.11.2017 at 17:14, Paolo Bonzini wrote:
> > On 28/11/2017 16:43, Kevin Wolf wrote:
> > > +    /* Make sure that a coroutine that can alternatively reentered from two
> > > +     * different sources isn't reentered more than once when the first caller
> > > +     * uses aio_co_schedule() and the other one enters to coroutine directly.
> > > +     * This is achieved by cancelling the pending aio_co_schedule().
> > > +     *
> > > +     * The other way round, if aio_co_schedule() would be called after this
> > > +     * point, this would be a problem, too, but in practice it doesn't happen
> > > +     * because we're holding the AioContext lock here and aio_co_schedule()
> > > +     * callers must do the same.
> > 
> > No, this is not true.  aio_co_schedule is thread-safe.
> 
> Hm... With the reproducer we were specifically looking at
> qmp_block_job_cancel(), which does take the AioContext locks. But it
> might not be as universal as I thought.
> 
> To be honest, I just wasn't sure what to do with this case anyway. It
> means that the coroutine is already running when someone else schedules
> it. We don't really know whether we have to enter it a second time or
> not.
> 
> So if it can indeed happen in practice, we need to think a bit more
> about this.

It would be nice if, on coroutine termination, we could unschedule all
pending executions for that coroutine.  I think use-after-free is the main
concern for someone else calling aio_co_schedule() while the coroutine is
currently running.

> 
> > > This means that the coroutine just needs to
> > > +     * prevent other callers from calling aio_co_schedule() before it yields
> > > +     * (e.g. block job coroutines by setting job->busy = true).
> > > +     *
> > > +     * We still want to ensure that the second case doesn't happen, so reset
> > > +     * co->scheduled only after setting co->caller to make the above check
> > > +     * effective for the co_schedule_bh_cb() case. */
> > > +    atomic_set(&co->scheduled, NULL);
> > 
> > This doesn't work.  The coroutine is still in the list, and if someone
> > calls aio_co_schedule again now, any coroutines linked from "co" via
> > co_scheduled_next are lost.
> 
> Why would they? We still iterate the whole list in co_schedule_bh_cb(),
> we just skip the single qemu_coroutine_enter().
> 
> > (Also, the AioContext lock is by design not protecting any state in
> > AioContext itself; the AioContext lock is only protecting things that
> > run in an AioContext but do not have their own lock).
> 
> Such as the coroutine we want to enter, no?
> 
> Kevin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.11 3/4] coroutine: Cancel aio_co_schedule() on direct entry
  2017-11-28 16:28     ` Kevin Wolf
  2017-11-28 16:42       ` Jeff Cody
@ 2017-11-28 16:45       ` Paolo Bonzini
  1 sibling, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2017-11-28 16:45 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, stefanha, jcody, famz, qemu-devel

On 28/11/2017 17:28, Kevin Wolf wrote:
> To be honest, I just wasn't sure what to do with this case anyway. It
> means that the coroutine is already running when someone else schedules
> it. We don't really know whether we have to enter it a second time or
> not.
>
> So if it can indeed happen in practice, we need to think a bit more
> about this.

>>> This means that the coroutine just needs to
>>> +     * prevent other callers from calling aio_co_schedule() before it yields
>>> +     * (e.g. block job coroutines by setting job->busy = true).
>>> +     *
>>> +     * We still want to ensure that the second case doesn't happen, so reset
>>> +     * co->scheduled only after setting co->caller to make the above check
>>> +     * effective for the co_schedule_bh_cb() case. */
>>> +    atomic_set(&co->scheduled, NULL);
>>
>> This doesn't work.  The coroutine is still in the list, and if someone
>> calls aio_co_schedule again now, any coroutines linked from "co" via
>> co_scheduled_next are lost.
> 
> Why would they? We still iterate the whole list in co_schedule_bh_cb(),
> we just skip the single qemu_coroutine_enter().

Because you modify co_scheduled_next (in QSLIST_INSERT_HEAD_ATOMIC)
before it has been consumed.

>> (Also, the AioContext lock is by design not protecting any state in
>> AioContext itself; the AioContext lock is only protecting things that
>> run in an AioContext but do not have their own lock).
> 
> Such as the coroutine we want to enter, no?

If you mean the Coroutine object then no, coroutines are entirely
thread-safe.  All you need to do is: 1) re-enter them with aio_co_wake
or aio_co_schedule after the first time you've entered them; 2) make
sure you don't enter/schedule them twice.
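
For illustration, roughly like this (sketch only; my_co, my_cb and
start_async_op are made-up names, not real APIs):

static void my_cb(void *opaque)
{
    Coroutine *co = opaque;

    /* Rule 1: after the first entry, wake the coroutine only through
     * aio_co_wake()/aio_co_schedule(), never qemu_coroutine_enter(). */
    aio_co_wake(co);
}

static void coroutine_fn my_co(void *opaque)
{
    /* Rule 2: arrange exactly one wakeup before yielding. */
    start_async_op(my_cb, qemu_coroutine_self());
    qemu_coroutine_yield();
}

void my_start(void)
{
    Coroutine *co = qemu_coroutine_create(my_co, NULL);

    /* The first entry is the only direct one. */
    qemu_coroutine_enter(co);
}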

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.11 3/4] coroutine: Cancel aio_co_schedule() on direct entry
  2017-11-28 15:43 ` [Qemu-devel] [PATCH for-2.11 3/4] coroutine: Cancel aio_co_schedule() on direct entry Kevin Wolf
                     ` (2 preceding siblings ...)
  2017-11-28 16:30   ` Fam Zheng
@ 2017-11-28 16:46   ` Eric Blake
  3 siblings, 0 replies; 27+ messages in thread
From: Eric Blake @ 2017-11-28 16:46 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block
  Cc: famz, jcody, qemu-devel, mreitz, stefanha, pbonzini

On 11/28/2017 09:43 AM, Kevin Wolf wrote:
> If a coroutine can be reentered from multiple possible sources, we need
> to be careful in the case that two sources try to reenter it at the same
> time.
> 

> For this case, we'll cancel any pending aio_co_schedule() when the
> coroutine is actually entered. Reentering it once is enough for all
> cases explained above, and a requirement for part of them.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   include/qemu/coroutine_int.h |  1 +

> +++ b/util/qemu-coroutine.c
> @@ -122,6 +122,23 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co)
>        */
>       smp_wmb();
>   
> +    /* Make sure that a coroutine that can alternatively reentered from two

s/reentered/be reentered/

> +     * different sources isn't reentered more than once when the first caller
> +     * uses aio_co_schedule() and the other one enters to coroutine directly.

s/enters to/enters the/

> +     * This is achieved by cancelling the pending aio_co_schedule().
> +     *
> +     * The other way round, if aio_co_schedule() would be called after this
> +     * point, this would be a problem, too, but in practice it doesn't happen

s/this //

> +     * because we're holding the AioContext lock here and aio_co_schedule()
> +     * callers must do the same. This means that the coroutine just needs to
> +     * prevent other callers from calling aio_co_schedule() before it yields
> +     * (e.g. block job coroutines by setting job->busy = true).

Is 'block job coroutines' the set of coroutines owned by 'block jobs', 
or is it the verb that we are blocking all 'job coroutines'?  I don't 
know if a wording tweak would help avoid ambiguity here.

> +     *
> +     * We still want to ensure that the second case doesn't happen, so reset
> +     * co->scheduled only after setting co->caller to make the above check
> +     * effective for the co_schedule_bh_cb() case. */
> +    atomic_set(&co->scheduled, NULL);
> +
>       ret = qemu_coroutine_switch(self, co, COROUTINE_ENTER);
>   
>       qemu_co_queue_run_restart(co);
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.11 3/4] coroutine: Cancel aio_co_schedule() on direct entry
  2017-11-28 16:42       ` Jeff Cody
@ 2017-11-28 16:51         ` Paolo Bonzini
  2017-11-28 17:09           ` Jeff Cody
  2017-11-28 17:03         ` Kevin Wolf
  1 sibling, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2017-11-28 16:51 UTC (permalink / raw)
  To: Jeff Cody, Kevin Wolf; +Cc: qemu-block, mreitz, stefanha, famz, qemu-devel

On 28/11/2017 17:42, Jeff Cody wrote:
> On Tue, Nov 28, 2017 at 05:28:50PM +0100, Kevin Wolf wrote:
>> Am 28.11.2017 um 17:14 hat Paolo Bonzini geschrieben:
>>> On 28/11/2017 16:43, Kevin Wolf wrote:
>>>> +    /* Make sure that a coroutine that can alternatively reentered from two
>>>> +     * different sources isn't reentered more than once when the first caller
>>>> +     * uses aio_co_schedule() and the other one enters to coroutine directly.
>>>> +     * This is achieved by cancelling the pending aio_co_schedule().
>>>> +     *
>>>> +     * The other way round, if aio_co_schedule() would be called after this
>>>> +     * point, this would be a problem, too, but in practice it doesn't happen
>>>> +     * because we're holding the AioContext lock here and aio_co_schedule()
>>>> +     * callers must do the same.
>>>
>>> No, this is not true.  aio_co_schedule is thread-safe.
>>
>> Hm... With the reproducer we were specfically looking at
>> qmp_block_job_cancel(), which does take the AioContext locks. But it
>> might not be as universal as I thought.
>>
>> To be honest, I just wasn't sure what to do with this case anyway. It
>> means that the coroutine is already running when someone else schedules
>> it. We don't really know whether we have to enter it a second time or
>> not.
>>
>> So if it can indeed happen in practice, we need to think a bit more
>> about this.
> 
> It would be nice if, on coroutine termination, we could unschedule all
> pending executions for that coroutine.  I think use-after-free is the main
> concern for someone else calling aio_co_schedule() while the coroutine is
> currently running.

Yes, terminating a scheduled coroutine is a bug; same for scheduling a
terminated coroutine, both orders are wrong. However, "unscheduling" is
not the solution; you would just be papering over the issue.

aio_co_schedule() on a running coroutine can only happen when the
coroutine is going to yield soon.

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.11 1/4] Revert "coroutine: abort if we try to schedule or enter a pending coroutine"
  2017-11-28 16:37     ` Kevin Wolf
@ 2017-11-28 17:01       ` Paolo Bonzini
  2017-11-28 17:19         ` Kevin Wolf
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2017-11-28 17:01 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, stefanha, jcody, famz, qemu-devel

On 28/11/2017 17:37, Kevin Wolf wrote:
>>
>> It can also conflict badly with another aio_co_schedule().  Your patch
>> here removes the assertion in this case, and patch 3 makes it easier to
>> get into the situation where two aio_co_schedule()s conflict with each
>> other.
> I don't see how they conflict. If the second aio_co_schedule() comes
> before the coroutine is actually entered, they are effectively simply
> merged into a single one. Which is exactly what was intended.

Suppose you have

	ctx->scheduled_coroutines
	  |
	  '---> co
                |
                '---> NULL

Then aio_co_schedule(ctx, co) does QSLIST_INSERT_HEAD_ATOMIC, which is

	/* On entry, ctx->scheduled_coroutines == co.  */
	co->next = ctx->scheduled_coroutines
	change ctx->scheduled_coroutines from co->next to co

This results in a loop:
	
	ctx->scheduled_coroutines
	  |
	  '---> co <-.
                |    |
                '----'

This is an obvious hang due to list corruption.  In other more
complicated cases it can skip a wakeup, which is a more subtle kind of
hang.  You can also get memory corruption if the coroutine terminates
(and is freed) before aio_co_schedule executes.
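
If it helps, the same thing as a tiny standalone program (plain C that
just mimics what the list insert does, no QEMU types):

#include <stdio.h>

struct node {
    struct node *next;
};

int main(void)
{
    struct node co = { .next = NULL };
    struct node *scheduled = &co;     /* co is already on the list */

    /* A second aio_co_schedule(): QSLIST_INSERT_HEAD boils down to this. */
    co.next = scheduled;              /* co->next = co: self-loop */
    scheduled = &co;

    /* co_schedule_bh_cb() would now walk the list forever. */
    struct node *p = scheduled;
    for (int i = 0; i < 3 && p; i++, p = p->next) {
        printf("visiting %p\n", (void *)p);   /* same address every time */
    }
    return 0;
}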

Basically, once you do aio_co_schedule or aio_co_wake the coroutine is
not any more yours.  It's owned by the context that will run it and you
should not do anything with it.

The same is true for co_aio_sleep_ns.  Just because in practice it works
(and it seemed like a clever idea at the time) it doesn't mean it *is* a
good idea...

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.11 3/4] coroutine: Cancel aio_co_schedule() on direct entry
  2017-11-28 16:42       ` Jeff Cody
  2017-11-28 16:51         ` Paolo Bonzini
@ 2017-11-28 17:03         ` Kevin Wolf
  1 sibling, 0 replies; 27+ messages in thread
From: Kevin Wolf @ 2017-11-28 17:03 UTC (permalink / raw)
  To: Jeff Cody; +Cc: Paolo Bonzini, qemu-block, mreitz, stefanha, famz, qemu-devel

Am 28.11.2017 um 17:42 hat Jeff Cody geschrieben:
> On Tue, Nov 28, 2017 at 05:28:50PM +0100, Kevin Wolf wrote:
> > Am 28.11.2017 um 17:14 hat Paolo Bonzini geschrieben:
> > > On 28/11/2017 16:43, Kevin Wolf wrote:
> > > > +    /* Make sure that a coroutine that can alternatively reentered from two
> > > > +     * different sources isn't reentered more than once when the first caller
> > > > +     * uses aio_co_schedule() and the other one enters to coroutine directly.
> > > > +     * This is achieved by cancelling the pending aio_co_schedule().
> > > > +     *
> > > > +     * The other way round, if aio_co_schedule() would be called after this
> > > > +     * point, this would be a problem, too, but in practice it doesn't happen
> > > > +     * because we're holding the AioContext lock here and aio_co_schedule()
> > > > +     * callers must do the same.
> > > 
> > > No, this is not true.  aio_co_schedule is thread-safe.
> > 
> > Hm... With the reproducer we were specfically looking at
> > qmp_block_job_cancel(), which does take the AioContext locks. But it
> > might not be as universal as I thought.
> > 
> > To be honest, I just wasn't sure what to do with this case anyway. It
> > means that the coroutine is already running when someone else schedules
> > it. We don't really know whether we have to enter it a second time or
> > not.
> > 
> > So if it can indeed happen in practice, we need to think a bit more
> > about this.
> 
> It would be nice if, on coroutine termination, we could unschedule all
> pending executions for that coroutine.  I think use-after-free is the main
> concern for someone else calling aio_co_schedule() while the coroutine is
> currently running.

I'd rather assert that this never happens.

Because if you have the same situation, except that the coroutine
doesn't terminate but just yields for another reason, then reentering it
with the assumption that it's still at the previous yield point is a bad
bug that you definitely want to catch.

Kevin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.11 3/4] coroutine: Cancel aio_co_schedule() on direct entry
  2017-11-28 16:51         ` Paolo Bonzini
@ 2017-11-28 17:09           ` Jeff Cody
  2017-11-28 17:14             ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff Cody @ 2017-11-28 17:09 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, qemu-block, mreitz, stefanha, famz, qemu-devel

On Tue, Nov 28, 2017 at 05:51:21PM +0100, Paolo Bonzini wrote:
> On 28/11/2017 17:42, Jeff Cody wrote:
> > On Tue, Nov 28, 2017 at 05:28:50PM +0100, Kevin Wolf wrote:
> >> Am 28.11.2017 um 17:14 hat Paolo Bonzini geschrieben:
> >>> On 28/11/2017 16:43, Kevin Wolf wrote:
> >>>> +    /* Make sure that a coroutine that can alternatively reentered from two
> >>>> +     * different sources isn't reentered more than once when the first caller
> >>>> +     * uses aio_co_schedule() and the other one enters to coroutine directly.
> >>>> +     * This is achieved by cancelling the pending aio_co_schedule().
> >>>> +     *
> >>>> +     * The other way round, if aio_co_schedule() would be called after this
> >>>> +     * point, this would be a problem, too, but in practice it doesn't happen
> >>>> +     * because we're holding the AioContext lock here and aio_co_schedule()
> >>>> +     * callers must do the same.
> >>>
> >>> No, this is not true.  aio_co_schedule is thread-safe.
> >>
> >> Hm... With the reproducer we were specfically looking at
> >> qmp_block_job_cancel(), which does take the AioContext locks. But it
> >> might not be as universal as I thought.
> >>
> >> To be honest, I just wasn't sure what to do with this case anyway. It
> >> means that the coroutine is already running when someone else schedules
> >> it. We don't really know whether we have to enter it a second time or
> >> not.
> >>
> >> So if it can indeed happen in practice, we need to think a bit more
> >> about this.
> > 
> > It would be nice if, on coroutine termination, we could unschedule all
> > pending executions for that coroutine.  I think use-after-free is the main
> > concern for someone else calling aio_co_schedule() while the coroutine is
> > currently running.
> 
> Yes, terminating a scheduled coroutine is a bug; same for scheduling a
> terminated coroutine, both orders are wrong. However, "unscheduling" is
> not the solution; you would just be papering over the issue.
> 

Maybe we should at least add an abort on coroutine termination if there are
still outstanding schedules, as that is preferable to operating in the
weeds.
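
Something like this, just as a sketch (assuming co->scheduled stays a
pointer as in the reverted assertion patch, and that we call it from
wherever a terminated coroutine gets torn down):

static void assert_no_pending_schedule(Coroutine *co)
{
    /* Terminating a coroutine that someone has already scheduled is a
     * use-after-free waiting to happen, so fail loudly instead. */
    assert(atomic_read(&co->scheduled) == NULL);
}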


> aio_co_schedule() on a running coroutine can only happen when the
> coroutine is going to yield soon.
> 

That is a bit vague.  What is "soon", and how does an external caller know
if a coroutine is going to yield in this timeframe?


Jeff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.11 3/4] coroutine: Cancel aio_co_schedule() on direct entry
  2017-11-28 17:09           ` Jeff Cody
@ 2017-11-28 17:14             ` Paolo Bonzini
  0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2017-11-28 17:14 UTC (permalink / raw)
  To: Jeff Cody; +Cc: Kevin Wolf, qemu-block, mreitz, stefanha, famz, qemu-devel

On 28/11/2017 18:09, Jeff Cody wrote:
>> Yes, terminating a scheduled coroutine is a bug; same for scheduling a
>> terminated coroutine, both orders are wrong. However, "unscheduling" is
>> not the solution; you would just be papering over the issue.
>
> Maybe we should at least add an abort on coroutine termination if there are
> still outstanding schedules, as that is preferable to operating in the
> weeds.

Sure, why not.  I'm all for adding more assertions (not less :)).

>> aio_co_schedule() on a running coroutine can only happen when the
>> coroutine is going to yield soon.
>>
> That is a bit vague.  What is "soon", and how does an external caller know
> if a coroutine is going to yield in this timeframe?

Soon really means "eventually"; basically if you do

	f();
	qemu_coroutine_yield();

then f() can call aio_co_wake() or aio_co_schedule() and knows that it
will be entirely thread-safe.

However, remember that only one aio_co_schedule() can be pending at any
given time; so if you have

	f();
	g();
	qemu_coroutine_yield();

either f() or g() can wake you up but not both.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.11 1/4] Revert "coroutine: abort if we try to schedule or enter a pending coroutine"
  2017-11-28 17:01       ` Paolo Bonzini
@ 2017-11-28 17:19         ` Kevin Wolf
  2017-11-28 17:33           ` Jeff Cody
  2017-11-28 17:35           ` Paolo Bonzini
  0 siblings, 2 replies; 27+ messages in thread
From: Kevin Wolf @ 2017-11-28 17:19 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-block, mreitz, stefanha, jcody, famz, qemu-devel

Am 28.11.2017 um 18:01 hat Paolo Bonzini geschrieben:
> On 28/11/2017 17:37, Kevin Wolf wrote:
> >>
> >> It can also conflict badly with another aio_co_schedule().  Your patch
> >> here removes the assertion in this case, and patch 3 makes it easier to
> >> get into the situation where two aio_co_schedule()s conflict with each
> >> other.
> > I don't see how they conflict. If the second aio_co_schedule() comes
> > before the coroutine is actually entered, they are effectively simply
> > merged into a single one. Which is exactly what was intended.
> 
> Suppose you have
> 
> 	ctx->scheduled_coroutines
> 	  |
> 	  '---> co
>                 |
>                 '---> NULL
> 
> Then aio_co_schedule(ctx, co) does QSLIST_INSERT_HEAD_ATOMIC, which is
> 
> 	/* On entry, ctx->scheduled_coroutines == co.  */
> 	co->next = ctx->scheduled_coroutines
> 	change ctx->scheduled_coroutines from co->next to co
> 
> This results in a loop:
> 	
> 	ctx->scheduled_coroutines
> 	  |
> 	  '---> co <-.
>                 |    |
>                 '----'
> 
> This is an obvious hang due to list corruption.  In other more
> complicated cases it can skip a wakeup, which is a more subtle kind of
> hang.  You can also get memory corruption if the coroutine terminates
> (and is freed) before aio_co_schedule executes.

Okay, I can see that. I thought you were talking about the logic
introduced by this series, but you're actually talking about preexisting
behaviour, which limits the usefulness of the patches.

> Basically, once you do aio_co_schedule or aio_co_wake the coroutine is
> not any more yours.  It's owned by the context that will run it and you
> should not do anything with it.
> 
> The same is true for co_aio_sleep_ns.  Just because in practice it works
> (and it seemed like a clever idea at the time) it doesn't mean it *is* a
> good idea...

Well, but that poses the question how you can implement any coroutine
that can be reentered from more than one place. Not being able to do
that would be a severe restriction.

For example, take quorum, which handles requests by spawning a coroutine
for every child and then yielding until acb->count == s->num_children.
This means that the main request coroutine can be reentered from the
child request coroutines in any order and timing.
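
The shape is roughly this (heavily simplified sketch, made-up names,
error handling omitted; the AioContext caveat comes right below):

typedef struct FanIn {
    Coroutine *parent;
    int remaining;
} FanIn;

/* Completion callback of one child request; it only runs after the
 * parent has yielded, because the children do asynchronous I/O. */
static void child_done(void *opaque)
{
    FanIn *s = opaque;

    /* The plain decrement is only safe while everything runs in one
     * AioContext; see below. */
    if (--s->remaining == 0) {
        /* Exactly one wakeup, from whichever child finishes last. */
        aio_co_wake(s->parent);
    }
}

static void coroutine_fn parent_request(int num_children)
{
    FanIn s = {
        .parent = qemu_coroutine_self(),
        .remaining = num_children,
    };

    for (int i = 0; i < num_children; i++) {
        start_child_request(i, child_done, &s);   /* made-up helper */
    }

    while (s.remaining > 0) {
        qemu_coroutine_yield();
    }
}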

What saves us currently is that they are all in the same AioContext, so
we won't actually end up in aio_co_schedule(), but I'm sure that sooner
or later we'll find cases where we're waiting for several workers that
are spread across different I/O threads.

I don't think "don't do that" is a good answer to this.

And yes, reentering co_aio_sleep_ns() early is a real requirement, too.
The test case that used speed=1 and would have waited a few hours before
actually cancelling the job is an extreme example, but even just
delaying cancellation for a second is bad if this is a second of
migration downtime.

Kevin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.11 1/4] Revert "coroutine: abort if we try to schedule or enter a pending coroutine"
  2017-11-28 17:19         ` Kevin Wolf
@ 2017-11-28 17:33           ` Jeff Cody
  2017-11-28 17:35           ` Paolo Bonzini
  1 sibling, 0 replies; 27+ messages in thread
From: Jeff Cody @ 2017-11-28 17:33 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Paolo Bonzini, qemu-block, mreitz, stefanha, famz, qemu-devel

On Tue, Nov 28, 2017 at 06:19:31PM +0100, Kevin Wolf wrote:
> Am 28.11.2017 um 18:01 hat Paolo Bonzini geschrieben:
> > On 28/11/2017 17:37, Kevin Wolf wrote:
> > >>
> > >> It can also conflict badly with another aio_co_schedule().  Your patch
> > >> here removes the assertion in this case, and patch 3 makes it easier to
> > >> get into the situation where two aio_co_schedule()s conflict with each
> > >> other.
> > > I don't see how they conflict. If the second aio_co_schedule() comes
> > > before the coroutine is actually entered, they are effectively simply
> > > merged into a single one. Which is exactly what was intended.
> > 
> > Suppose you have
> > 
> > 	ctx->scheduled_coroutines
> > 	  |
> > 	  '---> co
> >                 |
> >                 '---> NULL
> > 
> > Then aio_co_schedule(ctx, co) does QSLIST_INSERT_HEAD_ATOMIC, which is
> > 
> > 	/* On entry, ctx->scheduled_coroutines == co.  */
> > 	co->next = ctx->scheduled_coroutines
> > 	change ctx->scheduled_coroutines from co->next to co
> > 
> > This results in a loop:
> > 	
> > 	ctx->scheduled_coroutines
> > 	  |
> > 	  '---> co <-.
> >                 |    |
> >                 '----'
> > 
> > This is an obvious hang due to list corruption.  In other more
> > complicated cases it can skip a wakeup, which is a more subtle kind of
> > hang.  You can also get memory corruption if the coroutine terminates
> > (and is freed) before aio_co_schedule executes.
> 
> Okay, I can see that. I thought you were talking about the logic
> introduced by this series, but you're actually talking about preexisting
> behaviour which limits the usefulness of the patches.
> 
> > Basically, once you do aio_co_schedule or aio_co_wake the coroutine is
> > not any more yours.  It's owned by the context that will run it and you
> > should not do anything with it.
> > 
> > The same is true for co_aio_sleep_ns.  Just because in practice it works
> > (and it seemed like a clever idea at the time) it doesn't mean it *is* a
> > good idea...
> 
> Well, but that poses the question how you can implement any coroutine
> that can be reentered from more than one place. Not being able to do
> that would be a severe restriction.
> 
> For example, take quorum, which handles requests by spawning a coroutine
> for every child and then yielding until acb->count == s->num_children.
> This means that the main request coroutine can be reentered from the
> child request coroutines in any order and timing.
> 
> What saves us currently is that they are all in the same AioContext, so
> we won't actually end up in aio_co_schedule(), but I'm sure that sooner
> or later we'll find cases where we're waiting for several workers that
> are spread across different I/O threads.
> 
> I don't think "don't do that" is a good answer to this.
> 
> And yes, reentering co_aio_sleep_ns() early is a real requirement, too.
> The test case that used speed=1 and would have waited a few hours before
> actually cancelling the job is an extreme example, but even just
> delaying cancellation for a second is bad if this is a second of
> migration downtime.
> 

At least this last part should be doable; reentering co_aio_sleep_ns()
really should mean that we want to short-circuit the QEMU timer for the
sleep cb.  Any reason we can't allow a reenter by removing the timer for
that sleep, and then doing an aio_co_wake?
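
Something along these lines (sketch only; the struct mirrors CoSleepCB
in util/qemu-coroutine-sleep.c, but the early-wake entry point and how
callers would get hold of it are made up):

typedef struct CoSleepState {
    QEMUTimer *ts;
    Coroutine *co;
} CoSleepState;

static void co_sleep_timer_cb(void *opaque)
{
    CoSleepState *s = opaque;

    aio_co_wake(s->co);
}

/* Hypothetical: cancel the pending sleep and wake the coroutine now.
 * Assumes it runs in the coroutine's AioContext. */
void co_sleep_wake_early(CoSleepState *s)
{
    if (timer_pending(s->ts)) {
        timer_del(s->ts);      /* make sure the timer can't also fire */
        aio_co_wake(s->co);    /* still only a single wakeup */
    }
}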

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.11 1/4] Revert "coroutine: abort if we try to schedule or enter a pending coroutine"
  2017-11-28 17:19         ` Kevin Wolf
  2017-11-28 17:33           ` Jeff Cody
@ 2017-11-28 17:35           ` Paolo Bonzini
  1 sibling, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2017-11-28 17:35 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, stefanha, jcody, famz, qemu-devel

On 28/11/2017 18:19, Kevin Wolf wrote:
> Am 28.11.2017 um 18:01 hat Paolo Bonzini geschrieben:
>> Basically, once you do aio_co_schedule or aio_co_wake the coroutine is
>> not any more yours.  It's owned by the context that will run it and you
>> should not do anything with it.
> 
> Well, but that poses the question how you can implement any coroutine
> that can be reentered from more than one place. Not being able to do
> that would be a severe restriction.
>
> For example, take quorum, which handles requests by spawning a coroutine
> for every child and then yielding until acb->count == s->num_children.
> This means that the main request coroutine can be reentered from the
> child request coroutines in any order and timing.
> 
> What saves us currently is that they are all in the same AioContext, so
> we won't actually end up in aio_co_schedule(), but I'm sure that sooner
> or later we'll find cases where we're waiting for several workers that
> are spread across different I/O threads.

That can work as long as you add synchronization among those "more than
one place" (quorum doesn't need it because as you say there's only one
AioContext involved).  Take the previous example of

	f();
	g();
	qemu_coroutine_yield();

If f() and g() do

	lock a mutex
	x--;
	wake = (x == 0)
	unlock a mutex
	if (wake)
		aio_co_wake(co)

then that's fine, because aio_co_wake is only entered in one place.
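
Spelled out with QEMU primitives (sketch, made-up names):

typedef struct WaitState {
    QemuMutex lock;
    int pending;        /* outstanding waiters, 2 for f() and g() */
    Coroutine *co;
} WaitState;

static void one_side_done(WaitState *s)
{
    bool wake;

    qemu_mutex_lock(&s->lock);
    wake = (--s->pending == 0);
    qemu_mutex_unlock(&s->lock);

    if (wake) {
        /* Only the last one gets here, so aio_co_wake() runs exactly once. */
        aio_co_wake(s->co);
    }
}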

> I don't think "don't do that" is a good answer to this.
> 
> And yes, reentering co_aio_sleep_ns() early is a real requirement, too.
> The test case that used speed=1 and would have waited a few hours before
> actually cancelling the job is an extreme example, but even just
> delaying cancellation for a second is bad if this is a second of
> migration downtime.

Ok, that can be fixed along the lines of what Jeff has just written.
I'll take a look.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

Thread overview: 27+ messages
2017-11-28 15:43 [Qemu-devel] [PATCH for-2.11 0/4] Fix qemu-iotests failures Kevin Wolf
2017-11-28 15:43 ` [Qemu-devel] [PATCH for-2.11 1/4] Revert "coroutine: abort if we try to schedule or enter a pending coroutine" Kevin Wolf
2017-11-28 16:00   ` Jeff Cody
2017-11-28 16:18   ` Paolo Bonzini
2017-11-28 16:37     ` Kevin Wolf
2017-11-28 17:01       ` Paolo Bonzini
2017-11-28 17:19         ` Kevin Wolf
2017-11-28 17:33           ` Jeff Cody
2017-11-28 17:35           ` Paolo Bonzini
2017-11-28 15:43 ` [Qemu-devel] [PATCH for-2.11 2/4] Revert "blockjob: do not allow coroutine double entry or entry-after-completion" Kevin Wolf
2017-11-28 16:00   ` Jeff Cody
2017-11-28 15:43 ` [Qemu-devel] [PATCH for-2.11 3/4] coroutine: Cancel aio_co_schedule() on direct entry Kevin Wolf
2017-11-28 16:09   ` Jeff Cody
2017-11-28 16:14   ` Paolo Bonzini
2017-11-28 16:28     ` Kevin Wolf
2017-11-28 16:42       ` Jeff Cody
2017-11-28 16:51         ` Paolo Bonzini
2017-11-28 17:09           ` Jeff Cody
2017-11-28 17:14             ` Paolo Bonzini
2017-11-28 17:03         ` Kevin Wolf
2017-11-28 16:45       ` Paolo Bonzini
2017-11-28 16:30   ` Fam Zheng
2017-11-28 16:46   ` Eric Blake
2017-11-28 15:43 ` [Qemu-devel] [PATCH for-2.11 4/4] block: Expect graph changes in bdrv_parent_drained_begin/end Kevin Wolf
2017-11-28 16:10   ` Jeff Cody
2017-11-28 15:56 ` [Qemu-devel] [PATCH for-2.11 0/4] Fix qemu-iotests failures Jeff Cody
2017-11-28 16:00 ` Stefan Hajnoczi
