* [Qemu-devel] [PATCH 00/14] Fix some jobs/drain/aio_poll related hangs
@ 2018-09-07 16:15 Kevin Wolf
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 01/14] blockjob: Wake up BDS when job becomes idle Kevin Wolf
                   ` (13 more replies)
  0 siblings, 14 replies; 33+ messages in thread
From: Kevin Wolf @ 2018-09-07 16:15 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, qemu-devel

Especially the combination of iothreads, block jobs and drain currently
tends to lead to hangs. This series fixes a few of these bugs; more of
them remain and will be addressed in separate patches.

The primary goal of this series is to fix the scenario from:
https://bugzilla.redhat.com/show_bug.cgi?id=1601212

A simplified reproducer of the reported problem looks like this (two concurrent
commit block jobs for disks in an iothread):

$qemu -qmp stdio \
    -object iothread,id=iothread1 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x6,iothread=iothread1 \
    -drive  id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=hd0 \
    -device scsi-hd,drive=drive_image1,id=image1,bootindex=1 \
    -drive  id=drive_image2,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=hd1 \
    -device scsi-hd,drive=drive_image2,id=image2,bootindex=2

{"execute":"qmp_capabilities"}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image1","snapshot-file":"sn1"}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image1","snapshot-file":"sn11"}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image1","snapshot-file":"sn111"}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image2","snapshot-file":"sn2"}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image2","snapshot-file":"sn22"}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image2","snapshot-file":"sn222"}}

{ "execute": "block-commit", "arguments": { "device": "drive_image2","base":"sn2","backing-file":"sn2","top":"sn22"}}
{ "execute": "block-commit", "arguments": { "device": "drive_image1","base":"sn1","backing-file":"sn1","top":"sn11"}}

{"execute":"quit"}

Kevin Wolf (14):
  blockjob: Wake up BDS when job becomes idle
  test-bdrv-drain: Drain with block jobs in an I/O thread
  test-blockjob: Acquire AioContext around job_finish_sync()
  job: Use AIO_WAIT_WHILE() in job_finish_sync()
  test-bdrv-drain: Test AIO_WAIT_WHILE() in completion callback
  block: Add missing locking in bdrv_co_drain_bh_cb()
  aio-wait: Increase num_waiters even in home thread
  block-backend: Add .drained_poll callback
  block-backend: Fix potential double blk_delete()
  block-backend: Decrease in_flight only after callback
  mirror: Fix potential use-after-free in active commit
  blockjob: Lie better in child_job_drained_poll()
  block: Remove aio_poll() in bdrv_drain_poll variants
  test-bdrv-drain: Test nested poll in bdrv_drain_poll_top_level()

 include/block/aio-wait.h |   2 +
 include/block/blockjob.h |  13 ++++++
 include/qemu/coroutine.h |   5 +++
 include/qemu/job.h       |  12 ++++++
 block/block-backend.c    |  26 ++++++++++-
 block/io.c               |  23 ++++++----
 block/mirror.c           |   9 ++++
 blockjob.c               |  20 ++++++++-
 job.c                    |  29 +++++++++----
 tests/test-bdrv-drain.c  | 110 ++++++++++++++++++++++++++++++++++++++++++++---
 tests/test-blockjob.c    |   6 +++
 util/qemu-coroutine.c    |   5 +++
 12 files changed, 235 insertions(+), 25 deletions(-)

-- 
2.13.6

* [Qemu-devel] [PATCH 01/14] blockjob: Wake up BDS when job becomes idle
  2018-09-07 16:15 [Qemu-devel] [PATCH 00/14] Fix some jobs/drain/aio_poll related hangs Kevin Wolf
@ 2018-09-07 16:15 ` Kevin Wolf
  2018-09-11  7:58   ` Fam Zheng
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 02/14] test-bdrv-drain: Drain with block jobs in an I/O thread Kevin Wolf
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Kevin Wolf @ 2018-09-07 16:15 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, qemu-devel

In the context of draining a BDS, the .drained_poll callback of block
jobs is called. If this returns true (i.e. there is still some activity
pending), the drain operation may call aio_poll() with blocking=true to
wait for completion.

As soon as the pending activity is completed and the job finally arrives
in a quiescent state (i.e. its coroutine either yields with busy=false
or terminates), the block job must notify the aio_poll() loop to wake
up, otherwise we get a deadlock if both are running in different
threads.
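
The interaction this enables looks roughly like the following sketch.
It is illustrative only; bs, ctx, recursive and ignore_parent stand in
for the real drain code, but the function and notifier names are the
ones used by this patch:

    /* Drain side (e.g. the main loop thread): the job's .drained_poll
     * still reports activity, so we block until something kicks ctx. */
    while (bdrv_drain_poll(bs, recursive, ignore_parent, false)) {
        aio_poll(ctx, true /* blocking */);
    }

    /* Job side (possibly an I/O thread): on becoming idle, wake up the
     * waiters so the blocking aio_poll() above returns and re-evaluates
     * the condition. */
    job->busy = false;
    notifier_list_notify(&job->on_idle, job);
        /* -> block_job_on_idle() -> block_job_wakeup_all_bdrv()
         *    -> bdrv_wakeup() on every node attached to the job */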

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/blockjob.h | 13 +++++++++++++
 include/qemu/job.h       |  3 +++
 blockjob.c               | 18 ++++++++++++++++++
 job.c                    |  7 +++++++
 4 files changed, 41 insertions(+)

diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 32c00b7dc0..2290bbb824 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -70,6 +70,9 @@ typedef struct BlockJob {
     /** Called when the job transitions to READY */
     Notifier ready_notifier;
 
+    /** Called when the job coroutine yields or terminates */
+    Notifier idle_notifier;
+
     /** BlockDriverStates that are involved in this block job */
     GSList *nodes;
 } BlockJob;
@@ -119,6 +122,16 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
 void block_job_remove_all_bdrv(BlockJob *job);
 
 /**
+ * block_job_wakeup_all_bdrv:
+ * @job: The block job
+ *
+ * Calls bdrv_wakeup() for all BlockDriverStates that have been added to the
+ * job. This function is to be called whenever child_job_drained_poll() would
+ * go from true to false to notify waiting drain requests.
+ */
+void block_job_wakeup_all_bdrv(BlockJob *job);
+
+/**
  * block_job_set_speed:
  * @job: The job to set the speed for.
  * @speed: The new value
diff --git a/include/qemu/job.h b/include/qemu/job.h
index 18c9223e31..0dae5b8481 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -148,6 +148,9 @@ typedef struct Job {
     /** Notifiers called when the job transitions to READY */
     NotifierList on_ready;
 
+    /** Notifiers called when the job coroutine yields or terminates */
+    NotifierList on_idle;
+
     /** Element of the list of jobs */
     QLIST_ENTRY(Job) job_list;
 
diff --git a/blockjob.c b/blockjob.c
index be5903aa96..8d27e8e1ea 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -221,6 +221,22 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
     return 0;
 }
 
+void block_job_wakeup_all_bdrv(BlockJob *job)
+{
+    GSList *l;
+
+    for (l = job->nodes; l; l = l->next) {
+        BdrvChild *c = l->data;
+        bdrv_wakeup(c->bs);
+    }
+}
+
+static void block_job_on_idle(Notifier *n, void *opaque)
+{
+    BlockJob *job = opaque;
+    block_job_wakeup_all_bdrv(job);
+}
+
 bool block_job_is_internal(BlockJob *job)
 {
     return (job->job.id == NULL);
@@ -419,6 +435,7 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     job->finalize_completed_notifier.notify = block_job_event_completed;
     job->pending_notifier.notify = block_job_event_pending;
     job->ready_notifier.notify = block_job_event_ready;
+    job->idle_notifier.notify = block_job_on_idle;
 
     notifier_list_add(&job->job.on_finalize_cancelled,
                       &job->finalize_cancelled_notifier);
@@ -426,6 +443,7 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
                       &job->finalize_completed_notifier);
     notifier_list_add(&job->job.on_pending, &job->pending_notifier);
     notifier_list_add(&job->job.on_ready, &job->ready_notifier);
+    notifier_list_add(&job->job.on_idle, &job->idle_notifier);
 
     error_setg(&job->blocker, "block device is in use by block job: %s",
                job_type_str(&job->job));
diff --git a/job.c b/job.c
index e36ebaafd8..9ad0b7476a 100644
--- a/job.c
+++ b/job.c
@@ -410,6 +410,11 @@ static void job_event_ready(Job *job)
     notifier_list_notify(&job->on_ready, job);
 }
 
+static void job_event_idle(Job *job)
+{
+    notifier_list_notify(&job->on_idle, job);
+}
+
 void job_enter_cond(Job *job, bool(*fn)(Job *job))
 {
     if (!job_started(job)) {
@@ -455,6 +460,7 @@ static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
         timer_mod(&job->sleep_timer, ns);
     }
     job->busy = false;
+    job_event_idle(job);
     job_unlock();
     qemu_coroutine_yield();
 
@@ -547,6 +553,7 @@ static void coroutine_fn job_co_entry(void *opaque)
     assert(job && job->driver && job->driver->start);
     job_pause_point(job);
     job->driver->start(job);
+    job_event_idle(job);
 }
 
 
-- 
2.13.6

* [Qemu-devel] [PATCH 02/14] test-bdrv-drain: Drain with block jobs in an I/O thread
  2018-09-07 16:15 [Qemu-devel] [PATCH 00/14] Fix some jobs/drain/aio_poll related hangs Kevin Wolf
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 01/14] blockjob: Wake up BDS when job becomes idle Kevin Wolf
@ 2018-09-07 16:15 ` Kevin Wolf
  2018-09-11  8:09   ` Fam Zheng
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 03/14] test-blockjob: Acquire AioContext around job_finish_sync() Kevin Wolf
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Kevin Wolf @ 2018-09-07 16:15 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, qemu-devel

This extends the existing drain test with a block job to include
variants where the block job runs in a different AioContext.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/test-bdrv-drain.c | 92 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 86 insertions(+), 6 deletions(-)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 17bb8508ae..ab055e85f8 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -174,6 +174,28 @@ static void do_drain_end(enum drain_type drain_type, BlockDriverState *bs)
     }
 }
 
+static void do_drain_begin_unlocked(enum drain_type drain_type, BlockDriverState *bs)
+{
+    if (drain_type != BDRV_DRAIN_ALL) {
+        aio_context_acquire(bdrv_get_aio_context(bs));
+    }
+    do_drain_begin(drain_type, bs);
+    if (drain_type != BDRV_DRAIN_ALL) {
+        aio_context_release(bdrv_get_aio_context(bs));
+    }
+}
+
+static void do_drain_end_unlocked(enum drain_type drain_type, BlockDriverState *bs)
+{
+    if (drain_type != BDRV_DRAIN_ALL) {
+        aio_context_acquire(bdrv_get_aio_context(bs));
+    }
+    do_drain_end(drain_type, bs);
+    if (drain_type != BDRV_DRAIN_ALL) {
+        aio_context_release(bdrv_get_aio_context(bs));
+    }
+}
+
 static void test_drv_cb_common(enum drain_type drain_type, bool recursive)
 {
     BlockBackend *blk;
@@ -790,11 +812,13 @@ BlockJobDriver test_job_driver = {
     },
 };
 
-static void test_blockjob_common(enum drain_type drain_type)
+static void test_blockjob_common(enum drain_type drain_type, bool use_iothread)
 {
     BlockBackend *blk_src, *blk_target;
     BlockDriverState *src, *target;
     BlockJob *job;
+    IOThread *iothread = NULL;
+    AioContext *ctx;
     int ret;
 
     src = bdrv_new_open_driver(&bdrv_test, "source", BDRV_O_RDWR,
@@ -802,21 +826,31 @@ static void test_blockjob_common(enum drain_type drain_type)
     blk_src = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
     blk_insert_bs(blk_src, src, &error_abort);
 
+    if (use_iothread) {
+        iothread = iothread_new();
+        ctx = iothread_get_aio_context(iothread);
+        blk_set_aio_context(blk_src, ctx);
+    } else {
+        ctx = qemu_get_aio_context();
+    }
+
     target = bdrv_new_open_driver(&bdrv_test, "target", BDRV_O_RDWR,
                                   &error_abort);
     blk_target = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
     blk_insert_bs(blk_target, target, &error_abort);
 
+    aio_context_acquire(ctx);
     job = block_job_create("job0", &test_job_driver, NULL, src, 0, BLK_PERM_ALL,
                            0, 0, NULL, NULL, &error_abort);
     block_job_add_bdrv(job, "target", target, 0, BLK_PERM_ALL, &error_abort);
     job_start(&job->job);
+    aio_context_release(ctx);
 
     g_assert_cmpint(job->job.pause_count, ==, 0);
     g_assert_false(job->job.paused);
     g_assert_true(job->job.busy); /* We're in job_sleep_ns() */
 
-    do_drain_begin(drain_type, src);
+    do_drain_begin_unlocked(drain_type, src);
 
     if (drain_type == BDRV_DRAIN_ALL) {
         /* bdrv_drain_all() drains both src and target */
@@ -827,7 +861,14 @@ static void test_blockjob_common(enum drain_type drain_type)
     g_assert_true(job->job.paused);
     g_assert_false(job->job.busy); /* The job is paused */
 
-    do_drain_end(drain_type, src);
+    do_drain_end_unlocked(drain_type, src);
+
+    if (use_iothread) {
+        /* paused is reset in the I/O thread, wait for it */
+        while (job->job.paused) {
+            aio_poll(qemu_get_aio_context(), false);
+        }
+    }
 
     g_assert_cmpint(job->job.pause_count, ==, 0);
     g_assert_false(job->job.paused);
@@ -846,32 +887,64 @@ static void test_blockjob_common(enum drain_type drain_type)
 
     do_drain_end(drain_type, target);
 
+    if (use_iothread) {
+        /* paused is reset in the I/O thread, wait for it */
+        while (job->job.paused) {
+            aio_poll(qemu_get_aio_context(), false);
+        }
+    }
+
     g_assert_cmpint(job->job.pause_count, ==, 0);
     g_assert_false(job->job.paused);
     g_assert_true(job->job.busy); /* We're in job_sleep_ns() */
 
+    aio_context_acquire(ctx);
     ret = job_complete_sync(&job->job, &error_abort);
     g_assert_cmpint(ret, ==, 0);
 
+    if (use_iothread) {
+        blk_set_aio_context(blk_src, qemu_get_aio_context());
+    }
+    aio_context_release(ctx);
+
     blk_unref(blk_src);
     blk_unref(blk_target);
     bdrv_unref(src);
     bdrv_unref(target);
+
+    if (iothread) {
+        iothread_join(iothread);
+    }
 }
 
 static void test_blockjob_drain_all(void)
 {
-    test_blockjob_common(BDRV_DRAIN_ALL);
+    test_blockjob_common(BDRV_DRAIN_ALL, false);
 }
 
 static void test_blockjob_drain(void)
 {
-    test_blockjob_common(BDRV_DRAIN);
+    test_blockjob_common(BDRV_DRAIN, false);
 }
 
 static void test_blockjob_drain_subtree(void)
 {
-    test_blockjob_common(BDRV_SUBTREE_DRAIN);
+    test_blockjob_common(BDRV_SUBTREE_DRAIN, false);
+}
+
+static void test_blockjob_iothread_drain_all(void)
+{
+    test_blockjob_common(BDRV_DRAIN_ALL, true);
+}
+
+static void test_blockjob_iothread_drain(void)
+{
+    test_blockjob_common(BDRV_DRAIN, true);
+}
+
+static void test_blockjob_iothread_drain_subtree(void)
+{
+    test_blockjob_common(BDRV_SUBTREE_DRAIN, true);
 }
 
 
@@ -1342,6 +1415,13 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/blockjob/drain_subtree",
                     test_blockjob_drain_subtree);
 
+    g_test_add_func("/bdrv-drain/blockjob/iothread/drain_all",
+                    test_blockjob_iothread_drain_all);
+    g_test_add_func("/bdrv-drain/blockjob/iothread/drain",
+                    test_blockjob_iothread_drain);
+    g_test_add_func("/bdrv-drain/blockjob/iothread/drain_subtree",
+                    test_blockjob_iothread_drain_subtree);
+
     g_test_add_func("/bdrv-drain/deletion/drain", test_delete_by_drain);
     g_test_add_func("/bdrv-drain/detach/drain_all", test_detach_by_drain_all);
     g_test_add_func("/bdrv-drain/detach/drain", test_detach_by_drain);
-- 
2.13.6

* [Qemu-devel] [PATCH 03/14] test-blockjob: Acquire AioContext around job_finish_sync()
  2018-09-07 16:15 [Qemu-devel] [PATCH 00/14] Fix some jobs/drain/aio_poll related hangs Kevin Wolf
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 01/14] blockjob: Wake up BDS when job becomes idle Kevin Wolf
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 02/14] test-bdrv-drain: Drain with block jobs in an I/O thread Kevin Wolf
@ 2018-09-07 16:15 ` Kevin Wolf
  2018-09-11  8:11   ` Fam Zheng
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 04/14] job: Use AIO_WAIT_WHILE() in job_finish_sync() Kevin Wolf
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Kevin Wolf @ 2018-09-07 16:15 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, qemu-devel

All callers in QEMU proper hold the AioContext lock when calling
job_finish_sync(). test-blockjob should do the same.
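
Spelled out, the convention documented here is the following pattern
(a sketch; it mirrors what cancel_common() in the hunk below now does):

    AioContext *ctx = job->job.aio_context;

    aio_context_acquire(ctx);
    job_cancel_sync(&job->job);   /* likewise for job_complete_sync() and
                                   * job_finish_sync() */
    /* ... any further teardown that expects the lock to be held ... */
    aio_context_release(ctx);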

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/qemu/job.h    | 6 ++++++
 tests/test-blockjob.c | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 0dae5b8481..8ac48dbd28 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -520,6 +520,8 @@ void job_user_cancel(Job *job, bool force, Error **errp);
  *
  * Returns the return value from the job if the job actually completed
  * during the call, or -ECANCELED if it was canceled.
+ *
+ * Callers must hold the AioContext lock of job->aio_context.
  */
 int job_cancel_sync(Job *job);
 
@@ -537,6 +539,8 @@ void job_cancel_sync_all(void);
  * function).
  *
  * Returns the return value from the job.
+ *
+ * Callers must hold the AioContext lock of job->aio_context.
  */
 int job_complete_sync(Job *job, Error **errp);
 
@@ -579,6 +583,8 @@ void job_defer_to_main_loop(Job *job, JobDeferToMainLoopFn *fn, void *opaque);
  *
  * Returns 0 if the job is successfully completed, -ECANCELED if the job was
  * cancelled before completing, and -errno in other error cases.
+ *
+ * Callers must hold the AioContext lock of job->aio_context.
  */
 int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp);
 
diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c
index cb42f06e61..8c2babbe35 100644
--- a/tests/test-blockjob.c
+++ b/tests/test-blockjob.c
@@ -230,6 +230,10 @@ static void cancel_common(CancelJob *s)
     BlockJob *job = &s->common;
     BlockBackend *blk = s->blk;
     JobStatus sts = job->job.status;
+    AioContext *ctx;
+
+    ctx = job->job.aio_context;
+    aio_context_acquire(ctx);
 
     job_cancel_sync(&job->job);
     if (sts != JOB_STATUS_CREATED && sts != JOB_STATUS_CONCLUDED) {
@@ -239,6 +243,8 @@ static void cancel_common(CancelJob *s)
     assert(job->job.status == JOB_STATUS_NULL);
     job_unref(&job->job);
     destroy_blk(blk);
+
+    aio_context_release(ctx);
 }
 
 static void test_cancel_created(void)
-- 
2.13.6

* [Qemu-devel] [PATCH 04/14] job: Use AIO_WAIT_WHILE() in job_finish_sync()
  2018-09-07 16:15 [Qemu-devel] [PATCH 00/14] Fix some jobs/drain/aio_poll related hangs Kevin Wolf
                   ` (2 preceding siblings ...)
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 03/14] test-blockjob: Acquire AioContext around job_finish_sync() Kevin Wolf
@ 2018-09-07 16:15 ` Kevin Wolf
  2018-09-11  8:17   ` Fam Zheng
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 05/14] test-bdrv-drain: Test AIO_WAIT_WHILE() in completion callback Kevin Wolf
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Kevin Wolf @ 2018-09-07 16:15 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, qemu-devel

job_finish_sync() needs to release the AioContext lock of the job before
calling aio_poll(). Otherwise, callbacks called by aio_poll() would
possibly take the lock a second time and run into a deadlock with a
nested AIO_WAIT_WHILE() call.

Also, job_drain() without aio_poll() isn't necessarily enough to make
progress on a job; it could depend on bottom halves being executed.

Combine both open-coded while loops into a single AIO_WAIT_WHILE() call
that solves both of these problems.
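
For the interesting case (job in an I/O thread, job_finish_sync()
called from the main loop), the new call behaves roughly like the
following sketch. This is not the literal macro expansion and it leaves
out the num_waiters bookkeeping:

    /* The comma expression first kicks the job, then checks completion. */
    while ((job_drain(job), !job_is_completed(job))) {
        /* Drop the job's AioContext lock so callbacks running inside
         * aio_poll() (or the job's I/O thread) can take it ... */
        aio_context_release(job->aio_context);
        aio_poll(qemu_get_aio_context(), true);
        /* ... and re-acquire it before re-evaluating the condition. */
        aio_context_acquire(job->aio_context);
    }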

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 job.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/job.c b/job.c
index 9ad0b7476a..8480eda188 100644
--- a/job.c
+++ b/job.c
@@ -29,6 +29,7 @@
 #include "qemu/job.h"
 #include "qemu/id.h"
 #include "qemu/main-loop.h"
+#include "block/aio-wait.h"
 #include "trace-root.h"
 #include "qapi/qapi-events-job.h"
 
@@ -998,6 +999,7 @@ void job_defer_to_main_loop(Job *job, JobDeferToMainLoopFn *fn, void *opaque)
 int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
 {
     Error *local_err = NULL;
+    AioWait dummy_wait = {};
     int ret;
 
     job_ref(job);
@@ -1010,14 +1012,10 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
         job_unref(job);
         return -EBUSY;
     }
-    /* job_drain calls job_enter, and it should be enough to induce progress
-     * until the job completes or moves to the main thread. */
-    while (!job->deferred_to_main_loop && !job_is_completed(job)) {
-        job_drain(job);
-    }
-    while (!job_is_completed(job)) {
-        aio_poll(qemu_get_aio_context(), true);
-    }
+
+    AIO_WAIT_WHILE(&dummy_wait, job->aio_context,
+                   (job_drain(job), !job_is_completed(job)));
+
     ret = (job_is_cancelled(job) && job->ret == 0) ? -ECANCELED : job->ret;
     job_unref(job);
     return ret;
-- 
2.13.6

* [Qemu-devel] [PATCH 05/14] test-bdrv-drain: Test AIO_WAIT_WHILE() in completion callback
  2018-09-07 16:15 [Qemu-devel] [PATCH 00/14] Fix some jobs/drain/aio_poll related hangs Kevin Wolf
                   ` (3 preceding siblings ...)
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 04/14] job: Use AIO_WAIT_WHILE() in job_finish_sync() Kevin Wolf
@ 2018-09-07 16:15 ` Kevin Wolf
  2018-09-11  8:17   ` Fam Zheng
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 06/14] block: Add missing locking in bdrv_co_drain_bh_cb() Kevin Wolf
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Kevin Wolf @ 2018-09-07 16:15 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, qemu-devel

This is a regression test for a deadlock that occurred in block job
completion callbacks (via job_defer_to_main_loop) because the AioContext
lock was taken twice: once in job_finish_sync() and then again in
job_defer_to_main_loop_bh(). This would cause AIO_WAIT_WHILE() to hang.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/test-bdrv-drain.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index ab055e85f8..9641a20dd8 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -776,6 +776,11 @@ typedef struct TestBlockJob {
 
 static void test_job_completed(Job *job, void *opaque)
 {
+    TestBlockJob *s = container_of(job, TestBlockJob, common.job);
+
+    /* Provoke an AIO_WAIT_WHILE() call to verify there is no deadlock */
+    blk_flush(s->common.blk);
+
     job_completed(job, 0, NULL);
 }
 
-- 
2.13.6

* [Qemu-devel] [PATCH 06/14] block: Add missing locking in bdrv_co_drain_bh_cb()
  2018-09-07 16:15 [Qemu-devel] [PATCH 00/14] Fix some jobs/drain/aio_poll related hangs Kevin Wolf
                   ` (4 preceding siblings ...)
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 05/14] test-bdrv-drain: Test AIO_WAIT_WHILE() in completion callback Kevin Wolf
@ 2018-09-07 16:15 ` Kevin Wolf
  2018-09-11  8:23   ` Fam Zheng
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 07/14] aio-wait: Increase num_waiters even in home thread Kevin Wolf
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Kevin Wolf @ 2018-09-07 16:15 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, qemu-devel

bdrv_do_drained_begin/end() assume that they are called with the
AioContext lock of bs held. If we call drain functions from a coroutine
with the AioContext lock held, we yield and schedule a BH to move out of
coroutine context. This means that the lock for the home context of the
coroutine is released and must be re-acquired in the bottom half.
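
As a sketch, the sequence that makes the re-acquisition necessary
(names as in the hunk below; the numbered steps are a simplified
reading of the drain code):

    /* 1. A coroutine in the home context of bs, lock held by its caller,
     *    calls a drain function.  It cannot drain from coroutine context,
     *    so it schedules bdrv_co_drain_bh_cb() and yields -- and yielding
     *    releases the lock of its home context.
     * 2. The bottom half therefore starts without that lock: */
    AioContext *ctx = bdrv_get_aio_context(bs);
    AioContext *co_ctx = qemu_coroutine_get_aio_context(co);

    if (ctx == co_ctx) {
        aio_context_acquire(ctx);   /* take back what the yield released */
    }
    /* ... bdrv_do_drained_begin()/end() run with the lock held again ... */
    if (ctx == co_ctx) {
        aio_context_release(ctx);   /* drop it before re-entering co */
    }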

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/qemu/coroutine.h |  5 +++++
 block/io.c               | 15 +++++++++++++++
 util/qemu-coroutine.c    |  5 +++++
 3 files changed, 25 insertions(+)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 6f8a487041..9801e7f5a4 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -90,6 +90,11 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co);
 void coroutine_fn qemu_coroutine_yield(void);
 
 /**
+ * Get the AioContext of the given coroutine
+ */
+AioContext *coroutine_fn qemu_coroutine_get_aio_context(Coroutine *co);
+
+/**
  * Get the currently executing coroutine
  */
 Coroutine *coroutine_fn qemu_coroutine_self(void);
diff --git a/block/io.c b/block/io.c
index 7100344c7b..914ba78f1a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -288,6 +288,18 @@ static void bdrv_co_drain_bh_cb(void *opaque)
     BlockDriverState *bs = data->bs;
 
     if (bs) {
+        AioContext *ctx = bdrv_get_aio_context(bs);
+        AioContext *co_ctx = qemu_coroutine_get_aio_context(co);
+
+        /*
+         * When the coroutine yielded, the lock for its home context was
+         * released, so we need to re-acquire it here. If it explicitly
+         * acquired a different context, the lock is still held and we don't
+         * want to lock it a second time (or AIO_WAIT_WHILE() would hang).
+         */
+        if (ctx == co_ctx) {
+            aio_context_acquire(ctx);
+        }
         bdrv_dec_in_flight(bs);
         if (data->begin) {
             bdrv_do_drained_begin(bs, data->recursive, data->parent,
@@ -296,6 +308,9 @@ static void bdrv_co_drain_bh_cb(void *opaque)
             bdrv_do_drained_end(bs, data->recursive, data->parent,
                                 data->ignore_bds_parents);
         }
+        if (ctx == co_ctx) {
+            aio_context_release(ctx);
+        }
     } else {
         assert(data->begin);
         bdrv_drain_all_begin();
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index 1ba4191b84..2295928d33 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -198,3 +198,8 @@ bool qemu_coroutine_entered(Coroutine *co)
 {
     return co->caller;
 }
+
+AioContext *coroutine_fn qemu_coroutine_get_aio_context(Coroutine *co)
+{
+    return co->ctx;
+}
-- 
2.13.6

* [Qemu-devel] [PATCH 07/14] aio-wait: Increase num_waiters even in home thread
  2018-09-07 16:15 [Qemu-devel] [PATCH 00/14] Fix some jobs/drain/aio_poll related hangs Kevin Wolf
                   ` (5 preceding siblings ...)
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 06/14] block: Add missing locking in bdrv_co_drain_bh_cb() Kevin Wolf
@ 2018-09-07 16:15 ` Kevin Wolf
  2018-09-11  8:25   ` Fam Zheng
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 08/14] block-backend: Add .drained_poll callback Kevin Wolf
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Kevin Wolf @ 2018-09-07 16:15 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, qemu-devel

Even if AIO_WAIT_WHILE() is called in the home context of the
AioContext, we still want to allow the condition to change depending on
other threads as long as they kick the AioWait. Specifically, block jobs
can be running in an I/O thread and should then be able to kick a drain
in the main loop context.
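
The reason the counter matters even in the home thread is that the kick
from the other thread only does anything if it can see a waiter. A
sketch (the waiter side matches the hunk below; the kick side is how
patch 1 wires up idle block jobs via bdrv_wakeup()):

    /* Main loop thread, home context of ctx_: */
    atomic_inc(&wait_->num_waiters);    /* make the waiter visible */
    while ((cond)) {
        aio_poll(ctx_, true);           /* blocks until kicked */
    }
    atomic_dec(&wait_->num_waiters);

    /* Block job in an I/O thread, when the condition may have changed: */
    bdrv_wakeup(bs);    /* kicks the waiter; without the increment above,
                         * the kick saw num_waiters == 0 and was dropped,
                         * leaving aio_poll() blocked forever */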

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/aio-wait.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index c85a62f798..46ba7f9111 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -77,10 +77,12 @@ typedef struct {
     AioWait *wait_ = (wait);                                       \
     AioContext *ctx_ = (ctx);                                      \
     if (ctx_ && in_aio_context_home_thread(ctx_)) {                \
+        atomic_inc(&wait_->num_waiters);                           \
         while ((cond)) {                                           \
             aio_poll(ctx_, true);                                  \
             waited_ = true;                                        \
         }                                                          \
+        atomic_dec(&wait_->num_waiters);                           \
     } else {                                                       \
         assert(qemu_get_current_aio_context() ==                   \
                qemu_get_aio_context());                            \
-- 
2.13.6

* [Qemu-devel] [PATCH 08/14] block-backend: Add .drained_poll callback
  2018-09-07 16:15 [Qemu-devel] [PATCH 00/14] Fix some jobs/drain/aio_poll related hangs Kevin Wolf
                   ` (6 preceding siblings ...)
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 07/14] aio-wait: Increase num_waiters even in home thread Kevin Wolf
@ 2018-09-07 16:15 ` Kevin Wolf
  2018-09-11  8:26   ` Fam Zheng
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 09/14] block-backend: Fix potential double blk_delete() Kevin Wolf
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Kevin Wolf @ 2018-09-07 16:15 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, qemu-devel

A bdrv_drain operation must ensure that all parents are quiesced; this
includes BlockBackends. Otherwise, callbacks invoked by requests that
have completed on the BDS layer, but not quite yet on the BlockBackend
layer, could still create new requests.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/block-backend.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index fa120630be..efa8d8011c 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -121,6 +121,7 @@ static void blk_root_inherit_options(int *child_flags, QDict *child_options,
     abort();
 }
 static void blk_root_drained_begin(BdrvChild *child);
+static bool blk_root_drained_poll(BdrvChild *child);
 static void blk_root_drained_end(BdrvChild *child);
 
 static void blk_root_change_media(BdrvChild *child, bool load);
@@ -294,6 +295,7 @@ static const BdrvChildRole child_root = {
     .get_parent_desc    = blk_root_get_parent_desc,
 
     .drained_begin      = blk_root_drained_begin,
+    .drained_poll       = blk_root_drained_poll,
     .drained_end        = blk_root_drained_end,
 
     .activate           = blk_root_activate,
@@ -2191,6 +2193,13 @@ static void blk_root_drained_begin(BdrvChild *child)
     }
 }
 
+static bool blk_root_drained_poll(BdrvChild *child)
+{
+    BlockBackend *blk = child->opaque;
+    assert(blk->quiesce_counter);
+    return !!blk->in_flight;
+}
+
 static void blk_root_drained_end(BdrvChild *child)
 {
     BlockBackend *blk = child->opaque;
-- 
2.13.6

* [Qemu-devel] [PATCH 09/14] block-backend: Fix potential double blk_delete()
  2018-09-07 16:15 [Qemu-devel] [PATCH 00/14] Fix some jobs/drain/aio_poll related hangs Kevin Wolf
                   ` (7 preceding siblings ...)
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 08/14] block-backend: Add .drained_poll callback Kevin Wolf
@ 2018-09-07 16:15 ` Kevin Wolf
  2018-09-11  8:27   ` Fam Zheng
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 10/14] block-backend: Decrease in_flight only after callback Kevin Wolf
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Kevin Wolf @ 2018-09-07 16:15 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, qemu-devel

blk_unref() first decreases the refcount of the BlockBackend and calls
blk_delete() if the refcount reaches zero. Requests can still be in
flight at this point; they are only drained during blk_delete().

While that drain runs, arbitrary callbacks can execute. If any callback
takes a temporary BlockBackend reference, it will first increase the
refcount to 1 and then decrease it to 0 again, triggering another
blk_delete(). This will cause a use-after-free crash in the outer
blk_delete().

Fix it by draining the BlockBackend before decreasing the refcount to 0.
Assert in blk_ref() that it never takes the first reference (which would
mean that the BlockBackend is already being deleted).
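
The problematic sequence, as a sketch (the callback name is made up for
illustration):

    blk_unref(blk)                /* refcnt 1 -> 0                        */
      blk_delete(blk)
        /* requests are drained only here; during the drain, arbitrary
         * callbacks can run: */
        some_completion_cb()      /* hypothetical callback                */
          blk_ref(blk)            /* refcnt 0 -> 1                        */
          ...
          blk_unref(blk)          /* refcnt 1 -> 0: nested blk_delete()   */
            blk_delete(blk)       /* frees blk                            */
        ...                       /* the outer blk_delete() keeps using
                                   * the freed blk: use after free        */

With the fix, blk_drain() runs while the refcount is still 1, so the
temporary reference moves between 2 and 1 and never triggers a nested
blk_delete().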

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/block-backend.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index efa8d8011c..1b2d7a6ff5 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -435,6 +435,7 @@ int blk_get_refcnt(BlockBackend *blk)
  */
 void blk_ref(BlockBackend *blk)
 {
+    assert(blk->refcnt > 0);
     blk->refcnt++;
 }
 
@@ -447,7 +448,11 @@ void blk_unref(BlockBackend *blk)
 {
     if (blk) {
         assert(blk->refcnt > 0);
-        if (!--blk->refcnt) {
+        if (blk->refcnt > 1) {
+            blk->refcnt--;
+        } else {
+            blk_drain(blk);
+            blk->refcnt = 0;
             blk_delete(blk);
         }
     }
-- 
2.13.6

* [Qemu-devel] [PATCH 10/14] block-backend: Decrease in_flight only after callback
  2018-09-07 16:15 [Qemu-devel] [PATCH 00/14] Fix some jobs/drain/aio_poll related hangs Kevin Wolf
                   ` (8 preceding siblings ...)
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 09/14] block-backend: Fix potential double blk_delete() Kevin Wolf
@ 2018-09-07 16:15 ` Kevin Wolf
  2018-09-11  8:29   ` Fam Zheng
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 11/14] mirror: Fix potential use-after-free in active commit Kevin Wolf
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Kevin Wolf @ 2018-09-07 16:15 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, qemu-devel

Request callbacks can do pretty much anything, including operations that
will yield from the coroutine (such as draining the backend). In that
case, a decreased in_flight would be visible to other code and could
lead to a drain completing while the callback hasn't actually completed
yet.
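
A sketch of the ordering problem (names as in blk_aio_complete() in the
hunk below; the drain is whatever the callback happens to trigger):

    /* Old order: */
    blk_dec_in_flight(acb->rwco.blk);   /* blk->in_flight can reach 0 here */
    acb->common.cb(acb->common.opaque, acb->rwco.ret);
                                        /* may yield, e.g. by draining the
                                         * backend; a concurrent drain now
                                         * considers blk quiescent although
                                         * this callback is still running */

    /* New order: */
    acb->common.cb(acb->common.opaque, acb->rwco.ret);
                                        /* runs with in_flight still held */
    blk_dec_in_flight(acb->rwco.blk);   /* only now can a drain complete */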

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/block-backend.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 1b2d7a6ff5..2efaf0c968 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1338,8 +1338,16 @@ static const AIOCBInfo blk_aio_em_aiocb_info = {
 static void blk_aio_complete(BlkAioEmAIOCB *acb)
 {
     if (acb->has_returned) {
-        blk_dec_in_flight(acb->rwco.blk);
+        if (qemu_get_current_aio_context() == qemu_get_aio_context()) {
+            /* If we are in the main thread, the callback is allowed to unref
+             * the BlockBackend, so we have to hold an additional reference */
+            blk_ref(acb->rwco.blk);
+        }
         acb->common.cb(acb->common.opaque, acb->rwco.ret);
+        blk_dec_in_flight(acb->rwco.blk);
+        if (qemu_get_current_aio_context() == qemu_get_aio_context()) {
+            blk_unref(acb->rwco.blk);
+        }
         qemu_aio_unref(acb);
     }
 }
-- 
2.13.6

* [Qemu-devel] [PATCH 11/14] mirror: Fix potential use-after-free in active commit
  2018-09-07 16:15 [Qemu-devel] [PATCH 00/14] Fix some jobs/drain/aio_poll related hangs Kevin Wolf
                   ` (9 preceding siblings ...)
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 10/14] block-backend: Decrease in_flight only after callback Kevin Wolf
@ 2018-09-07 16:15 ` Kevin Wolf
  2018-09-11  8:31   ` Fam Zheng
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 12/14] blockjob: Lie better in child_job_drained_poll() Kevin Wolf
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Kevin Wolf @ 2018-09-07 16:15 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, qemu-devel

When starting an active commit job, other callbacks can run before
mirror_start_job() calls bdrv_ref() where needed and cause the nodes to
go away. Add another pair of bdrv_ref/unref() around it to protect
against this case.
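
The window being closed, as a simplified sketch of the call chain:

    commit_active_start()
        bdrv_reopen(base, bs->open_flags, errp)
            /* drains; callbacks running during the drain may drop the
             * last references to bs or base and delete them */
        mirror_start_job(...)
            /* takes its own references -- but only here, which is too
             * late without the extra bdrv_ref(bs)/bdrv_ref(base) pair
             * added above */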

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/mirror.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/block/mirror.c b/block/mirror.c
index 6cc10df5c9..c42999eadf 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1679,6 +1679,11 @@ void commit_active_start(const char *job_id, BlockDriverState *bs,
 
     orig_base_flags = bdrv_get_flags(base);
 
+    /* bdrv_reopen() drains, which might make the BDSes go away before a
+     * reference is taken in mirror_start_job(). */
+    bdrv_ref(bs);
+    bdrv_ref(base);
+
     if (bdrv_reopen(base, bs->open_flags, errp)) {
         return;
     }
@@ -1689,6 +1694,10 @@ void commit_active_start(const char *job_id, BlockDriverState *bs,
                      &commit_active_job_driver, false, base, auto_complete,
                      filter_node_name, false, MIRROR_COPY_MODE_BACKGROUND,
                      &local_err);
+
+    bdrv_unref(bs);
+    bdrv_unref(base);
+
     if (local_err) {
         error_propagate(errp, local_err);
         goto error_restore_flags;
-- 
2.13.6

* [Qemu-devel] [PATCH 12/14] blockjob: Lie better in child_job_drained_poll()
  2018-09-07 16:15 [Qemu-devel] [PATCH 00/14] Fix some jobs/drain/aio_poll related hangs Kevin Wolf
                   ` (10 preceding siblings ...)
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 11/14] mirror: Fix potential use-after-free in active commit Kevin Wolf
@ 2018-09-07 16:15 ` Kevin Wolf
  2018-09-11  8:34   ` Fam Zheng
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 13/14] block: Remove aio_poll() in bdrv_drain_poll variants Kevin Wolf
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 14/14] test-bdrv-drain: Test nested poll in bdrv_drain_poll_top_level() Kevin Wolf
  13 siblings, 1 reply; 33+ messages in thread
From: Kevin Wolf @ 2018-09-07 16:15 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, qemu-devel

Block jobs claim in .drained_poll() that they are in a quiescent state
as soon as job->deferred_to_main_loop is true. This is obviously wrong;
they still have a completion BH to run. We only get away with this
because commit 91af091f923 added an unconditional aio_poll(false) to the
drain functions, but this bypasses the regular drain mechanisms.

However, just removing this and reporting that the job is still active
doesn't work either: the completion callbacks themselves call drain
functions (directly, or indirectly via bdrv_reopen), so they would
deadlock then.

As a better lie, report the job as active as long as the BH is pending,
and only claim (falsely) that it is quiescent from the point in the BH
where the completion callback is called. At this point, nested drain
calls won't deadlock because they ignore the job, and outer drains will
wait for the job to really reach a quiescent state because the callback
is already running.
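
In terms of job->busy, the lifecycle now looks like this (a sketch
showing only the drain-relevant transitions):

    job_defer_to_main_loop(job, fn, opaque)
        job->deferred_to_main_loop = true;
        job->busy = true;       /* BH pending: child_job_drained_poll()
                                 * reports activity, drains keep waiting */
        aio_bh_schedule_oneshot(qemu_get_aio_context(),
                                job_defer_to_main_loop_bh, data);

    /* later, in the main loop: */
    job_defer_to_main_loop_bh(data)
        aio_context_acquire(aio_context);
        job->busy = false;      /* the lie: nested drains issued by the
                                 * completion callback ignore the job and
                                 * don't deadlock */
        data->fn(data->job, data->opaque);   /* completion callback */
        aio_context_release(aio_context);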

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/qemu/job.h | 3 +++
 blockjob.c         | 2 +-
 job.c              | 8 ++++++++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 8ac48dbd28..5a49d6f9a5 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -76,6 +76,9 @@ typedef struct Job {
      * Set to false by the job while the coroutine has yielded and may be
      * re-entered by job_enter(). There may still be I/O or event loop activity
      * pending. Accessed under block_job_mutex (in blockjob.c).
+     *
+     * When the job is deferred to the main loop, busy is true as long as the
+     * bottom half is still pending.
      */
     bool busy;
 
diff --git a/blockjob.c b/blockjob.c
index 8d27e8e1ea..617d86fe93 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -164,7 +164,7 @@ static bool child_job_drained_poll(BdrvChild *c)
     /* An inactive or completed job doesn't have any pending requests. Jobs
      * with !job->busy are either already paused or have a pause point after
      * being reentered, so no job driver code will run before they pause. */
-    if (!job->busy || job_is_completed(job) || job->deferred_to_main_loop) {
+    if (!job->busy || job_is_completed(job)) {
         return false;
     }
 
diff --git a/job.c b/job.c
index 8480eda188..d05795843b 100644
--- a/job.c
+++ b/job.c
@@ -978,6 +978,13 @@ static void job_defer_to_main_loop_bh(void *opaque)
     AioContext *aio_context = job->aio_context;
 
     aio_context_acquire(aio_context);
+
+    /* This is a lie, we're not quiescent, but still doing the completion
+     * callbacks. However, completion callbacks tend to involve operations that
+     * drain block nodes, and if .drained_poll still returned true, we would
+     * deadlock. */
+    job->busy = false;
+
     data->fn(data->job, data->opaque);
     aio_context_release(aio_context);
 
@@ -991,6 +998,7 @@ void job_defer_to_main_loop(Job *job, JobDeferToMainLoopFn *fn, void *opaque)
     data->fn = fn;
     data->opaque = opaque;
     job->deferred_to_main_loop = true;
+    job->busy = true;
 
     aio_bh_schedule_oneshot(qemu_get_aio_context(),
                             job_defer_to_main_loop_bh, data);
-- 
2.13.6

* [Qemu-devel] [PATCH 13/14] block: Remove aio_poll() in bdrv_drain_poll variants
  2018-09-07 16:15 [Qemu-devel] [PATCH 00/14] Fix some jobs/drain/aio_poll related hangs Kevin Wolf
                   ` (11 preceding siblings ...)
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 12/14] blockjob: Lie better in child_job_drained_poll() Kevin Wolf
@ 2018-09-07 16:15 ` Kevin Wolf
  2018-09-11  8:35   ` Fam Zheng
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 14/14] test-bdrv-drain: Test nested poll in bdrv_drain_poll_top_level() Kevin Wolf
  13 siblings, 1 reply; 33+ messages in thread
From: Kevin Wolf @ 2018-09-07 16:15 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, qemu-devel

bdrv_drain_poll_top_level() was buggy because it didn't release the
AioContext lock of the node to be drained before calling aio_poll().
This way, callbacks called by aio_poll() would possibly take the lock a
second time and run into a deadlock with a nested AIO_WAIT_WHILE() call.

However, it turns out that the aio_poll() call isn't actually needed any
more. It was introduced in commit 91af091f923, which is effectively
reverted by this patch. The cases it was supposed to fix are now covered
by bdrv_drain_poll(), which waits for block jobs to reach a quiescent
state.
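
For illustration, a sketch of why the removed poll was dangerous when
the caller holds the AioContext lock of bs:

    /* Old bdrv_drain_poll_top_level(), bs->aio_context locked by caller: */
    while (aio_poll(bs->aio_context, false)) {
        /* A callback run here may take the same lock a second time and
         * then enter AIO_WAIT_WHILE().  That only releases the lock once
         * per iteration, so the outer acquisition is still held, the I/O
         * thread can never take the context to make progress, and the
         * condition never changes: deadlock. */
    }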

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/block/io.c b/block/io.c
index 914ba78f1a..8b81ff3913 100644
--- a/block/io.c
+++ b/block/io.c
@@ -268,10 +268,6 @@ bool bdrv_drain_poll(BlockDriverState *bs, bool recursive,
 static bool bdrv_drain_poll_top_level(BlockDriverState *bs, bool recursive,
                                       BdrvChild *ignore_parent)
 {
-    /* Execute pending BHs first and check everything else only after the BHs
-     * have executed. */
-    while (aio_poll(bs->aio_context, false));
-
     return bdrv_drain_poll(bs, recursive, ignore_parent, false);
 }
 
@@ -511,10 +507,6 @@ static bool bdrv_drain_all_poll(void)
     BlockDriverState *bs = NULL;
     bool result = false;
 
-    /* Execute pending BHs first (may modify the graph) and check everything
-     * else only after the BHs have executed. */
-    while (aio_poll(qemu_get_aio_context(), false));
-
     /* bdrv_drain_poll() can't make changes to the graph and we are holding the
      * main AioContext lock, so iterating bdrv_next_all_states() is safe. */
     while ((bs = bdrv_next_all_states(bs))) {
-- 
2.13.6

* [Qemu-devel] [PATCH 14/14] test-bdrv-drain: Test nested poll in bdrv_drain_poll_top_level()
  2018-09-07 16:15 [Qemu-devel] [PATCH 00/14] Fix some jobs/drain/aio_poll related hangs Kevin Wolf
                   ` (12 preceding siblings ...)
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 13/14] block: Remove aio_poll() in bdrv_drain_poll variants Kevin Wolf
@ 2018-09-07 16:15 ` Kevin Wolf
  2018-09-11  8:37   ` Fam Zheng
  13 siblings, 1 reply; 33+ messages in thread
From: Kevin Wolf @ 2018-09-07 16:15 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, qemu-devel

This is a regression test for a deadlock that could occur in callbacks
called from the aio_poll() in bdrv_drain_poll_top_level(). The
AioContext lock wasn't released and therefore would be taken a second
time in the callback. This would cause a possible AIO_WAIT_WHILE() in
the callback to hang.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/test-bdrv-drain.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 9641a20dd8..07ce5bfddc 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -636,6 +636,17 @@ static void test_iothread_aio_cb(void *opaque, int ret)
     qemu_event_set(&done_event);
 }
 
+static void test_iothread_main_thread_bh(void *opaque)
+{
+    struct test_iothread_data *data = opaque;
+
+    /* Test that the AioContext is not yet locked in a random BH that is
+     * executed during drain, otherwise this would deadlock. */
+    aio_context_acquire(bdrv_get_aio_context(data->bs));
+    bdrv_flush(data->bs);
+    aio_context_release(bdrv_get_aio_context(data->bs));
+}
+
 /*
  * Starts an AIO request on a BDS that runs in the AioContext of iothread 1.
  * The request involves a BH on iothread 2 before it can complete.
@@ -705,6 +716,8 @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread)
             aio_context_acquire(ctx_a);
         }
 
+        aio_bh_schedule_oneshot(ctx_a, test_iothread_main_thread_bh, &data);
+
         /* The request is running on the IOThread a. Draining its block device
          * will make sure that it has completed as far as the BDS is concerned,
          * but the drain in this thread can continue immediately after
-- 
2.13.6

* Re: [Qemu-devel] [PATCH 01/14] blockjob: Wake up BDS when job becomes idle
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 01/14] blockjob: Wake up BDS when job becomes idle Kevin Wolf
@ 2018-09-11  7:58   ` Fam Zheng
  0 siblings, 0 replies; 33+ messages in thread
From: Fam Zheng @ 2018-09-11  7:58 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, pbonzini, slp, qemu-devel

On Fri, 09/07 18:15, Kevin Wolf wrote:
> In the context of draining a BDS, the .drained_poll callback of block
> jobs is called. If this returns true (i.e. there is still some activity
> pending), the drain operation may call aio_poll() with blocking=true to
> wait for completion.
> 
> As soon as the pending activity is completed and the job finally arrives
> in a quiescent state (i.e. its coroutine either yields with busy=false
> or terminates), the block job must notify the aio_poll() loop to wake
> up, otherwise we get a deadlock if both are running in different
> threads.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

* Re: [Qemu-devel] [PATCH 02/14] test-bdrv-drain: Drain with block jobs in an I/O thread
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 02/14] test-bdrv-drain: Drain with block jobs in an I/O thread Kevin Wolf
@ 2018-09-11  8:09   ` Fam Zheng
  0 siblings, 0 replies; 33+ messages in thread
From: Fam Zheng @ 2018-09-11  8:09 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, pbonzini, slp, qemu-devel

On Fri, 09/07 18:15, Kevin Wolf wrote:
> This extends the existing drain test with a block job to include
> variants where the block job runs in a different AioContext.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

* Re: [Qemu-devel] [PATCH 03/14] test-blockjob: Acquire AioContext around job_finish_sync()
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 03/14] test-blockjob: Acquire AioContext around job_finish_sync() Kevin Wolf
@ 2018-09-11  8:11   ` Fam Zheng
  0 siblings, 0 replies; 33+ messages in thread
From: Fam Zheng @ 2018-09-11  8:11 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, pbonzini, slp, qemu-devel

On Fri, 09/07 18:15, Kevin Wolf wrote:
> All callers in QEMU proper hold the AioContext lock when calling
> job_finish_sync(). test-blockjob should do the same.

I think s/job_finish_sync/job_cancel_sync/ in the subject is more accurate?

Reviewed-by: Fam Zheng <famz@redhat.com>

> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/qemu/job.h    | 6 ++++++
>  tests/test-blockjob.c | 6 ++++++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/include/qemu/job.h b/include/qemu/job.h
> index 0dae5b8481..8ac48dbd28 100644
> --- a/include/qemu/job.h
> +++ b/include/qemu/job.h
> @@ -520,6 +520,8 @@ void job_user_cancel(Job *job, bool force, Error **errp);
>   *
>   * Returns the return value from the job if the job actually completed
>   * during the call, or -ECANCELED if it was canceled.
> + *
> + * Callers must hold the AioContext lock of job->aio_context.
>   */
>  int job_cancel_sync(Job *job);
>  
> @@ -537,6 +539,8 @@ void job_cancel_sync_all(void);
>   * function).
>   *
>   * Returns the return value from the job.
> + *
> + * Callers must hold the AioContext lock of job->aio_context.
>   */
>  int job_complete_sync(Job *job, Error **errp);
>  
> @@ -579,6 +583,8 @@ void job_defer_to_main_loop(Job *job, JobDeferToMainLoopFn *fn, void *opaque);
>   *
>   * Returns 0 if the job is successfully completed, -ECANCELED if the job was
>   * cancelled before completing, and -errno in other error cases.
> + *
> + * Callers must hold the AioContext lock of job->aio_context.
>   */
>  int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp);
>  
> diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c
> index cb42f06e61..8c2babbe35 100644
> --- a/tests/test-blockjob.c
> +++ b/tests/test-blockjob.c
> @@ -230,6 +230,10 @@ static void cancel_common(CancelJob *s)
>      BlockJob *job = &s->common;
>      BlockBackend *blk = s->blk;
>      JobStatus sts = job->job.status;
> +    AioContext *ctx;
> +
> +    ctx = job->job.aio_context;
> +    aio_context_acquire(ctx);
>  
>      job_cancel_sync(&job->job);
>      if (sts != JOB_STATUS_CREATED && sts != JOB_STATUS_CONCLUDED) {
> @@ -239,6 +243,8 @@ static void cancel_common(CancelJob *s)
>      assert(job->job.status == JOB_STATUS_NULL);
>      job_unref(&job->job);
>      destroy_blk(blk);
> +
> +    aio_context_release(ctx);
>  }
>  
>  static void test_cancel_created(void)
> -- 
> 2.13.6
> 

* Re: [Qemu-devel] [PATCH 04/14] job: Use AIO_WAIT_WHILE() in job_finish_sync()
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 04/14] job: Use AIO_WAIT_WHILE() in job_finish_sync() Kevin Wolf
@ 2018-09-11  8:17   ` Fam Zheng
  0 siblings, 0 replies; 33+ messages in thread
From: Fam Zheng @ 2018-09-11  8:17 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, pbonzini, slp, qemu-devel

On Fri, 09/07 18:15, Kevin Wolf wrote:
> job_finish_sync() needs to release the AioContext lock of the job before
> calling aio_poll(). Otherwise, callbacks called by aio_poll() would
> possibly take the lock a second time and run into a deadlock with a
> nested AIO_WAIT_WHILE() call.
> 
> Also, job_drain() without aio_poll() isn't necessarily enough to make
> progress on a job; it could depend on bottom halves being executed.
> 
> Combine both open-coded while loops into a single AIO_WAIT_WHILE() call
> that solves both of these problems.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  job.c | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/job.c b/job.c
> index 9ad0b7476a..8480eda188 100644
> --- a/job.c
> +++ b/job.c
> @@ -29,6 +29,7 @@
>  #include "qemu/job.h"
>  #include "qemu/id.h"
>  #include "qemu/main-loop.h"
> +#include "block/aio-wait.h"
>  #include "trace-root.h"
>  #include "qapi/qapi-events-job.h"
>  
> @@ -998,6 +999,7 @@ void job_defer_to_main_loop(Job *job, JobDeferToMainLoopFn *fn, void *opaque)
>  int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
>  {
>      Error *local_err = NULL;
> +    AioWait dummy_wait = {};
>      int ret;
>  
>      job_ref(job);
> @@ -1010,14 +1012,10 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
>          job_unref(job);
>          return -EBUSY;
>      }
> -    /* job_drain calls job_enter, and it should be enough to induce progress
> -     * until the job completes or moves to the main thread. */
> -    while (!job->deferred_to_main_loop && !job_is_completed(job)) {
> -        job_drain(job);
> -    }
> -    while (!job_is_completed(job)) {
> -        aio_poll(qemu_get_aio_context(), true);
> -    }
> +
> +    AIO_WAIT_WHILE(&dummy_wait, job->aio_context,
> +                   (job_drain(job), !job_is_completed(job)));

The condition expression would read more elegantly if job_drain() returned
progress.

Reviewed-by: Fam Zheng <famz@redhat.com>

> +
>      ret = (job_is_cancelled(job) && job->ret == 0) ? -ECANCELED : job->ret;
>      job_unref(job);
>      return ret;
> -- 
> 2.13.6
> 

* Re: [Qemu-devel] [PATCH 05/14] test-bdrv-drain: Test AIO_WAIT_WHILE() in completion callback
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 05/14] test-bdrv-drain: Test AIO_WAIT_WHILE() in completion callback Kevin Wolf
@ 2018-09-11  8:17   ` Fam Zheng
  0 siblings, 0 replies; 33+ messages in thread
From: Fam Zheng @ 2018-09-11  8:17 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, pbonzini, slp, qemu-devel

On Fri, 09/07 18:15, Kevin Wolf wrote:
> This is a regression test for a deadlock that occurred in block job
> completion callbacks (via job_defer_to_main_loop) because the AioContext
> lock was taken twice: once in job_finish_sync() and then again in
> job_defer_to_main_loop_bh(). This would cause AIO_WAIT_WHILE() to hang.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

* Re: [Qemu-devel] [PATCH 06/14] block: Add missing locking in bdrv_co_drain_bh_cb()
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 06/14] block: Add missing locking in bdrv_co_drain_bh_cb() Kevin Wolf
@ 2018-09-11  8:23   ` Fam Zheng
  2018-09-11  9:17     ` Kevin Wolf
  0 siblings, 1 reply; 33+ messages in thread
From: Fam Zheng @ 2018-09-11  8:23 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, pbonzini, slp, qemu-devel

On Fri, 09/07 18:15, Kevin Wolf wrote:
> bdrv_do_drained_begin/end() assume that they are called with the
> AioContext lock of bs held. If we call drain functions from a coroutine
> with the AioContext lock held, we yield and schedule a BH to move out of
> coroutine context. This means that the lock for the home context of the
> coroutine is released and must be re-acquired in the bottom half.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/qemu/coroutine.h |  5 +++++
>  block/io.c               | 15 +++++++++++++++
>  util/qemu-coroutine.c    |  5 +++++
>  3 files changed, 25 insertions(+)
> 
> diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
> index 6f8a487041..9801e7f5a4 100644
> --- a/include/qemu/coroutine.h
> +++ b/include/qemu/coroutine.h
> @@ -90,6 +90,11 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co);
>  void coroutine_fn qemu_coroutine_yield(void);
>  
>  /**
> + * Get the AioContext of the given coroutine
> + */
> +AioContext *coroutine_fn qemu_coroutine_get_aio_context(Coroutine *co);
> +
> +/**
>   * Get the currently executing coroutine
>   */
>  Coroutine *coroutine_fn qemu_coroutine_self(void);
> diff --git a/block/io.c b/block/io.c
> index 7100344c7b..914ba78f1a 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -288,6 +288,18 @@ static void bdrv_co_drain_bh_cb(void *opaque)
>      BlockDriverState *bs = data->bs;
>  
>      if (bs) {
> +        AioContext *ctx = bdrv_get_aio_context(bs);
> +        AioContext *co_ctx = qemu_coroutine_get_aio_context(co);
> +
> +        /*
> +         * When the coroutine yielded, the lock for its home context was
> +         * released, so we need to re-acquire it here. If it explicitly
> +         * acquired a different context, the lock is still held and we don't
> +         * want to lock it a second time (or AIO_WAIT_WHILE() would hang).
> +         */

This condition is rather obscure. When is ctx not equal to co_ctx?

> +        if (ctx == co_ctx) {
> +            aio_context_acquire(ctx);
> +        }
>          bdrv_dec_in_flight(bs);
>          if (data->begin) {
>              bdrv_do_drained_begin(bs, data->recursive, data->parent,
> @@ -296,6 +308,9 @@ static void bdrv_co_drain_bh_cb(void *opaque)
>              bdrv_do_drained_end(bs, data->recursive, data->parent,
>                                  data->ignore_bds_parents);
>          }
> +        if (ctx == co_ctx) {
> +            aio_context_release(ctx);
> +        }
>      } else {
>          assert(data->begin);
>          bdrv_drain_all_begin();
> diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
> index 1ba4191b84..2295928d33 100644
> --- a/util/qemu-coroutine.c
> +++ b/util/qemu-coroutine.c
> @@ -198,3 +198,8 @@ bool qemu_coroutine_entered(Coroutine *co)
>  {
>      return co->caller;
>  }
> +
> +AioContext *coroutine_fn qemu_coroutine_get_aio_context(Coroutine *co)
> +{
> +    return co->ctx;
> +}
> -- 
> 2.13.6
> 

* Re: [Qemu-devel] [PATCH 07/14] aio-wait: Increase num_waiters even in home thread
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 07/14] aio-wait: Increase num_waiters even in home thread Kevin Wolf
@ 2018-09-11  8:25   ` Fam Zheng
  0 siblings, 0 replies; 33+ messages in thread
From: Fam Zheng @ 2018-09-11  8:25 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, pbonzini, slp, qemu-devel

On Fri, 09/07 18:15, Kevin Wolf wrote:
> Even if AIO_WAIT_WHILE() is called in the home context of the
> AioContext, we still want to allow the condition to change depending on
> other threads as long as they kick the AioWait. Specifically, block jobs
> can be running in an I/O thread and should then be able to kick a drain
> in the main loop context.

We can now move the atomic_inc/atomic_dec pair outside the if/else block,
but that's cosmetic.
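
Roughly what that cosmetic variant would look like, as a sketch of the
macro body only (the non-home-thread slow path is elided, unchanged):

    AioWait *wait_ = (wait);                                       \
    AioContext *ctx_ = (ctx);                                      \
    atomic_inc(&wait_->num_waiters);                               \
    if (ctx_ && in_aio_context_home_thread(ctx_)) {                \
        while ((cond)) {                                           \
            aio_poll(ctx_, true);                                  \
            waited_ = true;                                        \
        }                                                          \
    } else {                                                       \
        /* ... unchanged: poll the main loop context with the      \
         * ctx_ lock temporarily dropped ... */                    \
    }                                                              \
    atomic_dec(&wait_->num_waiters);                               \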

Reviewed-by: Fam Zheng <famz@redhat.com>


> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/block/aio-wait.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
> index c85a62f798..46ba7f9111 100644
> --- a/include/block/aio-wait.h
> +++ b/include/block/aio-wait.h
> @@ -77,10 +77,12 @@ typedef struct {
>      AioWait *wait_ = (wait);                                       \
>      AioContext *ctx_ = (ctx);                                      \
>      if (ctx_ && in_aio_context_home_thread(ctx_)) {                \
> +        atomic_inc(&wait_->num_waiters);                           \
>          while ((cond)) {                                           \
>              aio_poll(ctx_, true);                                  \
>              waited_ = true;                                        \
>          }                                                          \
> +        atomic_dec(&wait_->num_waiters);                           \
>      } else {                                                       \
>          assert(qemu_get_current_aio_context() ==                   \
>                 qemu_get_aio_context());                            \
> -- 
> 2.13.6
> 

* Re: [Qemu-devel] [PATCH 08/14] block-backend: Add .drained_poll callback
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 08/14] block-backend: Add .drained_poll callback Kevin Wolf
@ 2018-09-11  8:26   ` Fam Zheng
  0 siblings, 0 replies; 33+ messages in thread
From: Fam Zheng @ 2018-09-11  8:26 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, pbonzini, slp, qemu-devel

On Fri, 09/07 18:15, Kevin Wolf wrote:
> A bdrv_drain operation must ensure that all parents are quiesced; this
> includes BlockBackends. Otherwise, callbacks called by requests that are
> completed on the BDS layer, but not quite yet on the BlockBackend layer,
> could still create new requests.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>
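
A minimal sketch of the shape such a callback takes (field names assumed
from context, not necessarily the exact patch):

    static bool blk_root_drained_poll(BdrvChild *child)
    {
        BlockBackend *blk = child->opaque;

        /* Report "still busy" while requests that already completed on the
         * BDS layer are still pending completion on the BlockBackend layer. */
        return !!blk->in_flight;
    }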

* Re: [Qemu-devel] [PATCH 09/14] block-backend: Fix potential double blk_delete()
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 09/14] block-backend: Fix potential double blk_delete() Kevin Wolf
@ 2018-09-11  8:27   ` Fam Zheng
  0 siblings, 0 replies; 33+ messages in thread
From: Fam Zheng @ 2018-09-11  8:27 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, pbonzini, slp, qemu-devel

On Fri, 09/07 18:15, Kevin Wolf wrote:
> blk_unref() first decreases the refcount of the BlockBackend and calls
> blk_delete() if the refcount reaches zero. Requests can still be in
> flight at this point; they are only drained during blk_delete():
> 
> At this point, arbitrary callbacks can run. If any callback takes a
> temporary BlockBackend reference, it will first increase the refcount to
> 1 and then decrease it to 0 again, triggering another blk_delete(). This
> will cause a use-after-free crash in the outer blk_delete().
> 
> Fix it by draining the BlockBackend before decreasing the refcount to 0.
> Assert in blk_ref() that it never takes the first refcount (which would
> mean that the BlockBackend is already being deleted).
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Good one!

Reviewed-by: Fam Zheng <famz@redhat.com>
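
The fix described above, sketched (close to, but not necessarily verbatim,
the actual patch):

    void blk_ref(BlockBackend *blk)
    {
        assert(blk->refcnt > 0);    /* never resurrect a dying BlockBackend */
        blk->refcnt++;
    }

    void blk_unref(BlockBackend *blk)
    {
        if (blk) {
            assert(blk->refcnt > 0);
            if (blk->refcnt > 1) {
                blk->refcnt--;
            } else {
                /* Drain while refcnt is still 1: callbacks running here may
                 * take and drop a temporary reference, but that can no longer
                 * trigger a second blk_delete(). */
                blk_drain(blk);
                blk->refcnt = 0;
                blk_delete(blk);
            }
        }
    }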

* Re: [Qemu-devel] [PATCH 10/14] block-backend: Decrease in_flight only after callback
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 10/14] block-backend: Decrease in_flight only after callback Kevin Wolf
@ 2018-09-11  8:29   ` Fam Zheng
  0 siblings, 0 replies; 33+ messages in thread
From: Fam Zheng @ 2018-09-11  8:29 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, pbonzini, slp, qemu-devel

On Fri, 09/07 18:15, Kevin Wolf wrote:
> Request callbacks can do pretty much anything, including operations that
> will yield from the coroutine (such as draining the backend). In that
> case, a decreased in_flight would be visible to other code and could
> lead to a drain completing while the callback hasn't actually completed
> yet.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>
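
Illustrative only (helper and field names are assumed here, not quoted from
the patch); the point is simply the ordering:

    static void blk_aio_complete(BlkAioEmAIOCB *acb)
    {
        if (acb->has_returned) {
            /* Run the request callback first; it may drain or otherwise
             * re-enter the backend and must still count as in flight. */
            acb->common.cb(acb->common.opaque, acb->rwco.ret);
            blk_dec_in_flight(acb->rwco.blk);   /* only now drop the counter */
            qemu_aio_unref(acb);
        }
    }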

* Re: [Qemu-devel] [PATCH 11/14] mirror: Fix potential use-after-free in active commit
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 11/14] mirror: Fix potential use-after-free in active commit Kevin Wolf
@ 2018-09-11  8:31   ` Fam Zheng
  2018-09-11  9:32     ` Kevin Wolf
  0 siblings, 1 reply; 33+ messages in thread
From: Fam Zheng @ 2018-09-11  8:31 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, pbonzini, slp, qemu-devel

On Fri, 09/07 18:15, Kevin Wolf wrote:
> When starting an active commit job, other callbacks can run before
> mirror_start_job() calls bdrv_ref() where needed and cause the nodes to
> go away. Add another pair of bdrv_ref/unref() around it to protect
> against this case.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/mirror.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 6cc10df5c9..c42999eadf 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1679,6 +1679,11 @@ void commit_active_start(const char *job_id, BlockDriverState *bs,
>  
>      orig_base_flags = bdrv_get_flags(base);
>  
> +    /* bdrv_reopen() drains, which might make the BDSes go away before a
> +     * reference is taken in mirror_start_job(). */
> +    bdrv_ref(bs);
> +    bdrv_ref(base);
> +
>      if (bdrv_reopen(base, bs->open_flags, errp)) {

Doesn't it need bdrv_unref's in this branch?

>          return;
>      }
> @@ -1689,6 +1694,10 @@ void commit_active_start(const char *job_id, BlockDriverState *bs,
>                       &commit_active_job_driver, false, base, auto_complete,
>                       filter_node_name, false, MIRROR_COPY_MODE_BACKGROUND,
>                       &local_err);
> +
> +    bdrv_unref(bs);
> +    bdrv_unref(base);
> +
>      if (local_err) {
>          error_propagate(errp, local_err);
>          goto error_restore_flags;
> -- 
> 2.13.6
> 

* Re: [Qemu-devel] [PATCH 12/14] blockjob: Lie better in child_job_drained_poll()
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 12/14] blockjob: Lie better in child_job_drained_poll() Kevin Wolf
@ 2018-09-11  8:34   ` Fam Zheng
  0 siblings, 0 replies; 33+ messages in thread
From: Fam Zheng @ 2018-09-11  8:34 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, pbonzini, slp, qemu-devel

On Fri, 09/07 18:15, Kevin Wolf wrote:
> Block jobs claim in .drained_poll() that they are in a quiescent state
> as soon as job->deferred_to_main_loop is true. This is obviously wrong;
> they still have a completion BH to run. We only get away with this
> because commit 91af091f923 added an unconditional aio_poll(false) to the
> drain functions, but this is bypassing the regular drain mechanisms.
> 
> However, just removing this and reporting that the job is still active
> doesn't work either: The completion callbacks themselves call drain
> functions (directly, or indirectly with bdrv_reopen), so they would
> deadlock then.
> 
> As a better lie, report that the job is active as long as the BH is
> pending, but falsely call it quiescent from the point in the BH when the
> completion callback is called. At this point, nested drain calls won't
> deadlock because they ignore the job, and outer drains will wait for the
> job to really reach a quiescent state because the callback is already
> running.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

* Re: [Qemu-devel] [PATCH 13/14] block: Remove aio_poll() in bdrv_drain_poll variants
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 13/14] block: Remove aio_poll() in bdrv_drain_poll variants Kevin Wolf
@ 2018-09-11  8:35   ` Fam Zheng
  0 siblings, 0 replies; 33+ messages in thread
From: Fam Zheng @ 2018-09-11  8:35 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, pbonzini, slp, qemu-devel

On Fri, 09/07 18:15, Kevin Wolf wrote:
> bdrv_drain_poll_top_level() was buggy because it didn't release the
> AioContext lock of the node to be drained before calling aio_poll().
> This way, callbacks called by aio_poll() would possibly take the lock a
> second time and run into a deadlock with a nested AIO_WAIT_WHILE() call.
> 
> However, it turns out that the aio_poll() call isn't actually needed any
> more. It was introduced in commit 91af091f923, which is effectively
> reverted by this patch. The cases it was supposed to fix are now covered
> by bdrv_drain_poll(), which waits for block jobs to reach a quiescent
> state.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>
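
What remains of the helper after the effective revert, roughly (parameter
list assumed from the surrounding series):

    static bool bdrv_drain_poll_top_level(BlockDriverState *bs, bool recursive,
                                          BdrvChild *ignore_parent)
    {
        /* No nested aio_poll() here any more; quiescence of block jobs is
         * now covered by bdrv_drain_poll() itself. */
        return bdrv_drain_poll(bs, recursive, ignore_parent, false);
    }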

* Re: [Qemu-devel] [PATCH 14/14] test-bdrv-drain: Test nested poll in bdrv_drain_poll_top_level()
  2018-09-07 16:15 ` [Qemu-devel] [PATCH 14/14] test-bdrv-drain: Test nested poll in bdrv_drain_poll_top_level() Kevin Wolf
@ 2018-09-11  8:37   ` Fam Zheng
  0 siblings, 0 replies; 33+ messages in thread
From: Fam Zheng @ 2018-09-11  8:37 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, pbonzini, slp, qemu-devel

On Fri, 09/07 18:15, Kevin Wolf wrote:
> This is a regression test for a deadlock that could occur in callbacks
> called from the aio_poll() in bdrv_drain_poll_top_level(). The
> AioContext lock wasn't released and therefore would be taken a second
> time in the callback. This would cause a possible AIO_WAIT_WHILE() in
> the callback to hang.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Fam Zheng <famz@redhat.com>

* Re: [Qemu-devel] [PATCH 06/14] block: Add missing locking in bdrv_co_drain_bh_cb()
  2018-09-11  8:23   ` Fam Zheng
@ 2018-09-11  9:17     ` Kevin Wolf
  2018-09-11  9:28       ` Sergio Lopez
  0 siblings, 1 reply; 33+ messages in thread
From: Kevin Wolf @ 2018-09-11  9:17 UTC (permalink / raw)
  To: Fam Zheng; +Cc: qemu-block, mreitz, pbonzini, slp, qemu-devel

On 11.09.2018 at 10:23, Fam Zheng wrote:
> On Fri, 09/07 18:15, Kevin Wolf wrote:
> > bdrv_do_drained_begin/end() assume that they are called with the
> > AioContext lock of bs held. If we call drain functions from a coroutine
> > with the AioContext lock held, we yield and schedule a BH to move out of
> > coroutine context. This means that the lock for the home context of the
> > coroutine is released and must be re-acquired in the bottom half.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  include/qemu/coroutine.h |  5 +++++
> >  block/io.c               | 15 +++++++++++++++
> >  util/qemu-coroutine.c    |  5 +++++
> >  3 files changed, 25 insertions(+)
> > 
> > diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
> > index 6f8a487041..9801e7f5a4 100644
> > --- a/include/qemu/coroutine.h
> > +++ b/include/qemu/coroutine.h
> > @@ -90,6 +90,11 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co);
> >  void coroutine_fn qemu_coroutine_yield(void);
> >  
> >  /**
> > + * Get the AioContext of the given coroutine
> > + */
> > +AioContext *coroutine_fn qemu_coroutine_get_aio_context(Coroutine *co);
> > +
> > +/**
> >   * Get the currently executing coroutine
> >   */
> >  Coroutine *coroutine_fn qemu_coroutine_self(void);
> > diff --git a/block/io.c b/block/io.c
> > index 7100344c7b..914ba78f1a 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -288,6 +288,18 @@ static void bdrv_co_drain_bh_cb(void *opaque)
> >      BlockDriverState *bs = data->bs;
> >  
> >      if (bs) {
> > +        AioContext *ctx = bdrv_get_aio_context(bs);
> > +        AioContext *co_ctx = qemu_coroutine_get_aio_context(co);
> > +
> > +        /*
> > +         * When the coroutine yielded, the lock for its home context was
> > +         * released, so we need to re-acquire it here. If it explicitly
> > +         * acquired a different context, the lock is still held and we don't
> > +         * want to lock it a second time (or AIO_WAIT_WHILE() would hang).
> > +         */
> 
> This condition is rather obscure. When is ctx not equal to co_ctx?

Whenever you drain a BlockDriverState that is in a different AioContext.
The common case is a bdrv_drain() from the main loop thread for a BDS in
an iothread.

I didn't have this condition at first and ran into deadlocks (because
AIO_WAIT_WHILE() dropped the lock only once, but it was locked twice).

Kevin

> > +        if (ctx == co_ctx) {
> > +            aio_context_acquire(ctx);
> > +        }
> >          bdrv_dec_in_flight(bs);
> >          if (data->begin) {
> >              bdrv_do_drained_begin(bs, data->recursive, data->parent,
> > @@ -296,6 +308,9 @@ static void bdrv_co_drain_bh_cb(void *opaque)
> >              bdrv_do_drained_end(bs, data->recursive, data->parent,
> >                                  data->ignore_bds_parents);
> >          }
> > +        if (ctx == co_ctx) {
> > +            aio_context_release(ctx);
> > +        }
> >      } else {
> >          assert(data->begin);
> >          bdrv_drain_all_begin();

* Re: [Qemu-devel] [PATCH 06/14] block: Add missing locking in bdrv_co_drain_bh_cb()
  2018-09-11  9:17     ` Kevin Wolf
@ 2018-09-11  9:28       ` Sergio Lopez
  2018-09-11 10:22         ` Kevin Wolf
  0 siblings, 1 reply; 33+ messages in thread
From: Sergio Lopez @ 2018-09-11  9:28 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Fam Zheng, qemu-block, mreitz, pbonzini, qemu-devel

On Tue, Sep 11, 2018 at 11:17:20AM +0200, Kevin Wolf wrote:
> On 11.09.2018 at 10:23, Fam Zheng wrote:
> > On Fri, 09/07 18:15, Kevin Wolf wrote:
> > > bdrv_do_drained_begin/end() assume that they are called with the
> > > AioContext lock of bs held. If we call drain functions from a coroutine
> > > with the AioContext lock held, we yield and schedule a BH to move out of
> > > coroutine context. This means that the lock for the home context of the
> > > coroutine is released and must be re-acquired in the bottom half.
> > > 
> > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > ---
> > >  include/qemu/coroutine.h |  5 +++++
> > >  block/io.c               | 15 +++++++++++++++
> > >  util/qemu-coroutine.c    |  5 +++++
> > >  3 files changed, 25 insertions(+)
> > > 
> > > diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
> > > index 6f8a487041..9801e7f5a4 100644
> > > --- a/include/qemu/coroutine.h
> > > +++ b/include/qemu/coroutine.h
> > > @@ -90,6 +90,11 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co);
> > >  void coroutine_fn qemu_coroutine_yield(void);
> > >  
> > >  /**
> > > + * Get the AioContext of the given coroutine
> > > + */
> > > +AioContext *coroutine_fn qemu_coroutine_get_aio_context(Coroutine *co);
> > > +
> > > +/**
> > >   * Get the currently executing coroutine
> > >   */
> > >  Coroutine *coroutine_fn qemu_coroutine_self(void);
> > > diff --git a/block/io.c b/block/io.c
> > > index 7100344c7b..914ba78f1a 100644
> > > --- a/block/io.c
> > > +++ b/block/io.c
> > > @@ -288,6 +288,18 @@ static void bdrv_co_drain_bh_cb(void *opaque)
> > >      BlockDriverState *bs = data->bs;
> > >  
> > >      if (bs) {
> > > +        AioContext *ctx = bdrv_get_aio_context(bs);
> > > +        AioContext *co_ctx = qemu_coroutine_get_aio_context(co);
> > > +
> > > +        /*
> > > +         * When the coroutine yielded, the lock for its home context was
> > > +         * released, so we need to re-acquire it here. If it explicitly
> > > +         * acquired a different context, the lock is still held and we don't
> > > +         * want to lock it a second time (or AIO_WAIT_WHILE() would hang).
> > > +         */
> > 
> > This condition is rather obscure. When is ctx not equal to co_ctx?
> 
> Whenever you drain a BlockDriverState that is in a different AioContext.
> The common case is a bdrv_drain() from the main loop thread for a BDS in
> an iothread.

Isn't this a consequence of using qemu_coroutine_enter in co_schedule_bh
[1]?

AFAIK, even if an IOThread's AioContext is being polled by the main loop
thread, all coroutines should be running with the IOThread/BDS
AioContext.

Sergio.

[1] https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg00450.html
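
The fix referenced in [1] presumably boils down to entering the coroutine
with the AioContext it was scheduled on, so that co->ctx stays correct even
when another thread polls that context; something along these lines in
co_schedule_bh_cb() (sketch, not the verbatim patch):

    -        qemu_coroutine_enter(co);
    +        qemu_aio_coroutine_enter(ctx, co);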

* Re: [Qemu-devel] [PATCH 11/14] mirror: Fix potential use-after-free in active commit
  2018-09-11  8:31   ` Fam Zheng
@ 2018-09-11  9:32     ` Kevin Wolf
  0 siblings, 0 replies; 33+ messages in thread
From: Kevin Wolf @ 2018-09-11  9:32 UTC (permalink / raw)
  To: Fam Zheng; +Cc: qemu-block, mreitz, pbonzini, slp, qemu-devel

On 11.09.2018 at 10:31, Fam Zheng wrote:
> On Fri, 09/07 18:15, Kevin Wolf wrote:
> > When starting an active commit job, other callbacks can run before
> > mirror_start_job() calls bdrv_ref() where needed and cause the nodes to
> > go away. Add another pair of bdrv_ref/unref() around it to protect
> > against this case.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block/mirror.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/block/mirror.c b/block/mirror.c
> > index 6cc10df5c9..c42999eadf 100644
> > --- a/block/mirror.c
> > +++ b/block/mirror.c
> > @@ -1679,6 +1679,11 @@ void commit_active_start(const char *job_id, BlockDriverState *bs,
> >  
> >      orig_base_flags = bdrv_get_flags(base);
> >  
> > +    /* bdrv_reopen() drains, which might make the BDSes go away before a
> > +     * reference is taken in mirror_start_job(). */
> > +    bdrv_ref(bs);
> > +    bdrv_ref(base);
> > +
> >      if (bdrv_reopen(base, bs->open_flags, errp)) {
> 
> Doesn't it need bdrv_unref's in this branch?

Yes, of course. Thanks for catching this!

Kevin
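
For the archive, the early-return path Fam points out would presumably end
up looking like this:

    if (bdrv_reopen(base, bs->open_flags, errp)) {
        bdrv_unref(bs);
        bdrv_unref(base);
        return;
    }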

* Re: [Qemu-devel] [PATCH 06/14] block: Add missing locking in bdrv_co_drain_bh_cb()
  2018-09-11  9:28       ` Sergio Lopez
@ 2018-09-11 10:22         ` Kevin Wolf
  0 siblings, 0 replies; 33+ messages in thread
From: Kevin Wolf @ 2018-09-11 10:22 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: Fam Zheng, qemu-block, mreitz, pbonzini, qemu-devel

On 11.09.2018 at 11:28, Sergio Lopez wrote:
> On Tue, Sep 11, 2018 at 11:17:20AM +0200, Kevin Wolf wrote:
> > On 11.09.2018 at 10:23, Fam Zheng wrote:
> > > On Fri, 09/07 18:15, Kevin Wolf wrote:
> > > > bdrv_do_drained_begin/end() assume that they are called with the
> > > > AioContext lock of bs held. If we call drain functions from a coroutine
> > > > with the AioContext lock held, we yield and schedule a BH to move out of
> > > > coroutine context. This means that the lock for the home context of the
> > > > coroutine is released and must be re-acquired in the bottom half.
> > > > 
> > > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > > ---
> > > >  include/qemu/coroutine.h |  5 +++++
> > > >  block/io.c               | 15 +++++++++++++++
> > > >  util/qemu-coroutine.c    |  5 +++++
> > > >  3 files changed, 25 insertions(+)
> > > > 
> > > > diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
> > > > index 6f8a487041..9801e7f5a4 100644
> > > > --- a/include/qemu/coroutine.h
> > > > +++ b/include/qemu/coroutine.h
> > > > @@ -90,6 +90,11 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co);
> > > >  void coroutine_fn qemu_coroutine_yield(void);
> > > >  
> > > >  /**
> > > > + * Get the AioContext of the given coroutine
> > > > + */
> > > > +AioContext *coroutine_fn qemu_coroutine_get_aio_context(Coroutine *co);
> > > > +
> > > > +/**
> > > >   * Get the currently executing coroutine
> > > >   */
> > > >  Coroutine *coroutine_fn qemu_coroutine_self(void);
> > > > diff --git a/block/io.c b/block/io.c
> > > > index 7100344c7b..914ba78f1a 100644
> > > > --- a/block/io.c
> > > > +++ b/block/io.c
> > > > @@ -288,6 +288,18 @@ static void bdrv_co_drain_bh_cb(void *opaque)
> > > >      BlockDriverState *bs = data->bs;
> > > >  
> > > >      if (bs) {
> > > > +        AioContext *ctx = bdrv_get_aio_context(bs);
> > > > +        AioContext *co_ctx = qemu_coroutine_get_aio_context(co);
> > > > +
> > > > +        /*
> > > > +         * When the coroutine yielded, the lock for its home context was
> > > > +         * released, so we need to re-acquire it here. If it explicitly
> > > > +         * acquired a different context, the lock is still held and we don't
> > > > +         * want to lock it a second time (or AIO_WAIT_WHILE() would hang).
> > > > +         */
> > > 
> > > This condition is rather obscure. When is ctx not equal to co_ctx?
> > 
> > Whenever you drain a BlockDriverState that is in a different AioContext.
> > The common case is a bdrv_drain() from the main loop thread for a BDS in
> > an iothread.
> 
> Isn't this a consequence of using qemu_coroutine_enter in co_schedule_bh
> [1]?
> 
> AFAIK, even if an IOThread's AioContext is being polled by the main loop
> thread, all coroutines should be running with the IOThread/BDS
> AioContext.

You're right, bdrv_co_yield_to_drain() does schedule the BH in the
AioContext of the BDS, so in theory this shouldn't happen. If it was
called from a coroutine with a wrong co->ctx (due to the bug you
mentioned), that would explain the behaviour. Maybe the condition isn't
necessary any more after your fix.

Kevin
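
For reference, the scheduling path under discussion, heavily simplified
(parameters trimmed; only the AioContext handling matters here):

    static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
                                                    bool begin)
    {
        BdrvCoDrainData data = {
            .co    = qemu_coroutine_self(),
            .bs    = bs,
            .begin = begin,
        };

        bdrv_inc_in_flight(bs);
        /* The BH runs in the AioContext of bs, so bdrv_co_drain_bh_cb()
         * should only ever see ctx != co_ctx if co->ctx is stale. */
        aio_bh_schedule_oneshot(bdrv_get_aio_context(bs),
                                bdrv_co_drain_bh_cb, &data);
        qemu_coroutine_yield();
        /* Resumed by the BH once the drained section has been entered/left. */
    }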

Thread overview: 33+ messages
2018-09-07 16:15 [Qemu-devel] [PATCH 00/14] Fix some jobs/drain/aio_poll related hangs Kevin Wolf
2018-09-07 16:15 ` [Qemu-devel] [PATCH 01/14] blockjob: Wake up BDS when job becomes idle Kevin Wolf
2018-09-11  7:58   ` Fam Zheng
2018-09-07 16:15 ` [Qemu-devel] [PATCH 02/14] test-bdrv-drain: Drain with block jobs in an I/O thread Kevin Wolf
2018-09-11  8:09   ` Fam Zheng
2018-09-07 16:15 ` [Qemu-devel] [PATCH 03/14] test-blockjob: Acquire AioContext around job_finish_sync() Kevin Wolf
2018-09-11  8:11   ` Fam Zheng
2018-09-07 16:15 ` [Qemu-devel] [PATCH 04/14] job: Use AIO_WAIT_WHILE() in job_finish_sync() Kevin Wolf
2018-09-11  8:17   ` Fam Zheng
2018-09-07 16:15 ` [Qemu-devel] [PATCH 05/14] test-bdrv-drain: Test AIO_WAIT_WHILE() in completion callback Kevin Wolf
2018-09-11  8:17   ` Fam Zheng
2018-09-07 16:15 ` [Qemu-devel] [PATCH 06/14] block: Add missing locking in bdrv_co_drain_bh_cb() Kevin Wolf
2018-09-11  8:23   ` Fam Zheng
2018-09-11  9:17     ` Kevin Wolf
2018-09-11  9:28       ` Sergio Lopez
2018-09-11 10:22         ` Kevin Wolf
2018-09-07 16:15 ` [Qemu-devel] [PATCH 07/14] aio-wait: Increase num_waiters even in home thread Kevin Wolf
2018-09-11  8:25   ` Fam Zheng
2018-09-07 16:15 ` [Qemu-devel] [PATCH 08/14] block-backend: Add .drained_poll callback Kevin Wolf
2018-09-11  8:26   ` Fam Zheng
2018-09-07 16:15 ` [Qemu-devel] [PATCH 09/14] block-backend: Fix potential double blk_delete() Kevin Wolf
2018-09-11  8:27   ` Fam Zheng
2018-09-07 16:15 ` [Qemu-devel] [PATCH 10/14] block-backend: Decrease in_flight only after callback Kevin Wolf
2018-09-11  8:29   ` Fam Zheng
2018-09-07 16:15 ` [Qemu-devel] [PATCH 11/14] mirror: Fix potential use-after-free in active commit Kevin Wolf
2018-09-11  8:31   ` Fam Zheng
2018-09-11  9:32     ` Kevin Wolf
2018-09-07 16:15 ` [Qemu-devel] [PATCH 12/14] blockjob: Lie better in child_job_drained_poll() Kevin Wolf
2018-09-11  8:34   ` Fam Zheng
2018-09-07 16:15 ` [Qemu-devel] [PATCH 13/14] block: Remove aio_poll() in bdrv_drain_poll variants Kevin Wolf
2018-09-11  8:35   ` Fam Zheng
2018-09-07 16:15 ` [Qemu-devel] [PATCH 14/14] test-bdrv-drain: Test nested poll in bdrv_drain_poll_top_level() Kevin Wolf
2018-09-11  8:37   ` Fam Zheng
