* [Qemu-devel] [PATCH v2 00/17] Fix some jobs/drain/aio_poll related hangs
From: Kevin Wolf @ 2018-09-13 12:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, jsnow, qemu-devel

The combination of iothreads, block jobs and drain is currently
especially prone to hangs. This series fixes several of these bugs;
more remain and will be addressed in separate patches.

The primary goal of this series is to fix the scenario from:
https://bugzilla.redhat.com/show_bug.cgi?id=1601212

A simplified reproducer of the reported problem looks like this (two concurrent
commit block jobs for disks in an iothread):

$qemu -qmp stdio \
    -object iothread,id=iothread1 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x6,iothread=iothread1 \
    -drive  id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=hd0 \
    -device scsi-hd,drive=drive_image1,id=image1,bootindex=1 \
    -drive  id=drive_image2,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=hd1 \
    -device scsi-hd,drive=drive_image2,id=image2,bootindex=2

{"execute":"qmp_capabilities"}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image1","snapshot-file":"sn1"}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image1","snapshot-file":"sn11"}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image1","snapshot-file":"sn111"}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image2","snapshot-file":"sn2"}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image2","snapshot-file":"sn22"}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image2","snapshot-file":"sn222"}}

{ "execute": "block-commit", "arguments": { "device": "drive_image2","base":"sn2","backing-file":"sn2","top":"sn22"}}
{ "execute": "block-commit", "arguments": { "device": "drive_image1","base":"sn1","backing-file":"sn1","top":"sn11"}}

{"execute":"quit"}

v2:
- Rebased on top of mreitz/block (including fixes for new bugs: patches 1 and 16)
- Patch 12: Added missing bdrv_unref() calls in error path [Fam]

Kevin Wolf (17):
  job: Fix missing locking due to mismerge
  blockjob: Wake up BDS when job becomes idle
  aio-wait: Increase num_waiters even in home thread
  test-bdrv-drain: Drain with block jobs in an I/O thread
  test-blockjob: Acquire AioContext around job_cancel_sync()
  job: Use AIO_WAIT_WHILE() in job_finish_sync()
  test-bdrv-drain: Test AIO_WAIT_WHILE() in completion callback
  block: Add missing locking in bdrv_co_drain_bh_cb()
  block-backend: Add .drained_poll callback
  block-backend: Fix potential double blk_delete()
  block-backend: Decrease in_flight only after callback
  mirror: Fix potential use-after-free in active commit
  blockjob: Lie better in child_job_drained_poll()
  block: Remove aio_poll() in bdrv_drain_poll variants
  test-bdrv-drain: Test nested poll in bdrv_drain_poll_top_level()
  job: Avoid deadlocks in job_completed_txn_abort()
  test-bdrv-drain: AIO_WAIT_WHILE() in job .commit/.abort

 include/block/aio-wait.h |   2 +
 include/block/blockjob.h |  13 +++
 include/qemu/coroutine.h |   5 ++
 include/qemu/job.h       |  12 +++
 block/block-backend.c    |  26 +++++-
 block/io.c               |  23 +++--
 block/mirror.c           |  11 +++
 blockjob.c               |  20 ++++-
 job.c                    |  50 ++++++++---
 tests/test-bdrv-drain.c  | 215 ++++++++++++++++++++++++++++++++++++++++++++---
 tests/test-blockjob.c    |   6 ++
 util/qemu-coroutine.c    |   5 ++
 12 files changed, 354 insertions(+), 34 deletions(-)

-- 
2.13.6


* [Qemu-devel] [PATCH v2 01/17] job: Fix missing locking due to mismerge
From: Kevin Wolf @ 2018-09-13 12:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, jsnow, qemu-devel

job_completed() had a problem with double locking that was recently
fixed independently by two different commits:

"job: Fix nested aio_poll() hanging in job_txn_apply"
"jobs: add exit shim"

One fix removed the first aio_context_acquire(), the other removed the
second one. With both applied, we have a bug again: the code now runs
without any locking at all.

Add the locking back in one of the two places.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 job.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/job.c b/job.c
index 82b46929bd..5c4e84f007 100644
--- a/job.c
+++ b/job.c
@@ -847,7 +847,11 @@ static void job_completed(Job *job)
 static void job_exit(void *opaque)
 {
     Job *job = (Job *)opaque;
+    AioContext *ctx = job->aio_context;
+
+    aio_context_acquire(ctx);
     job_completed(job);
+    aio_context_release(ctx);
 }
 
 /**
-- 
2.13.6


* [Qemu-devel] [PATCH v2 02/17] blockjob: Wake up BDS when job becomes idle
From: Kevin Wolf @ 2018-09-13 12:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, jsnow, qemu-devel

In the context of draining a BDS, the .drained_poll callback of block
jobs is called. If it returns true (i.e. there is still some activity
pending), the drain operation may call aio_poll() with blocking=true to
wait for completion.

As soon as the pending activity has completed and the job finally
reaches a quiescent state (i.e. its coroutine either yields with
busy=false or terminates), the block job must wake up the aio_poll()
loop; otherwise we get a deadlock when the two run in different
threads.
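
The required interaction can be illustrated with a small standalone
model (a sketch under the assumption that pthread primitives stand in
for the blocking aio_poll() loop and for bdrv_wakeup(); it is not QEMU
code):

    /* Model: the "drain" side blocks until the job reports quiescence;
     * without the explicit kick from the job side it would sleep forever. */
    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t kick = PTHREAD_COND_INITIALIZER;
    static bool job_busy = true;

    /* Stands in for the block job running in an I/O thread */
    static void *job_thread(void *arg)
    {
        (void)arg;
        usleep(100 * 1000);             /* pending activity */
        pthread_mutex_lock(&lock);
        job_busy = false;               /* job reaches its quiescent state */
        pthread_cond_signal(&kick);     /* the bdrv_wakeup() equivalent */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, job_thread, NULL);

        /* Stands in for the blocking aio_poll() loop in the drain code */
        pthread_mutex_lock(&lock);
        while (job_busy) {
            pthread_cond_wait(&kick, &lock);  /* hangs without the signal */
        }
        pthread_mutex_unlock(&lock);

        pthread_join(t, NULL);
        printf("drained\n");
        return 0;
    }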

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 include/block/blockjob.h | 13 +++++++++++++
 include/qemu/job.h       |  3 +++
 blockjob.c               | 18 ++++++++++++++++++
 job.c                    |  7 +++++++
 4 files changed, 41 insertions(+)

diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 32c00b7dc0..2290bbb824 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -70,6 +70,9 @@ typedef struct BlockJob {
     /** Called when the job transitions to READY */
     Notifier ready_notifier;
 
+    /** Called when the job coroutine yields or terminates */
+    Notifier idle_notifier;
+
     /** BlockDriverStates that are involved in this block job */
     GSList *nodes;
 } BlockJob;
@@ -119,6 +122,16 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
 void block_job_remove_all_bdrv(BlockJob *job);
 
 /**
+ * block_job_wakeup_all_bdrv:
+ * @job: The block job
+ *
+ * Calls bdrv_wakeup() for all BlockDriverStates that have been added to the
+ * job. This function is to be called whenever child_job_drained_poll() would
+ * go from true to false to notify waiting drain requests.
+ */
+void block_job_wakeup_all_bdrv(BlockJob *job);
+
+/**
  * block_job_set_speed:
  * @job: The job to set the speed for.
  * @speed: The new value
diff --git a/include/qemu/job.h b/include/qemu/job.h
index 5cb0681834..b4a784d3cc 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -156,6 +156,9 @@ typedef struct Job {
     /** Notifiers called when the job transitions to READY */
     NotifierList on_ready;
 
+    /** Notifiers called when the job coroutine yields or terminates */
+    NotifierList on_idle;
+
     /** Element of the list of jobs */
     QLIST_ENTRY(Job) job_list;
 
diff --git a/blockjob.c b/blockjob.c
index be5903aa96..8d27e8e1ea 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -221,6 +221,22 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
     return 0;
 }
 
+void block_job_wakeup_all_bdrv(BlockJob *job)
+{
+    GSList *l;
+
+    for (l = job->nodes; l; l = l->next) {
+        BdrvChild *c = l->data;
+        bdrv_wakeup(c->bs);
+    }
+}
+
+static void block_job_on_idle(Notifier *n, void *opaque)
+{
+    BlockJob *job = opaque;
+    block_job_wakeup_all_bdrv(job);
+}
+
 bool block_job_is_internal(BlockJob *job)
 {
     return (job->job.id == NULL);
@@ -419,6 +435,7 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     job->finalize_completed_notifier.notify = block_job_event_completed;
     job->pending_notifier.notify = block_job_event_pending;
     job->ready_notifier.notify = block_job_event_ready;
+    job->idle_notifier.notify = block_job_on_idle;
 
     notifier_list_add(&job->job.on_finalize_cancelled,
                       &job->finalize_cancelled_notifier);
@@ -426,6 +443,7 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
                       &job->finalize_completed_notifier);
     notifier_list_add(&job->job.on_pending, &job->pending_notifier);
     notifier_list_add(&job->job.on_ready, &job->ready_notifier);
+    notifier_list_add(&job->job.on_idle, &job->idle_notifier);
 
     error_setg(&job->blocker, "block device is in use by block job: %s",
                job_type_str(&job->job));
diff --git a/job.c b/job.c
index 5c4e84f007..48a767c102 100644
--- a/job.c
+++ b/job.c
@@ -402,6 +402,11 @@ static void job_event_ready(Job *job)
     notifier_list_notify(&job->on_ready, job);
 }
 
+static void job_event_idle(Job *job)
+{
+    notifier_list_notify(&job->on_idle, job);
+}
+
 void job_enter_cond(Job *job, bool(*fn)(Job *job))
 {
     if (!job_started(job)) {
@@ -447,6 +452,7 @@ static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
         timer_mod(&job->sleep_timer, ns);
     }
     job->busy = false;
+    job_event_idle(job);
     job_unlock();
     qemu_coroutine_yield();
 
@@ -865,6 +871,7 @@ static void coroutine_fn job_co_entry(void *opaque)
     assert(job && job->driver && job->driver->run);
     job_pause_point(job);
     job->ret = job->driver->run(job, &job->err);
+    job_event_idle(job);
     job->deferred_to_main_loop = true;
     aio_bh_schedule_oneshot(qemu_get_aio_context(), job_exit, job);
 }
-- 
2.13.6


* [Qemu-devel] [PATCH v2 03/17] aio-wait: Increase num_waiters even in home thread
From: Kevin Wolf @ 2018-09-13 12:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, jsnow, qemu-devel

Even if AIO_WAIT_WHILE() is called in the home thread of the
AioContext, we still want to allow the condition to be changed by other
threads, as long as they kick the AioWait. Specifically, block jobs can
be running in an I/O thread and should then be able to kick a drain in
the main loop context.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 include/block/aio-wait.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index c85a62f798..46ba7f9111 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -77,10 +77,12 @@ typedef struct {
     AioWait *wait_ = (wait);                                       \
     AioContext *ctx_ = (ctx);                                      \
     if (ctx_ && in_aio_context_home_thread(ctx_)) {                \
+        atomic_inc(&wait_->num_waiters);                           \
         while ((cond)) {                                           \
             aio_poll(ctx_, true);                                  \
             waited_ = true;                                        \
         }                                                          \
+        atomic_dec(&wait_->num_waiters);                           \
     } else {                                                       \
         assert(qemu_get_current_aio_context() ==                   \
                qemu_get_aio_context());                            \
-- 
2.13.6


* [Qemu-devel] [PATCH v2 04/17] test-bdrv-drain: Drain with block jobs in an I/O thread
From: Kevin Wolf @ 2018-09-13 12:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, jsnow, qemu-devel

This extends the existing drain test with a block job to include
variants where the block job runs in a different AioContext.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 tests/test-bdrv-drain.c | 92 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 86 insertions(+), 6 deletions(-)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 89ac15e88a..57da22a096 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -174,6 +174,28 @@ static void do_drain_end(enum drain_type drain_type, BlockDriverState *bs)
     }
 }
 
+static void do_drain_begin_unlocked(enum drain_type drain_type, BlockDriverState *bs)
+{
+    if (drain_type != BDRV_DRAIN_ALL) {
+        aio_context_acquire(bdrv_get_aio_context(bs));
+    }
+    do_drain_begin(drain_type, bs);
+    if (drain_type != BDRV_DRAIN_ALL) {
+        aio_context_release(bdrv_get_aio_context(bs));
+    }
+}
+
+static void do_drain_end_unlocked(enum drain_type drain_type, BlockDriverState *bs)
+{
+    if (drain_type != BDRV_DRAIN_ALL) {
+        aio_context_acquire(bdrv_get_aio_context(bs));
+    }
+    do_drain_end(drain_type, bs);
+    if (drain_type != BDRV_DRAIN_ALL) {
+        aio_context_release(bdrv_get_aio_context(bs));
+    }
+}
+
 static void test_drv_cb_common(enum drain_type drain_type, bool recursive)
 {
     BlockBackend *blk;
@@ -785,11 +807,13 @@ BlockJobDriver test_job_driver = {
     },
 };
 
-static void test_blockjob_common(enum drain_type drain_type)
+static void test_blockjob_common(enum drain_type drain_type, bool use_iothread)
 {
     BlockBackend *blk_src, *blk_target;
     BlockDriverState *src, *target;
     BlockJob *job;
+    IOThread *iothread = NULL;
+    AioContext *ctx;
     int ret;
 
     src = bdrv_new_open_driver(&bdrv_test, "source", BDRV_O_RDWR,
@@ -797,21 +821,31 @@ static void test_blockjob_common(enum drain_type drain_type)
     blk_src = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
     blk_insert_bs(blk_src, src, &error_abort);
 
+    if (use_iothread) {
+        iothread = iothread_new();
+        ctx = iothread_get_aio_context(iothread);
+        blk_set_aio_context(blk_src, ctx);
+    } else {
+        ctx = qemu_get_aio_context();
+    }
+
     target = bdrv_new_open_driver(&bdrv_test, "target", BDRV_O_RDWR,
                                   &error_abort);
     blk_target = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
     blk_insert_bs(blk_target, target, &error_abort);
 
+    aio_context_acquire(ctx);
     job = block_job_create("job0", &test_job_driver, NULL, src, 0, BLK_PERM_ALL,
                            0, 0, NULL, NULL, &error_abort);
     block_job_add_bdrv(job, "target", target, 0, BLK_PERM_ALL, &error_abort);
     job_start(&job->job);
+    aio_context_release(ctx);
 
     g_assert_cmpint(job->job.pause_count, ==, 0);
     g_assert_false(job->job.paused);
     g_assert_true(job->job.busy); /* We're in job_sleep_ns() */
 
-    do_drain_begin(drain_type, src);
+    do_drain_begin_unlocked(drain_type, src);
 
     if (drain_type == BDRV_DRAIN_ALL) {
         /* bdrv_drain_all() drains both src and target */
@@ -822,7 +856,14 @@ static void test_blockjob_common(enum drain_type drain_type)
     g_assert_true(job->job.paused);
     g_assert_false(job->job.busy); /* The job is paused */
 
-    do_drain_end(drain_type, src);
+    do_drain_end_unlocked(drain_type, src);
+
+    if (use_iothread) {
+        /* paused is reset in the I/O thread, wait for it */
+        while (job->job.paused) {
+            aio_poll(qemu_get_aio_context(), false);
+        }
+    }
 
     g_assert_cmpint(job->job.pause_count, ==, 0);
     g_assert_false(job->job.paused);
@@ -841,32 +882,64 @@ static void test_blockjob_common(enum drain_type drain_type)
 
     do_drain_end(drain_type, target);
 
+    if (use_iothread) {
+        /* paused is reset in the I/O thread, wait for it */
+        while (job->job.paused) {
+            aio_poll(qemu_get_aio_context(), false);
+        }
+    }
+
     g_assert_cmpint(job->job.pause_count, ==, 0);
     g_assert_false(job->job.paused);
     g_assert_true(job->job.busy); /* We're in job_sleep_ns() */
 
+    aio_context_acquire(ctx);
     ret = job_complete_sync(&job->job, &error_abort);
     g_assert_cmpint(ret, ==, 0);
 
+    if (use_iothread) {
+        blk_set_aio_context(blk_src, qemu_get_aio_context());
+    }
+    aio_context_release(ctx);
+
     blk_unref(blk_src);
     blk_unref(blk_target);
     bdrv_unref(src);
     bdrv_unref(target);
+
+    if (iothread) {
+        iothread_join(iothread);
+    }
 }
 
 static void test_blockjob_drain_all(void)
 {
-    test_blockjob_common(BDRV_DRAIN_ALL);
+    test_blockjob_common(BDRV_DRAIN_ALL, false);
 }
 
 static void test_blockjob_drain(void)
 {
-    test_blockjob_common(BDRV_DRAIN);
+    test_blockjob_common(BDRV_DRAIN, false);
 }
 
 static void test_blockjob_drain_subtree(void)
 {
-    test_blockjob_common(BDRV_SUBTREE_DRAIN);
+    test_blockjob_common(BDRV_SUBTREE_DRAIN, false);
+}
+
+static void test_blockjob_iothread_drain_all(void)
+{
+    test_blockjob_common(BDRV_DRAIN_ALL, true);
+}
+
+static void test_blockjob_iothread_drain(void)
+{
+    test_blockjob_common(BDRV_DRAIN, true);
+}
+
+static void test_blockjob_iothread_drain_subtree(void)
+{
+    test_blockjob_common(BDRV_SUBTREE_DRAIN, true);
 }
 
 
@@ -1338,6 +1411,13 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/blockjob/drain_subtree",
                     test_blockjob_drain_subtree);
 
+    g_test_add_func("/bdrv-drain/blockjob/iothread/drain_all",
+                    test_blockjob_iothread_drain_all);
+    g_test_add_func("/bdrv-drain/blockjob/iothread/drain",
+                    test_blockjob_iothread_drain);
+    g_test_add_func("/bdrv-drain/blockjob/iothread/drain_subtree",
+                    test_blockjob_iothread_drain_subtree);
+
     g_test_add_func("/bdrv-drain/deletion/drain", test_delete_by_drain);
     g_test_add_func("/bdrv-drain/detach/drain_all", test_detach_by_drain_all);
     g_test_add_func("/bdrv-drain/detach/drain", test_detach_by_drain);
-- 
2.13.6


* [Qemu-devel] [PATCH v2 05/17] test-blockjob: Acquire AioContext around job_cancel_sync()
From: Kevin Wolf @ 2018-09-13 12:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, jsnow, qemu-devel

All callers in QEMU proper hold the AioContext lock when calling
job_finish_sync(). test-blockjob should do the same when it calls the
function indirectly through job_cancel_sync().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 include/qemu/job.h    | 6 ++++++
 tests/test-blockjob.c | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index b4a784d3cc..63c60ef1ba 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -524,6 +524,8 @@ void job_user_cancel(Job *job, bool force, Error **errp);
  *
  * Returns the return value from the job if the job actually completed
  * during the call, or -ECANCELED if it was canceled.
+ *
+ * Callers must hold the AioContext lock of job->aio_context.
  */
 int job_cancel_sync(Job *job);
 
@@ -541,6 +543,8 @@ void job_cancel_sync_all(void);
  * function).
  *
  * Returns the return value from the job.
+ *
+ * Callers must hold the AioContext lock of job->aio_context.
  */
 int job_complete_sync(Job *job, Error **errp);
 
@@ -566,6 +570,8 @@ void job_dismiss(Job **job, Error **errp);
  *
  * Returns 0 if the job is successfully completed, -ECANCELED if the job was
  * cancelled before completing, and -errno in other error cases.
+ *
+ * Callers must hold the AioContext lock of job->aio_context.
  */
 int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp);
 
diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c
index de4c1c20aa..652d1e8359 100644
--- a/tests/test-blockjob.c
+++ b/tests/test-blockjob.c
@@ -223,6 +223,10 @@ static void cancel_common(CancelJob *s)
     BlockJob *job = &s->common;
     BlockBackend *blk = s->blk;
     JobStatus sts = job->job.status;
+    AioContext *ctx;
+
+    ctx = job->job.aio_context;
+    aio_context_acquire(ctx);
 
     job_cancel_sync(&job->job);
     if (sts != JOB_STATUS_CREATED && sts != JOB_STATUS_CONCLUDED) {
@@ -232,6 +236,8 @@ static void cancel_common(CancelJob *s)
     assert(job->job.status == JOB_STATUS_NULL);
     job_unref(&job->job);
     destroy_blk(blk);
+
+    aio_context_release(ctx);
 }
 
 static void test_cancel_created(void)
-- 
2.13.6


* [Qemu-devel] [PATCH v2 06/17] job: Use AIO_WAIT_WHILE() in job_finish_sync()
From: Kevin Wolf @ 2018-09-13 12:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, jsnow, qemu-devel

job_finish_sync() needs to release the AioContext lock of the job before
calling aio_poll(). Otherwise, callbacks called by aio_poll() would
possibly take the lock a second time and run into a deadlock with a
nested AIO_WAIT_WHILE() call.

Also, job_drain() without aio_poll() isn't necessarily enough to make
progress on a job; the job could depend on bottom halves being executed.

Combine both open-coded while loops into a single AIO_WAIT_WHILE() call
that solves both of these problems.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 job.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/job.c b/job.c
index 48a767c102..fa74558ba0 100644
--- a/job.c
+++ b/job.c
@@ -29,6 +29,7 @@
 #include "qemu/job.h"
 #include "qemu/id.h"
 #include "qemu/main-loop.h"
+#include "block/aio-wait.h"
 #include "trace-root.h"
 #include "qapi/qapi-events-job.h"
 
@@ -962,6 +963,7 @@ void job_complete(Job *job, Error **errp)
 int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
 {
     Error *local_err = NULL;
+    AioWait dummy_wait = {};
     int ret;
 
     job_ref(job);
@@ -974,14 +976,10 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
         job_unref(job);
         return -EBUSY;
     }
-    /* job_drain calls job_enter, and it should be enough to induce progress
-     * until the job completes or moves to the main thread. */
-    while (!job->deferred_to_main_loop && !job_is_completed(job)) {
-        job_drain(job);
-    }
-    while (!job_is_completed(job)) {
-        aio_poll(qemu_get_aio_context(), true);
-    }
+
+    AIO_WAIT_WHILE(&dummy_wait, job->aio_context,
+                   (job_drain(job), !job_is_completed(job)));
+
     ret = (job_is_cancelled(job) && job->ret == 0) ? -ECANCELED : job->ret;
     job_unref(job);
     return ret;
-- 
2.13.6


* [Qemu-devel] [PATCH v2 07/17] test-bdrv-drain: Test AIO_WAIT_WHILE() in completion callback
From: Kevin Wolf @ 2018-09-13 12:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, jsnow, qemu-devel

This is a regression test for a deadlock that occurred in block job
completion callbacks (via job_defer_to_main_loop) because the AioContext
lock was taken twice: once in job_finish_sync() and then again in
job_defer_to_main_loop_bh(). This would cause AIO_WAIT_WHILE() to hang.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 tests/test-bdrv-drain.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 57da22a096..c3c17b9ff7 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -774,6 +774,15 @@ typedef struct TestBlockJob {
     bool should_complete;
 } TestBlockJob;
 
+static int test_job_prepare(Job *job)
+{
+    TestBlockJob *s = container_of(job, TestBlockJob, common.job);
+
+    /* Provoke an AIO_WAIT_WHILE() call to verify there is no deadlock */
+    blk_flush(s->common.blk);
+    return 0;
+}
+
 static int coroutine_fn test_job_run(Job *job, Error **errp)
 {
     TestBlockJob *s = container_of(job, TestBlockJob, common.job);
@@ -804,6 +813,7 @@ BlockJobDriver test_job_driver = {
         .drain          = block_job_drain,
         .run            = test_job_run,
         .complete       = test_job_complete,
+        .prepare        = test_job_prepare,
     },
 };
 
-- 
2.13.6


* [Qemu-devel] [PATCH v2 08/17] block: Add missing locking in bdrv_co_drain_bh_cb()
From: Kevin Wolf @ 2018-09-13 12:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, jsnow, qemu-devel

bdrv_do_drained_begin/end() assume that they are called with the
AioContext lock of bs held. If we call drain functions from a coroutine
with the AioContext lock held, we yield and schedule a BH to move out of
coroutine context. This means that the lock for the home context of the
coroutine is released and must be re-acquired in the bottom half.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/qemu/coroutine.h |  5 +++++
 block/io.c               | 15 +++++++++++++++
 util/qemu-coroutine.c    |  5 +++++
 3 files changed, 25 insertions(+)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 6f8a487041..9801e7f5a4 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -90,6 +90,11 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co);
 void coroutine_fn qemu_coroutine_yield(void);
 
 /**
+ * Get the AioContext of the given coroutine
+ */
+AioContext *coroutine_fn qemu_coroutine_get_aio_context(Coroutine *co);
+
+/**
  * Get the currently executing coroutine
  */
 Coroutine *coroutine_fn qemu_coroutine_self(void);
diff --git a/block/io.c b/block/io.c
index 7100344c7b..914ba78f1a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -288,6 +288,18 @@ static void bdrv_co_drain_bh_cb(void *opaque)
     BlockDriverState *bs = data->bs;
 
     if (bs) {
+        AioContext *ctx = bdrv_get_aio_context(bs);
+        AioContext *co_ctx = qemu_coroutine_get_aio_context(co);
+
+        /*
+         * When the coroutine yielded, the lock for its home context was
+         * released, so we need to re-acquire it here. If it explicitly
+         * acquired a different context, the lock is still held and we don't
+         * want to lock it a second time (or AIO_WAIT_WHILE() would hang).
+         */
+        if (ctx == co_ctx) {
+            aio_context_acquire(ctx);
+        }
         bdrv_dec_in_flight(bs);
         if (data->begin) {
             bdrv_do_drained_begin(bs, data->recursive, data->parent,
@@ -296,6 +308,9 @@ static void bdrv_co_drain_bh_cb(void *opaque)
             bdrv_do_drained_end(bs, data->recursive, data->parent,
                                 data->ignore_bds_parents);
         }
+        if (ctx == co_ctx) {
+            aio_context_release(ctx);
+        }
     } else {
         assert(data->begin);
         bdrv_drain_all_begin();
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index 1ba4191b84..2295928d33 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -198,3 +198,8 @@ bool qemu_coroutine_entered(Coroutine *co)
 {
     return co->caller;
 }
+
+AioContext *coroutine_fn qemu_coroutine_get_aio_context(Coroutine *co)
+{
+    return co->ctx;
+}
-- 
2.13.6


* [Qemu-devel] [PATCH v2 09/17] block-backend: Add .drained_poll callback
From: Kevin Wolf @ 2018-09-13 12:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, jsnow, qemu-devel

A bdrv_drain operation must ensure that all parents are quiesced; this
includes BlockBackends. Otherwise, callbacks of requests that have
already completed on the BDS layer, but not quite yet on the
BlockBackend layer, could still create new requests.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 block/block-backend.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index fa120630be..efa8d8011c 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -121,6 +121,7 @@ static void blk_root_inherit_options(int *child_flags, QDict *child_options,
     abort();
 }
 static void blk_root_drained_begin(BdrvChild *child);
+static bool blk_root_drained_poll(BdrvChild *child);
 static void blk_root_drained_end(BdrvChild *child);
 
 static void blk_root_change_media(BdrvChild *child, bool load);
@@ -294,6 +295,7 @@ static const BdrvChildRole child_root = {
     .get_parent_desc    = blk_root_get_parent_desc,
 
     .drained_begin      = blk_root_drained_begin,
+    .drained_poll       = blk_root_drained_poll,
     .drained_end        = blk_root_drained_end,
 
     .activate           = blk_root_activate,
@@ -2191,6 +2193,13 @@ static void blk_root_drained_begin(BdrvChild *child)
     }
 }
 
+static bool blk_root_drained_poll(BdrvChild *child)
+{
+    BlockBackend *blk = child->opaque;
+    assert(blk->quiesce_counter);
+    return !!blk->in_flight;
+}
+
 static void blk_root_drained_end(BdrvChild *child)
 {
     BlockBackend *blk = child->opaque;
-- 
2.13.6


* [Qemu-devel] [PATCH v2 10/17] block-backend: Fix potential double blk_delete()
From: Kevin Wolf @ 2018-09-13 12:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, jsnow, qemu-devel

blk_unref() first decreases the refcount of the BlockBackend and calls
blk_delete() if the refcount reaches zero. Requests can still be in
flight at this point; they are only drained during blk_delete().

At that point, arbitrary callbacks can run. If any callback takes a
temporary BlockBackend reference, it will first increase the refcount to
1 and then decrease it to 0 again, triggering another blk_delete(). This
causes a use-after-free crash in the outer blk_delete().

Fix it by draining the BlockBackend before decreasing the refcount to 0.
Assert in blk_ref() that it never turns a refcount of 0 into 1 (which
would mean that the BlockBackend is already being deleted).
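
The failing sequence can be reproduced with a tiny standalone model (a
simplified sketch, not QEMU code; the real in-flight request tracking is
reduced to a single pending callback):

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct Blk { int refcnt; } Blk;

    static int delete_calls;

    static void blk_delete(Blk *blk);

    static void blk_ref(Blk *blk)
    {
        blk->refcnt++;
    }

    static void blk_unref(Blk *blk)
    {
        if (!--blk->refcnt) {
            blk_delete(blk);
        }
    }

    /* A completion callback run while draining; it briefly holds its own ref */
    static void completion_cb(Blk *blk)
    {
        blk_ref(blk);    /* 0 -> 1 */
        blk_unref(blk);  /* 1 -> 0: triggers a nested blk_delete() */
    }

    static void blk_drain(Blk *blk)
    {
        static bool callback_pending = true;  /* one request still in flight */

        if (callback_pending) {
            callback_pending = false;
            completion_cb(blk);
        }
    }

    static void blk_delete(Blk *blk)
    {
        delete_calls++;
        blk_drain(blk);  /* requests are only drained here */
        /* free(blk) would go here; with the nested call above it runs twice */
    }

    int main(void)
    {
        Blk blk = { .refcnt = 1 };

        blk_unref(&blk);  /* the last user reference goes away */
        printf("blk_delete() ran %d times\n", delete_calls);  /* prints 2 */
        return 0;
    }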

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 block/block-backend.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index efa8d8011c..1b2d7a6ff5 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -435,6 +435,7 @@ int blk_get_refcnt(BlockBackend *blk)
  */
 void blk_ref(BlockBackend *blk)
 {
+    assert(blk->refcnt > 0);
     blk->refcnt++;
 }
 
@@ -447,7 +448,11 @@ void blk_unref(BlockBackend *blk)
 {
     if (blk) {
         assert(blk->refcnt > 0);
-        if (!--blk->refcnt) {
+        if (blk->refcnt > 1) {
+            blk->refcnt--;
+        } else {
+            blk_drain(blk);
+            blk->refcnt = 0;
             blk_delete(blk);
         }
     }
-- 
2.13.6


* [Qemu-devel] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback
From: Kevin Wolf @ 2018-09-13 12:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, jsnow, qemu-devel

Request callbacks can do pretty much anything, including operations
that yield from the coroutine (such as draining the backend). In that
case, a prematurely decreased in_flight counter would be visible to
other code and could lead to a drain completing while the callback
hasn't actually finished yet.
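
A minimal standalone model of the ordering issue (an illustrative
sketch, not QEMU code) looks like this:

    /* The request must still count as in flight while its callback runs;
     * otherwise a drain condition evaluated from inside the callback
     * looks satisfied too early. */
    #include <assert.h>
    #include <stdio.h>

    static int in_flight;

    static int drained(void)
    {
        return in_flight == 0;
    }

    static void completion_cb(void)
    {
        /* anything here may poll the drain condition */
        printf("callback sees drained() == %d\n", drained());
    }

    int main(void)
    {
        /* Old order: decrement first, then run the callback */
        in_flight = 1;
        in_flight--;
        completion_cb();   /* prints 1: looks drained while the cb still runs */

        /* New order (this patch): run the callback, then decrement */
        in_flight = 1;
        completion_cb();   /* prints 0: still counted as in flight */
        in_flight--;

        assert(drained());
        return 0;
    }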

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 block/block-backend.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 1b2d7a6ff5..2efaf0c968 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1338,8 +1338,16 @@ static const AIOCBInfo blk_aio_em_aiocb_info = {
 static void blk_aio_complete(BlkAioEmAIOCB *acb)
 {
     if (acb->has_returned) {
-        blk_dec_in_flight(acb->rwco.blk);
+        if (qemu_get_current_aio_context() == qemu_get_aio_context()) {
+            /* If we are in the main thread, the callback is allowed to unref
+             * the BlockBackend, so we have to hold an additional reference */
+            blk_ref(acb->rwco.blk);
+        }
         acb->common.cb(acb->common.opaque, acb->rwco.ret);
+        blk_dec_in_flight(acb->rwco.blk);
+        if (qemu_get_current_aio_context() == qemu_get_aio_context()) {
+            blk_unref(acb->rwco.blk);
+        }
         qemu_aio_unref(acb);
     }
 }
-- 
2.13.6


* [Qemu-devel] [PATCH v2 12/17] mirror: Fix potential use-after-free in active commit
From: Kevin Wolf @ 2018-09-13 12:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, jsnow, qemu-devel

When starting an active commit job, other callbacks can run before
mirror_start_job() calls bdrv_ref() where needed, and they can make the
nodes go away. Add another pair of bdrv_ref/unref() around it to protect
against this case.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/mirror.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/block/mirror.c b/block/mirror.c
index 56d9ef7474..c8657991cf 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1697,7 +1697,14 @@ void commit_active_start(const char *job_id, BlockDriverState *bs,
 
     orig_base_flags = bdrv_get_flags(base);
 
+    /* bdrv_reopen() drains, which might make the BDSes go away before a
+     * reference is taken in mirror_start_job(). */
+    bdrv_ref(bs);
+    bdrv_ref(base);
+
     if (bdrv_reopen(base, bs->open_flags, errp)) {
+        bdrv_unref(bs);
+        bdrv_unref(base);
         return;
     }
 
@@ -1707,6 +1714,10 @@ void commit_active_start(const char *job_id, BlockDriverState *bs,
                      &commit_active_job_driver, false, base, auto_complete,
                      filter_node_name, false, MIRROR_COPY_MODE_BACKGROUND,
                      &local_err);
+
+    bdrv_unref(bs);
+    bdrv_unref(base);
+
     if (local_err) {
         error_propagate(errp, local_err);
         goto error_restore_flags;
-- 
2.13.6


* [Qemu-devel] [PATCH v2 13/17] blockjob: Lie better in child_job_drained_poll()
From: Kevin Wolf @ 2018-09-13 12:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, jsnow, qemu-devel

Block jobs claim in .drained_poll() that they are in a quiescent state
as soon as job->deferred_to_main_loop is true. This is obviously wrong;
they still have a completion BH to run. We only get away with this
because commit 91af091f923 added an unconditional aio_poll(false) to the
drain functions, but this bypasses the regular drain mechanisms.

However, just removing that aio_poll() and reporting that the job is
still active doesn't work either: the completion callbacks themselves
call drain functions (directly, or indirectly via bdrv_reopen), so they
would deadlock then.

As a better lie, report the job as active as long as the BH is pending,
but falsely call it quiescent from the point in the BH where the
completion callback is called. At that point, nested drain calls won't
deadlock because they ignore the job, and outer drains will wait for the
job to really reach a quiescent state because the callback is already
running.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/qemu/job.h |  3 +++
 blockjob.c         |  2 +-
 job.c              | 11 ++++++++++-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 63c60ef1ba..9e7cd1e4a0 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -76,6 +76,9 @@ typedef struct Job {
      * Set to false by the job while the coroutine has yielded and may be
      * re-entered by job_enter(). There may still be I/O or event loop activity
      * pending. Accessed under block_job_mutex (in blockjob.c).
+     *
+     * When the job is deferred to the main loop, busy is true as long as the
+     * bottom half is still pending.
      */
     bool busy;
 
diff --git a/blockjob.c b/blockjob.c
index 8d27e8e1ea..617d86fe93 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -164,7 +164,7 @@ static bool child_job_drained_poll(BdrvChild *c)
     /* An inactive or completed job doesn't have any pending requests. Jobs
      * with !job->busy are either already paused or have a pause point after
      * being reentered, so no job driver code will run before they pause. */
-    if (!job->busy || job_is_completed(job) || job->deferred_to_main_loop) {
+    if (!job->busy || job_is_completed(job)) {
         return false;
     }
 
diff --git a/job.c b/job.c
index fa74558ba0..00a1cd128d 100644
--- a/job.c
+++ b/job.c
@@ -857,7 +857,16 @@ static void job_exit(void *opaque)
     AioContext *ctx = job->aio_context;
 
     aio_context_acquire(ctx);
+
+    /* This is a lie, we're not quiescent, but still doing the completion
+     * callbacks. However, completion callbacks tend to involve operations that
+     * drain block nodes, and if .drained_poll still returned true, we would
+     * deadlock. */
+    job->busy = false;
+    job_event_idle(job);
+
     job_completed(job);
+
     aio_context_release(ctx);
 }
 
@@ -872,8 +881,8 @@ static void coroutine_fn job_co_entry(void *opaque)
     assert(job && job->driver && job->driver->run);
     job_pause_point(job);
     job->ret = job->driver->run(job, &job->err);
-    job_event_idle(job);
     job->deferred_to_main_loop = true;
+    job->busy = true;
     aio_bh_schedule_oneshot(qemu_get_aio_context(), job_exit, job);
 }
 
-- 
2.13.6


* [Qemu-devel] [PATCH v2 14/17] block: Remove aio_poll() in bdrv_drain_poll variants
From: Kevin Wolf @ 2018-09-13 12:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, jsnow, qemu-devel

bdrv_drain_poll_top_level() was buggy because it didn't release the
AioContext lock of the node to be drained before calling aio_poll().
This way, callbacks called by aio_poll() would possibly take the lock a
second time and run into a deadlock with a nested AIO_WAIT_WHILE() call.

However, it turns out that the aio_poll() call isn't actually needed any
more. It was introduced in commit 91af091f923, which is effectively
reverted by this patch. The cases it was supposed to fix are now covered
by bdrv_drain_poll(), which waits for block jobs to reach a quiescent
state.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 block/io.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/block/io.c b/block/io.c
index 914ba78f1a..8b81ff3913 100644
--- a/block/io.c
+++ b/block/io.c
@@ -268,10 +268,6 @@ bool bdrv_drain_poll(BlockDriverState *bs, bool recursive,
 static bool bdrv_drain_poll_top_level(BlockDriverState *bs, bool recursive,
                                       BdrvChild *ignore_parent)
 {
-    /* Execute pending BHs first and check everything else only after the BHs
-     * have executed. */
-    while (aio_poll(bs->aio_context, false));
-
     return bdrv_drain_poll(bs, recursive, ignore_parent, false);
 }
 
@@ -511,10 +507,6 @@ static bool bdrv_drain_all_poll(void)
     BlockDriverState *bs = NULL;
     bool result = false;
 
-    /* Execute pending BHs first (may modify the graph) and check everything
-     * else only after the BHs have executed. */
-    while (aio_poll(qemu_get_aio_context(), false));
-
     /* bdrv_drain_poll() can't make changes to the graph and we are holding the
      * main AioContext lock, so iterating bdrv_next_all_states() is safe. */
     while ((bs = bdrv_next_all_states(bs))) {
-- 
2.13.6


* [Qemu-devel] [PATCH v2 15/17] test-bdrv-drain: Test nested poll in bdrv_drain_poll_top_level()
From: Kevin Wolf @ 2018-09-13 12:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, jsnow, qemu-devel

This is a regression test for a deadlock that could occur in callbacks
called from the aio_poll() in bdrv_drain_poll_top_level(). The
AioContext lock wasn't released and therefore would be taken a second
time in the callback. This would cause a possible AIO_WAIT_WHILE() in
the callback to hang.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 tests/test-bdrv-drain.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index c3c17b9ff7..e105c0ae84 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -636,6 +636,17 @@ static void test_iothread_aio_cb(void *opaque, int ret)
     qemu_event_set(&done_event);
 }
 
+static void test_iothread_main_thread_bh(void *opaque)
+{
+    struct test_iothread_data *data = opaque;
+
+    /* Test that the AioContext is not yet locked in a random BH that is
+     * executed during drain, otherwise this would deadlock. */
+    aio_context_acquire(bdrv_get_aio_context(data->bs));
+    bdrv_flush(data->bs);
+    aio_context_release(bdrv_get_aio_context(data->bs));
+}
+
 /*
  * Starts an AIO request on a BDS that runs in the AioContext of iothread 1.
  * The request involves a BH on iothread 2 before it can complete.
@@ -705,6 +716,8 @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread)
             aio_context_acquire(ctx_a);
         }
 
+        aio_bh_schedule_oneshot(ctx_a, test_iothread_main_thread_bh, &data);
+
         /* The request is running on the IOThread a. Draining its block device
          * will make sure that it has completed as far as the BDS is concerned,
          * but the drain in this thread can continue immediately after
-- 
2.13.6


* [Qemu-devel] [PATCH v2 16/17] job: Avoid deadlocks in job_completed_txn_abort()
From: Kevin Wolf @ 2018-09-13 12:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, jsnow, qemu-devel

Among other things, job_finalize_single() calls the
.prepare/.commit/.abort callbacks of the individual job driver.
Recently, their use was adapted for all block jobs so that they now
involve code that calls AIO_WAIT_WHILE(). Such code must be called under
the AioContext lock for the respective job, but without holding any
other AioContext lock.
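
The resulting locking pattern can be sketched with a standalone model
(an assumption-based illustration using plain pthread mutexes in place
of AioContext locks; it is not QEMU code):

    /* Drop the calling job's lock, then hold exactly one job's lock at a
     * time while finalizing it, and re-acquire the caller's lock at the
     * end. Holding several locks at once is what the patch avoids. */
    #include <pthread.h>
    #include <stdio.h>

    enum { N_JOBS = 3 };

    static pthread_mutex_t job_ctx_lock[N_JOBS] = {
        PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
        PTHREAD_MUTEX_INITIALIZER
    };

    /* Stands in for the per-job finalization, which in QEMU may run
     * AIO_WAIT_WHILE() and therefore must not see other locks held */
    static void finalize_one(int i)
    {
        printf("finalizing job %d under its own lock only\n", i);
    }

    int main(void)
    {
        int caller = 0;                    /* the failed job we were called for */

        pthread_mutex_lock(&job_ctx_lock[caller]);    /* caller's precondition */

        pthread_mutex_unlock(&job_ctx_lock[caller]);  /* release the outer lock */
        for (int i = 0; i < N_JOBS; i++) {
            pthread_mutex_lock(&job_ctx_lock[i]);
            finalize_one(i);
            pthread_mutex_unlock(&job_ctx_lock[i]);
        }
        pthread_mutex_lock(&job_ctx_lock[caller]);    /* restore for the caller */

        pthread_mutex_unlock(&job_ctx_lock[caller]);
        return 0;
    }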

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 job.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/job.c b/job.c
index 00a1cd128d..0b021867da 100644
--- a/job.c
+++ b/job.c
@@ -718,6 +718,7 @@ static void job_cancel_async(Job *job, bool force)
 
 static void job_completed_txn_abort(Job *job)
 {
+    AioContext *outer_ctx = job->aio_context;
     AioContext *ctx;
     JobTxn *txn = job->txn;
     Job *other_job;
@@ -731,23 +732,26 @@ static void job_completed_txn_abort(Job *job)
     txn->aborting = true;
     job_txn_ref(txn);
 
-    /* We are the first failed job. Cancel other jobs. */
-    QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
-        ctx = other_job->aio_context;
-        aio_context_acquire(ctx);
-    }
+    /* We can only hold the single job's AioContext lock while calling
+     * job_finalize_single() because the finalization callbacks can involve
+     * calls of AIO_WAIT_WHILE(), which could deadlock otherwise. */
+    aio_context_release(outer_ctx);
 
     /* Other jobs are effectively cancelled by us, set the status for
      * them; this job, however, may or may not be cancelled, depending
      * on the caller, so leave it. */
     QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
         if (other_job != job) {
+            ctx = other_job->aio_context;
+            aio_context_acquire(ctx);
             job_cancel_async(other_job, false);
+            aio_context_release(ctx);
         }
     }
     while (!QLIST_EMPTY(&txn->jobs)) {
         other_job = QLIST_FIRST(&txn->jobs);
         ctx = other_job->aio_context;
+        aio_context_acquire(ctx);
         if (!job_is_completed(other_job)) {
             assert(job_is_cancelled(other_job));
             job_finish_sync(other_job, NULL, NULL);
@@ -756,6 +760,8 @@ static void job_completed_txn_abort(Job *job)
         aio_context_release(ctx);
     }
 
+    aio_context_acquire(outer_ctx);
+
     job_txn_unref(txn);
 }
 
-- 
2.13.6


* [Qemu-devel] [PATCH v2 17/17] test-bdrv-drain: AIO_WAIT_WHILE() in job .commit/.abort
From: Kevin Wolf @ 2018-09-13 12:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, mreitz, famz, pbonzini, slp, jsnow, qemu-devel

This adds tests for calling AIO_WAIT_WHILE() in the .commit and .abort
callbacks. Both reasons why .abort could be called for a single job are
tested: Either .run or .prepare could return an error.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/test-bdrv-drain.c | 116 +++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 104 insertions(+), 12 deletions(-)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index e105c0ae84..bbc56a055f 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -784,6 +784,8 @@ static void test_iothread_drain_subtree(void)
 
 typedef struct TestBlockJob {
     BlockJob common;
+    int run_ret;
+    int prepare_ret;
     bool should_complete;
 } TestBlockJob;
 
@@ -793,7 +795,23 @@ static int test_job_prepare(Job *job)
 
     /* Provoke an AIO_WAIT_WHILE() call to verify there is no deadlock */
     blk_flush(s->common.blk);
-    return 0;
+    return s->prepare_ret;
+}
+
+static void test_job_commit(Job *job)
+{
+    TestBlockJob *s = container_of(job, TestBlockJob, common.job);
+
+    /* Provoke an AIO_WAIT_WHILE() call to verify there is no deadlock */
+    blk_flush(s->common.blk);
+}
+
+static void test_job_abort(Job *job)
+{
+    TestBlockJob *s = container_of(job, TestBlockJob, common.job);
+
+    /* Provoke an AIO_WAIT_WHILE() call to verify there is no deadlock */
+    blk_flush(s->common.blk);
 }
 
 static int coroutine_fn test_job_run(Job *job, Error **errp)
@@ -809,7 +827,7 @@ static int coroutine_fn test_job_run(Job *job, Error **errp)
         job_pause_point(&s->common.job);
     }
 
-    return 0;
+    return s->run_ret;
 }
 
 static void test_job_complete(Job *job, Error **errp)
@@ -827,14 +845,24 @@ BlockJobDriver test_job_driver = {
         .run            = test_job_run,
         .complete       = test_job_complete,
         .prepare        = test_job_prepare,
+        .commit         = test_job_commit,
+        .abort          = test_job_abort,
     },
 };
 
-static void test_blockjob_common(enum drain_type drain_type, bool use_iothread)
+enum test_job_result {
+    TEST_JOB_SUCCESS,
+    TEST_JOB_FAIL_RUN,
+    TEST_JOB_FAIL_PREPARE,
+};
+
+static void test_blockjob_common(enum drain_type drain_type, bool use_iothread,
+                                 enum test_job_result result)
 {
     BlockBackend *blk_src, *blk_target;
     BlockDriverState *src, *target;
     BlockJob *job;
+    TestBlockJob *tjob;
     IOThread *iothread = NULL;
     AioContext *ctx;
     int ret;
@@ -858,9 +886,23 @@ static void test_blockjob_common(enum drain_type drain_type, bool use_iothread)
     blk_insert_bs(blk_target, target, &error_abort);
 
     aio_context_acquire(ctx);
-    job = block_job_create("job0", &test_job_driver, NULL, src, 0, BLK_PERM_ALL,
-                           0, 0, NULL, NULL, &error_abort);
+    tjob = block_job_create("job0", &test_job_driver, NULL, src,
+                            0, BLK_PERM_ALL,
+                            0, 0, NULL, NULL, &error_abort);
+    job = &tjob->common;
     block_job_add_bdrv(job, "target", target, 0, BLK_PERM_ALL, &error_abort);
+
+    switch (result) {
+    case TEST_JOB_SUCCESS:
+        break;
+    case TEST_JOB_FAIL_RUN:
+        tjob->run_ret = -EIO;
+        break;
+    case TEST_JOB_FAIL_PREPARE:
+        tjob->prepare_ret = -EIO;
+        break;
+    }
+
     job_start(&job->job);
     aio_context_release(ctx);
 
@@ -918,7 +960,7 @@ static void test_blockjob_common(enum drain_type drain_type, bool use_iothread)
 
     aio_context_acquire(ctx);
     ret = job_complete_sync(&job->job, &error_abort);
-    g_assert_cmpint(ret, ==, 0);
+    g_assert_cmpint(ret, ==, (result == TEST_JOB_SUCCESS ? 0 : -EIO));
 
     if (use_iothread) {
         blk_set_aio_context(blk_src, qemu_get_aio_context());
@@ -937,32 +979,68 @@ static void test_blockjob_common(enum drain_type drain_type, bool use_iothread)
 
 static void test_blockjob_drain_all(void)
 {
-    test_blockjob_common(BDRV_DRAIN_ALL, false);
+    test_blockjob_common(BDRV_DRAIN_ALL, false, TEST_JOB_SUCCESS);
 }
 
 static void test_blockjob_drain(void)
 {
-    test_blockjob_common(BDRV_DRAIN, false);
+    test_blockjob_common(BDRV_DRAIN, false, TEST_JOB_SUCCESS);
 }
 
 static void test_blockjob_drain_subtree(void)
 {
-    test_blockjob_common(BDRV_SUBTREE_DRAIN, false);
+    test_blockjob_common(BDRV_SUBTREE_DRAIN, false, TEST_JOB_SUCCESS);
+}
+
+static void test_blockjob_error_drain_all(void)
+{
+    test_blockjob_common(BDRV_DRAIN_ALL, false, TEST_JOB_FAIL_RUN);
+    test_blockjob_common(BDRV_DRAIN_ALL, false, TEST_JOB_FAIL_PREPARE);
+}
+
+static void test_blockjob_error_drain(void)
+{
+    test_blockjob_common(BDRV_DRAIN, false, TEST_JOB_FAIL_RUN);
+    test_blockjob_common(BDRV_DRAIN, false, TEST_JOB_FAIL_PREPARE);
+}
+
+static void test_blockjob_error_drain_subtree(void)
+{
+    test_blockjob_common(BDRV_SUBTREE_DRAIN, false, TEST_JOB_FAIL_RUN);
+    test_blockjob_common(BDRV_SUBTREE_DRAIN, false, TEST_JOB_FAIL_PREPARE);
 }
 
 static void test_blockjob_iothread_drain_all(void)
 {
-    test_blockjob_common(BDRV_DRAIN_ALL, true);
+    test_blockjob_common(BDRV_DRAIN_ALL, true, TEST_JOB_SUCCESS);
 }
 
 static void test_blockjob_iothread_drain(void)
 {
-    test_blockjob_common(BDRV_DRAIN, true);
+    test_blockjob_common(BDRV_DRAIN, true, TEST_JOB_SUCCESS);
 }
 
 static void test_blockjob_iothread_drain_subtree(void)
 {
-    test_blockjob_common(BDRV_SUBTREE_DRAIN, true);
+    test_blockjob_common(BDRV_SUBTREE_DRAIN, true, TEST_JOB_SUCCESS);
+}
+
+static void test_blockjob_iothread_error_drain_all(void)
+{
+    test_blockjob_common(BDRV_DRAIN_ALL, true, TEST_JOB_FAIL_RUN);
+    test_blockjob_common(BDRV_DRAIN_ALL, true, TEST_JOB_FAIL_PREPARE);
+}
+
+static void test_blockjob_iothread_error_drain(void)
+{
+    test_blockjob_common(BDRV_DRAIN, true, TEST_JOB_FAIL_RUN);
+    test_blockjob_common(BDRV_DRAIN, true, TEST_JOB_FAIL_PREPARE);
+}
+
+static void test_blockjob_iothread_error_drain_subtree(void)
+{
+    test_blockjob_common(BDRV_SUBTREE_DRAIN, true, TEST_JOB_FAIL_RUN);
+    test_blockjob_common(BDRV_SUBTREE_DRAIN, true, TEST_JOB_FAIL_PREPARE);
 }
 
 
@@ -1434,6 +1512,13 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/blockjob/drain_subtree",
                     test_blockjob_drain_subtree);
 
+    g_test_add_func("/bdrv-drain/blockjob/error/drain_all",
+                    test_blockjob_error_drain_all);
+    g_test_add_func("/bdrv-drain/blockjob/error/drain",
+                    test_blockjob_error_drain);
+    g_test_add_func("/bdrv-drain/blockjob/error/drain_subtree",
+                    test_blockjob_error_drain_subtree);
+
     g_test_add_func("/bdrv-drain/blockjob/iothread/drain_all",
                     test_blockjob_iothread_drain_all);
     g_test_add_func("/bdrv-drain/blockjob/iothread/drain",
@@ -1441,6 +1526,13 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/blockjob/iothread/drain_subtree",
                     test_blockjob_iothread_drain_subtree);
 
+    g_test_add_func("/bdrv-drain/blockjob/iothread/error/drain_all",
+                    test_blockjob_iothread_error_drain_all);
+    g_test_add_func("/bdrv-drain/blockjob/iothread/error/drain",
+                    test_blockjob_iothread_error_drain);
+    g_test_add_func("/bdrv-drain/blockjob/iothread/error/drain_subtree",
+                    test_blockjob_iothread_error_drain_subtree);
+
     g_test_add_func("/bdrv-drain/deletion/drain", test_delete_by_drain);
     g_test_add_func("/bdrv-drain/detach/drain_all", test_detach_by_drain_all);
     g_test_add_func("/bdrv-drain/detach/drain", test_detach_by_drain);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/17] job: Fix missing locking due to mismerge
  2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 01/17] job: Fix missing locking due to mismerge Kevin Wolf
@ 2018-09-13 13:56   ` Max Reitz
  2018-09-13 17:38   ` John Snow
  1 sibling, 0 replies; 47+ messages in thread
From: Max Reitz @ 2018-09-13 13:56 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: famz, pbonzini, slp, jsnow, qemu-devel


On 13.09.18 14:52, Kevin Wolf wrote:
> job_completed() had a problem with double locking that was recently
> fixed independently by two different commits:
> 
> "job: Fix nested aio_poll() hanging in job_txn_apply"
> "jobs: add exit shim"
> 
> One fix removed the first aio_context_acquire(), the other fix removed
> the other one. Now we have a bug again and the code is run without any
> locking.
> 
> Add it back in one of the places.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  job.c | 4 ++++
>  1 file changed, 4 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 02/17] blockjob: Wake up BDS when job becomes idle
  2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 02/17] blockjob: Wake up BDS when job becomes idle Kevin Wolf
@ 2018-09-13 14:31   ` Max Reitz
  0 siblings, 0 replies; 47+ messages in thread
From: Max Reitz @ 2018-09-13 14:31 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: famz, pbonzini, slp, jsnow, qemu-devel


On 13.09.18 14:52, Kevin Wolf wrote:
> In the context of draining a BDS, the .drained_poll callback of block
> jobs is called. If this returns true (i.e. there is still some activity
> pending), the drain operation may call aio_poll() with blocking=true to
> wait for completion.
> 
> As soon as the pending activity is completed and the job finally arrives
> in a quiescent state (i.e. its coroutine either yields with busy=false
> or terminates), the block job must notify the aio_poll() loop to wake
> up, otherwise we get a deadlock if both are running in different
> threads.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Fam Zheng <famz@redhat.com>
> ---
>  include/block/blockjob.h | 13 +++++++++++++
>  include/qemu/job.h       |  3 +++
>  blockjob.c               | 18 ++++++++++++++++++
>  job.c                    |  7 +++++++
>  4 files changed, 41 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>
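
The wake-up pattern the commit message describes, reduced to a minimal
sketch. Only job->busy, job_is_completed() and aio_wait_kick() correspond
to real pieces of the 2018 job code, and even their exact signatures
should be treated as assumptions here; the helper name and the AioWait
field are purely illustrative:

    /* Hedged sketch, not the actual hunk: whenever the job coroutine
     * yields with busy == false or terminates, wake up any
     * AIO_WAIT_WHILE() loop that is polling for the job to quiesce. */
    static void job_wakeup_waiters(Job *job)    /* hypothetical helper */
    {
        assert(!job->busy || job_is_completed(job));
        aio_wait_kick(&job->wait);              /* 'wait' field assumed */
    }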



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/17] job: Use AIO_WAIT_WHILE() in job_finish_sync()
  2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 06/17] job: Use AIO_WAIT_WHILE() in job_finish_sync() Kevin Wolf
@ 2018-09-13 14:45   ` Max Reitz
  2018-09-13 15:15   ` Paolo Bonzini
  1 sibling, 0 replies; 47+ messages in thread
From: Max Reitz @ 2018-09-13 14:45 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: famz, pbonzini, slp, jsnow, qemu-devel


On 13.09.18 14:52, Kevin Wolf wrote:
> job_finish_sync() needs to release the AioContext lock of the job before
> calling aio_poll(). Otherwise, callbacks called by aio_poll() would
> possibly take the lock a second time and run into a deadlock with a
> nested AIO_WAIT_WHILE() call.
> 
> Also, job_drain() without aio_poll() isn't necessarily enough to make
> progress on a job, it could depend on bottom halves to be executed.
> 
> Combine both open-coded while loops into a single AIO_WAIT_WHILE() call
> that solves both of these problems.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Fam Zheng <famz@redhat.com>
> ---
>  job.c | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 08/17] block: Add missing locking in bdrv_co_drain_bh_cb()
  2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 08/17] block: Add missing locking in bdrv_co_drain_bh_cb() Kevin Wolf
@ 2018-09-13 14:58   ` Max Reitz
  2018-09-13 15:17   ` Paolo Bonzini
  1 sibling, 0 replies; 47+ messages in thread
From: Max Reitz @ 2018-09-13 14:58 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: famz, pbonzini, slp, jsnow, qemu-devel


On 13.09.18 14:52, Kevin Wolf wrote:
> bdrv_do_drained_begin/end() assume that they are called with the
> AioContext lock of bs held. If we call drain functions from a coroutine
> with the AioContext lock held, we yield and schedule a BH to move out of
> coroutine context. This means that the lock for the home context of the
> coroutine is released and must be re-acquired in the bottom half.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/qemu/coroutine.h |  5 +++++
>  block/io.c               | 15 +++++++++++++++
>  util/qemu-coroutine.c    |  5 +++++
>  3 files changed, 25 insertions(+)

I suppose the coroutine lock you mean is the one aio_co_enter() takes?

If so:

Reviewed-by: Max Reitz <mreitz@redhat.com>

(Sorry, I still really don't have much understanding of how coroutines,
bottom halves, and all nice things like them work.)



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/17] block-backend: Add .drained_poll callback
  2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 09/17] block-backend: Add .drained_poll callback Kevin Wolf
@ 2018-09-13 15:01   ` Max Reitz
  0 siblings, 0 replies; 47+ messages in thread
From: Max Reitz @ 2018-09-13 15:01 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: famz, pbonzini, slp, jsnow, qemu-devel


On 13.09.18 14:52, Kevin Wolf wrote:
> A bdrv_drain operation must ensure that all parents are quiesced, this
> includes BlockBackends. Otherwise, callbacks called by requests that are
> completed on the BDS layer, but not quite yet on the BlockBackend layer
> could still create new requests.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Fam Zheng <famz@redhat.com>
> ---
>  block/block-backend.c | 9 +++++++++
>  1 file changed, 9 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>
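
For reference, a BlockBackend .drained_poll hook essentially just reports
whether the backend still has requests in flight; a minimal sketch (not
necessarily the committed code, which may well read the counter
atomically) looks like this:

    static bool blk_root_drained_poll(BdrvChild *child)
    {
        BlockBackend *blk = child->opaque;

        /* Drain keeps polling as long as this returns true, i.e. as long
         * as requests submitted through this BlockBackend are still in
         * flight. */
        return !!blk->in_flight;
    }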



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback
  2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback Kevin Wolf
@ 2018-09-13 15:10   ` Paolo Bonzini
  2018-09-13 16:59     ` Kevin Wolf
  2018-09-13 20:50   ` Max Reitz
  1 sibling, 1 reply; 47+ messages in thread
From: Paolo Bonzini @ 2018-09-13 15:10 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, famz, slp, jsnow, qemu-devel

On 13/09/2018 14:52, Kevin Wolf wrote:
> +    if (qemu_get_current_aio_context() == qemu_get_aio_context()) {
> +        /* If we are in the main thread, the callback is allowed to unref
> +         * the BlockBackend, so we have to hold an additional reference */
> +        blk_ref(acb->rwco.blk);
> +    }
>      acb->common.cb(acb->common.opaque, acb->rwco.ret);
> +    blk_dec_in_flight(acb->rwco.blk);
> +    if (qemu_get_current_aio_context() == qemu_get_aio_context()) {
> +        blk_unref(acb->rwco.blk);
> +    }

Is this something that happens only for some specific callers?  That is,
which callers are sure that the callback is invoked from the main thread?

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/17] aio-wait: Increase num_waiters even in home thread
  2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 03/17] aio-wait: Increase num_waiters even in home thread Kevin Wolf
@ 2018-09-13 15:11   ` Paolo Bonzini
  2018-09-13 17:21     ` Kevin Wolf
  0 siblings, 1 reply; 47+ messages in thread
From: Paolo Bonzini @ 2018-09-13 15:11 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, famz, slp, jsnow, qemu-devel

On 13/09/2018 14:52, Kevin Wolf wrote:
> Even if AIO_WAIT_WHILE() is called in the home context of the
> AioContext, we still want to allow the condition to change depending on
> other threads as long as they kick the AioWait. Specifically, block jobs
> can be running in an I/O thread and should then be able to kick a drain
> in the main loop context.

I don't understand the scenario very well.  Why hasn't the main loop's
drain incremented num_waiters?

Note I'm not against the patch---though I would hoist the
atomic_inc/atomic_dec outside the if, since it's done in both branches.

Paolo

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/17] job: Use AIO_WAIT_WHILE() in job_finish_sync()
  2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 06/17] job: Use AIO_WAIT_WHILE() in job_finish_sync() Kevin Wolf
  2018-09-13 14:45   ` Max Reitz
@ 2018-09-13 15:15   ` Paolo Bonzini
  2018-09-13 17:39     ` Kevin Wolf
  1 sibling, 1 reply; 47+ messages in thread
From: Paolo Bonzini @ 2018-09-13 15:15 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, famz, slp, jsnow, qemu-devel

On 13/09/2018 14:52, Kevin Wolf wrote:
> Also, job_drain() without aio_poll() isn't necessarily enough to make
> progress on a job, it could depend on bottom halves to be executed.

This is not changed by the patch though; AIO_WAIT_WHILE does not do the
bottom halves part anymore, bdrv_drain_poll_top_level does it instead.

What is it that needs bottom halves to be executed?  Could it delay
switching the job state until after that?

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 08/17] block: Add missing locking in bdrv_co_drain_bh_cb()
  2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 08/17] block: Add missing locking in bdrv_co_drain_bh_cb() Kevin Wolf
  2018-09-13 14:58   ` Max Reitz
@ 2018-09-13 15:17   ` Paolo Bonzini
  2018-09-13 17:36     ` Kevin Wolf
  1 sibling, 1 reply; 47+ messages in thread
From: Paolo Bonzini @ 2018-09-13 15:17 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, famz, slp, jsnow, qemu-devel

On 13/09/2018 14:52, Kevin Wolf wrote:
> bdrv_do_drained_begin/end() assume that they are called with the
> AioContext lock of bs held. If we call drain functions from a coroutine
> with the AioContext lock held, we yield and schedule a BH to move out of
> coroutine context. This means that the lock for the home context of the
> coroutine is released and must be re-acquired in the bottom half.

What exactly needs the lock?  Is it bdrv_drain_invoke?

Would it make sense to always do release/acquire in bdrv_drain, and
always do acquire/release in bdrv_drain_invoke?  (Conditional locking is
tricky...).

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/17] block-backend: Fix potential double blk_delete()
  2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 10/17] block-backend: Fix potential double blk_delete() Kevin Wolf
@ 2018-09-13 15:19   ` Paolo Bonzini
  2018-09-13 19:50   ` Max Reitz
  1 sibling, 0 replies; 47+ messages in thread
From: Paolo Bonzini @ 2018-09-13 15:19 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, famz, slp, jsnow, qemu-devel

On 13/09/2018 14:52, Kevin Wolf wrote:
> blk_unref() first decreases the refcount of the BlockBackend and calls
> blk_delete() if the refcount reaches zero. Requests can still be in
> flight at this point; they are only drained during blk_delete():
> 
> At this point, arbitrary callbacks can run. If any callback takes a
> temporary BlockBackend reference, it will first increase the refcount to
> 1 and then decrease it to 0 again, triggering another blk_delete(). This
> will cause a use-after-free crash in the outer blk_delete().
> 
> Fix it by draining the BlockBackend before decreasing the refcount to 0.
> Assert in blk_ref() that it never takes the first refcount (which would
> mean that the BlockBackend is already being deleted).
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Fam Zheng <famz@redhat.com>
> ---
>  block/block-backend.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index efa8d8011c..1b2d7a6ff5 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -435,6 +435,7 @@ int blk_get_refcnt(BlockBackend *blk)
>   */
>  void blk_ref(BlockBackend *blk)
>  {
> +    assert(blk->refcnt > 0);
>      blk->refcnt++;
>  }
>  
> @@ -447,7 +448,11 @@ void blk_unref(BlockBackend *blk)
>  {
>      if (blk) {
>          assert(blk->refcnt > 0);
> -        if (!--blk->refcnt) {
> +        if (blk->refcnt > 1) {
> +            blk->refcnt--;
> +        } else {
> +            blk_drain(blk);

Maybe "assert(blk->refcnt == 1);" here to ensure that blk_drain() cannot
resurrect the block device, or alternatively

+    if (blk->refcnt == 1) {
+        blk_drain(blk);
+        assert(blk->refcnt == 1);
+    }
     if (!--blk->refcnt) {
         blk_delete(blk);
     }

Paolo
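
Put together, the variant sketched above would make blk_unref() read
roughly like this (a sketch of the suggestion, not the committed fix):

    void blk_unref(BlockBackend *blk)
    {
        if (blk) {
            assert(blk->refcnt > 0);
            if (blk->refcnt == 1) {
                /* Drain while we still own the last reference; callbacks
                 * running here may take and drop temporary references,
                 * but must not end up deleting the BlockBackend. */
                blk_drain(blk);
                assert(blk->refcnt == 1);
            }
            if (!--blk->refcnt) {
                blk_delete(blk);
            }
        }
    }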

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback
  2018-09-13 15:10   ` Paolo Bonzini
@ 2018-09-13 16:59     ` Kevin Wolf
  2018-09-14  7:47       ` Fam Zheng
  2018-09-14 15:12       ` Paolo Bonzini
  0 siblings, 2 replies; 47+ messages in thread
From: Kevin Wolf @ 2018-09-13 16:59 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-block, mreitz, famz, slp, jsnow, qemu-devel

Am 13.09.2018 um 17:10 hat Paolo Bonzini geschrieben:
> On 13/09/2018 14:52, Kevin Wolf wrote:
> > + if (qemu_get_current_aio_context() == qemu_get_aio_context()) {
> > + /* If we are in the main thread, the callback is allowed to unref
> > + * the BlockBackend, so we have to hold an additional reference */
> > + blk_ref(acb->rwco.blk);
> > + }
> > acb->common.cb(acb->common.opaque, acb->rwco.ret);
> > + blk_dec_in_flight(acb->rwco.blk);
> > + if (qemu_get_current_aio_context() == qemu_get_aio_context()) {
> > + blk_unref(acb->rwco.blk);
> > + }
> 
> Is this something that happens only for some specific callers?  That is,
> which callers are sure that the callback is invoked from the main thread?

I can't seem to reproduce the problem I saw any more even when reverting
the bdrv_ref/unref pair. If I remember correctly it was actually a
nested aio_poll() that was running a block job completion or something
like that - which would obviously only happen on the main thread because
the job intentionally defers to the main thread.

The only reason I made this conditional is that I think bdrv_unref()
still isn't safe outside the main thread, is it?

Kevin

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/17] aio-wait: Increase num_waiters even in home thread
  2018-09-13 15:11   ` Paolo Bonzini
@ 2018-09-13 17:21     ` Kevin Wolf
  2018-09-14 15:14       ` Paolo Bonzini
  0 siblings, 1 reply; 47+ messages in thread
From: Kevin Wolf @ 2018-09-13 17:21 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-block, mreitz, famz, slp, jsnow, qemu-devel

Am 13.09.2018 um 17:11 hat Paolo Bonzini geschrieben:
> On 13/09/2018 14:52, Kevin Wolf wrote:
> > Even if AIO_WAIT_WHILE() is called in the home context of the
> > AioContext, we still want to allow the condition to change depending on
> > other threads as long as they kick the AioWait. Specifically, block jobs
> > can be running in an I/O thread and should then be able to kick a drain
> > in the main loop context.
> 
> I don't understand the scenario very well.  Why hasn't the main loop's
> drain incremented num_waiters?

We are in this path (that didn't increase num_waiters before this patch)
because drain, and therefore AIO_WAIT_WHILE(), was called from the main
thread. But draining depends on a job in a different thread, so we need
to be able to be kicked when that job finally is quiesced.

If I revert this, the test /bdrv-drain/blockjob/iothread/drain hangs.
This is a block job that works on two nodes in two different contexts.

(I think I saw this with mirror, which doesn't take additional locks
when it issues a request, so maybe there's a bit more wrong there... We
clearly need more testing with iothreads, this series probably only
scratches the surface.)
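
In other words, the home-thread branch of AIO_WAIT_WHILE() also has to
advertise itself as a waiter, which is what hoisting the increment (see
the quoted note below) achieves. A simplified sketch of the resulting
shape, assuming the 2018-era three-argument form of the macro and
glossing over the ctx == NULL handling of the real
include/block/aio-wait.h:

    #define AIO_WAIT_WHILE_SKETCH(wait, ctx, cond) ({                  \
        bool waited_ = false;                                          \
        AioWait *wait_ = (wait);                                       \
        AioContext *ctx_ = (ctx);                                      \
        /* Count ourselves as a waiter on both paths so that an        \
         * aio_wait_kick() from another thread is never skipped. */    \
        atomic_inc(&wait_->num_waiters);                               \
        if (ctx_ == qemu_get_current_aio_context()) {                  \
            /* Home thread of ctx: poll it directly. */                \
            while ((cond)) {                                           \
                aio_poll(ctx_, true);                                  \
                waited_ = true;                                        \
            }                                                          \
        } else {                                                       \
            /* Main loop waiting for an iothread: drop the foreign     \
             * lock while blocked in aio_poll(). */                    \
            while ((cond)) {                                           \
                aio_context_release(ctx_);                             \
                aio_poll(qemu_get_aio_context(), true);                \
                aio_context_acquire(ctx_);                             \
                waited_ = true;                                        \
            }                                                          \
        }                                                              \
        atomic_dec(&wait_->num_waiters);                               \
        waited_; })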

> Note I'm not against the patch---though I would hoist the
> atomic_inc/atomic_dec outside the if, since it's done in both branches.

Ok.

Kevin

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 08/17] block: Add missing locking in bdrv_co_drain_bh_cb()
  2018-09-13 15:17   ` Paolo Bonzini
@ 2018-09-13 17:36     ` Kevin Wolf
  0 siblings, 0 replies; 47+ messages in thread
From: Kevin Wolf @ 2018-09-13 17:36 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-block, mreitz, famz, slp, jsnow, qemu-devel

Am 13.09.2018 um 17:17 hat Paolo Bonzini geschrieben:
> On 13/09/2018 14:52, Kevin Wolf wrote:
> > bdrv_do_drained_begin/end() assume that they are called with the
> > AioContext lock of bs held. If we call drain functions from a coroutine
> > with the AioContext lock held, we yield and schedule a BH to move out of
> > coroutine context. This means that the lock for the home context of the
> > coroutine is released and must be re-acquired in the bottom half.
> 
> What exactly needs the lock, is it bdrv_drain_invoke?
> 
> Would it make sense to always do release/acquire in bdrv_drain, and
> always do acquire/release in bdrv_drain_invoke?  (Conditional locking is
> tricky...).

The thing that made it obvious was an aio_poll() call around which we
want to release the lock temporarily, and if you don't hold it, you get
a crash. This aio_poll() has actually disappeared in v2, and I'm not
sure if AIO_WAIT_WHILE() can hit it, but I think locking is still right.

I'm not sure what data structures are actually protected by it, but the
simple rule as documented for bdrv_co_drain() has always been to hold
the AioContext lock of bs when you call bdrv_drain(bs), so this patch
just obeys it.

Kevin
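
Reduced to its essence, the shape of the fix being discussed is the
following (a hedged sketch, not the actual block/io.c hunk; the real
bottom half also handles the begin/end distinction, the in-flight
counter and re-entering the waiting coroutine):

    static void bdrv_co_drain_bh_cb_sketch(void *opaque)
    {
        BlockDriverState *bs = opaque;
        AioContext *ctx = bdrv_get_aio_context(bs);

        /* The coroutine that scheduled this BH gave up its AioContext
         * lock when it yielded, so take the lock of bs before entering
         * the drain code, which assumes that it is held. */
        aio_context_acquire(ctx);
        bdrv_drained_begin(bs);
        aio_context_release(ctx);
    }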

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/17] job: Fix missing locking due to mismerge
  2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 01/17] job: Fix missing locking due to mismerge Kevin Wolf
  2018-09-13 13:56   ` Max Reitz
@ 2018-09-13 17:38   ` John Snow
  1 sibling, 0 replies; 47+ messages in thread
From: John Snow @ 2018-09-13 17:38 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: famz, slp, qemu-devel, mreitz, pbonzini



On 09/13/2018 08:52 AM, Kevin Wolf wrote:
> job_completed() had a problem with double locking that was recently
> fixed independently by two different commits:
> 
> "job: Fix nested aio_poll() hanging in job_txn_apply"
> "jobs: add exit shim"
> 
> One fix removed the first aio_context_acquire(), the other fix removed
> the other one. Now we have a bug again and the code is run without any
> locking.
> 
> Add it back in one of the places.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  job.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/job.c b/job.c
> index 82b46929bd..5c4e84f007 100644
> --- a/job.c
> +++ b/job.c
> @@ -847,7 +847,11 @@ static void job_completed(Job *job)
>  static void job_exit(void *opaque)
>  {
>      Job *job = (Job *)opaque;
> +    AioContext *ctx = job->aio_context;
> +
> +    aio_context_acquire(ctx);
>      job_completed(job);
> +    aio_context_release(ctx);
>  }
>  
>  /**
> 

OK =]
Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/17] job: Use AIO_WAIT_WHILE() in job_finish_sync()
  2018-09-13 15:15   ` Paolo Bonzini
@ 2018-09-13 17:39     ` Kevin Wolf
  0 siblings, 0 replies; 47+ messages in thread
From: Kevin Wolf @ 2018-09-13 17:39 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-block, mreitz, famz, slp, jsnow, qemu-devel

Am 13.09.2018 um 17:15 hat Paolo Bonzini geschrieben:
> On 13/09/2018 14:52, Kevin Wolf wrote:
> > Also, job_drain() without aio_poll() isn't necessarily enough to make
> > progress on a job, it could depend on bottom halves to be executed.
> 
> This is not changed by the patch though; AIO_WAIT_WHILE does not do the
> bottom halves part anymore, bdrv_drain_poll_top_level does it instead.
> 
> What is it that needs bottom halves to be executed?  Could it delay
> switching the job state until after that?

AIO_WAIT_WHILE() does still execute bottom halves if the condition is
true, which is all that is needed to make progress. It only doesn't
execute them unconditionally any more.

Kevin
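
That is, the loop in question has the following shape; each blocking
aio_poll() iteration dispatches bottom halves that became ready for the
context, and only the extra unconditional poll is gone (sketch only):

    while ((cond)) {
        aio_poll(ctx, true);    /* runs ready BHs and fd handlers for ctx */
    }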

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/17] block-backend: Fix potential double blk_delete()
  2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 10/17] block-backend: Fix potential double blk_delete() Kevin Wolf
  2018-09-13 15:19   ` Paolo Bonzini
@ 2018-09-13 19:50   ` Max Reitz
  1 sibling, 0 replies; 47+ messages in thread
From: Max Reitz @ 2018-09-13 19:50 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: famz, pbonzini, slp, jsnow, qemu-devel


On 13.09.18 14:52, Kevin Wolf wrote:
> blk_unref() first decreases the refcount of the BlockBackend and calls
> blk_delete() if the refcount reaches zero. Requests can still be in
> flight at this point; they are only drained during blk_delete():
> 
> At this point, arbitrary callbacks can run. If any callback takes a
> temporary BlockBackend reference, it will first increase the refcount to
> 1 and then decrease it to 0 again, triggering another blk_delete(). This
> will cause a use-after-free crash in the outer blk_delete().
> 
> Fix it by draining the BlockBackend before decreasing the refcount to 0.
> Assert in blk_ref() that it never takes the first refcount (which would
> mean that the BlockBackend is already being deleted).
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Fam Zheng <famz@redhat.com>
> ---
>  block/block-backend.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback
  2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback Kevin Wolf
  2018-09-13 15:10   ` Paolo Bonzini
@ 2018-09-13 20:50   ` Max Reitz
  1 sibling, 0 replies; 47+ messages in thread
From: Max Reitz @ 2018-09-13 20:50 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: famz, pbonzini, slp, jsnow, qemu-devel


On 13.09.18 14:52, Kevin Wolf wrote:
> Request callbacks can do pretty much anything, including operations that
> will yield from the coroutine (such as draining the backend). In that
> case, a decreased in_flight would be visible to other code and could
> lead to a drain completing while the callback hasn't actually completed
> yet.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Fam Zheng <famz@redhat.com>
> ---
>  block/block-backend.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>
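
The ordering the commit message insists on, shown in isolation (the two
lines are taken from the hunk quoted earlier in the thread):

    acb->common.cb(acb->common.opaque, acb->rwco.ret);  /* may nest a drain */
    blk_dec_in_flight(acb->rwco.blk);                   /* only after the cb */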



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 12/17] mirror: Fix potential use-after-free in active commit
  2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 12/17] mirror: Fix potential use-after-free in active commit Kevin Wolf
@ 2018-09-13 20:55   ` Max Reitz
  2018-09-13 21:43     ` Max Reitz
  2018-09-14 16:25     ` Kevin Wolf
  0 siblings, 2 replies; 47+ messages in thread
From: Max Reitz @ 2018-09-13 20:55 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: famz, pbonzini, slp, jsnow, qemu-devel


On 13.09.18 14:52, Kevin Wolf wrote:
> When starting an active commit job, other callbacks can run before
> mirror_start_job() calls bdrv_ref() where needed and cause the nodes to
> go away. Add another pair of bdrv_ref/unref() around it to protect
> against this case.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/mirror.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>

But...  How?

Like...  You mirror to some target (in an iothread), then you give that
target a backing file, then you cancel the mirror and immediately commit
the target?

Max



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 12/17] mirror: Fix potential use-after-free in active commit
  2018-09-13 20:55   ` Max Reitz
@ 2018-09-13 21:43     ` Max Reitz
  2018-09-14 16:25     ` Kevin Wolf
  1 sibling, 0 replies; 47+ messages in thread
From: Max Reitz @ 2018-09-13 21:43 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: famz, pbonzini, slp, jsnow, qemu-devel


On 13.09.18 22:55, Max Reitz wrote:
> On 13.09.18 14:52, Kevin Wolf wrote:
>> When starting an active commit job, other callbacks can run before
>> mirror_start_job() calls bdrv_ref() where needed and cause the nodes to
>> go away. Add another pair of bdrv_ref/unref() around it to protect
>> against this case.
>>
>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>> ---
>>  block/mirror.c | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
> 
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> 
> But...  How?
> 
> Like...  You mirror to some target (in an iothread), then you give that
> target a backing file, then you cancel the mirror and immediately commit
> the target?

The only way I got this to work was to allow commit to accept a non-root
BDS as @device.  I can't imagine a case where @device can go away while it
isn't currently in use by something that would make it a non-root BDS.
(After all, the only way someone can make it go away is if that someone is
using it right now.)

But if commit accepts non-root BDSs as @device, I get a segfault even
after this commit...

Max



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 13/17] blockjob: Lie better in child_job_drained_poll()
  2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 13/17] blockjob: Lie better in child_job_drained_poll() Kevin Wolf
@ 2018-09-13 21:52   ` Max Reitz
  0 siblings, 0 replies; 47+ messages in thread
From: Max Reitz @ 2018-09-13 21:52 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: famz, pbonzini, slp, jsnow, qemu-devel


On 13.09.18 14:52, Kevin Wolf wrote:
> Block jobs claim in .drained_poll() that they are in a quiescent state
> as soon as job->deferred_to_main_loop is true. This is obviously wrong;
> they still have a completion BH to run. We only get away with this
> because commit 91af091f923 added an unconditional aio_poll(false) to the
> drain functions, but this is bypassing the regular drain mechanisms.
> 
> However, just removing this and telling that the job is still active
> doesn't work either: The completion callbacks themselves call drain
> functions (directly, or indirectly with bdrv_reopen), so they would
> deadlock then.
> 
> As a better lie, tell that the job is active as long as the BH is
> pending, but falsely call it quiescent from the point in the BH when the
> completion callback is called. At this point, nested drain calls won't
> deadlock because they ignore the job, and outer drains will wait for the
> job to really reach a quiescent state because the callback is already
> running.

...because it's running in the main loop, so it's impossible for another
drain to run there concurrently.  I suppose.

At least that's what makes this patch make sense to me.

> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/qemu/job.h |  3 +++
>  blockjob.c         |  2 +-
>  job.c              | 11 ++++++++++-
>  3 files changed, 14 insertions(+), 2 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 14/17] block: Remove aio_poll() in bdrv_drain_poll variants
  2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 14/17] block: Remove aio_poll() in bdrv_drain_poll variants Kevin Wolf
@ 2018-09-13 21:55   ` Max Reitz
  0 siblings, 0 replies; 47+ messages in thread
From: Max Reitz @ 2018-09-13 21:55 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: famz, pbonzini, slp, jsnow, qemu-devel


On 13.09.18 14:52, Kevin Wolf wrote:
> bdrv_drain_poll_top_level() was buggy because it didn't release the
> AioContext lock of the node to be drained before calling aio_poll().
> This way, callbacks called by aio_poll() would possibly take the lock a
> second time and run into a deadlock with a nested AIO_WAIT_WHILE() call.
> 
> However, it turns out that the aio_poll() call isn't actually needed any
> more. It was introduced in commit 91af091f923, which is effectively
> reverted by this patch. The cases it was supposed to fix are now covered
> by bdrv_drain_poll(), which waits for block jobs to reach a quiescent
> state.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Fam Zheng <famz@redhat.com>
> ---
>  block/io.c | 8 --------
>  1 file changed, 8 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 16/17] job: Avoid deadlocks in job_completed_txn_abort()
  2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 16/17] job: Avoid deadlocks in job_completed_txn_abort() Kevin Wolf
@ 2018-09-13 22:01   ` Max Reitz
  0 siblings, 0 replies; 47+ messages in thread
From: Max Reitz @ 2018-09-13 22:01 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: famz, pbonzini, slp, jsnow, qemu-devel


On 13.09.18 14:52, Kevin Wolf wrote:
> Amongst others, job_finalize_single() calls the .prepare/.commit/.abort
> callbacks of the individual job driver. Recently, their use was adapted
> for all block jobs so that they involve code calling AIO_WAIT_WHILE()
> now. Such code must be called under the AioContext lock for the
> respective job, but without holding any other AioContext lock.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  job.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 17/17] test-bdrv-drain: AIO_WAIT_WHILE() in job .commit/.abort
  2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 17/17] test-bdrv-drain: AIO_WAIT_WHILE() in job .commit/.abort Kevin Wolf
@ 2018-09-13 22:05   ` Max Reitz
  0 siblings, 0 replies; 47+ messages in thread
From: Max Reitz @ 2018-09-13 22:05 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: famz, pbonzini, slp, jsnow, qemu-devel


On 13.09.18 14:52, Kevin Wolf wrote:
> This adds tests for calling AIO_WAIT_WHILE() in the .commit and .abort
> callbacks. Both reasons why .abort could be called for a single job are
> tested: Either .run or .prepare could return an error.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  tests/test-bdrv-drain.c | 116 +++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 104 insertions(+), 12 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback
  2018-09-13 16:59     ` Kevin Wolf
@ 2018-09-14  7:47       ` Fam Zheng
  2018-09-14 15:12       ` Paolo Bonzini
  1 sibling, 0 replies; 47+ messages in thread
From: Fam Zheng @ 2018-09-14  7:47 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Paolo Bonzini, qemu-block, mreitz, slp, jsnow, qemu-devel

On Thu, 09/13 18:59, Kevin Wolf wrote:
> Am 13.09.2018 um 17:10 hat Paolo Bonzini geschrieben:
> > On 13/09/2018 14:52, Kevin Wolf wrote:
> > > +    if (qemu_get_current_aio_context() == qemu_get_aio_context()) {
> > > +        /* If we are in the main thread, the callback is allowed to unref
> > > +         * the BlockBackend, so we have to hold an additional reference */
> > > +        blk_ref(acb->rwco.blk);
> > > +    }
> > >      acb->common.cb(acb->common.opaque, acb->rwco.ret);
> > > +    blk_dec_in_flight(acb->rwco.blk);
> > > +    if (qemu_get_current_aio_context() == qemu_get_aio_context()) {
> > > +        blk_unref(acb->rwco.blk);
> > > +    }
> > 
> > Is this something that happens only for some specific callers?  That is,
> > which callers are sure that the callback is invoked from the main thread?
> 
> I can't seem to reproduce the problem I saw any more even when reverting
> the bdrv_ref/unref pair. If I remember correctly it was actually a
> nested aio_poll() that was running a block job completion or something
> like that - which would obviously only happen on the main thread because
> the job intentionally defers to the main thread.
> 
> The only reason I made this conditional is that I think bdrv_unref()
> still isn't safe outside the main thread, is it?

I think that is correct.

Fam

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback
  2018-09-13 16:59     ` Kevin Wolf
  2018-09-14  7:47       ` Fam Zheng
@ 2018-09-14 15:12       ` Paolo Bonzini
  2018-09-14 17:14         ` Kevin Wolf
  1 sibling, 1 reply; 47+ messages in thread
From: Paolo Bonzini @ 2018-09-14 15:12 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, famz, slp, jsnow, qemu-devel

On 13/09/2018 18:59, Kevin Wolf wrote:
> Am 13.09.2018 um 17:10 hat Paolo Bonzini geschrieben:
>> On 13/09/2018 14:52, Kevin Wolf wrote:
>>> +    if (qemu_get_current_aio_context() == qemu_get_aio_context()) {
>>> +        /* If we are in the main thread, the callback is allowed to unref
>>> +         * the BlockBackend, so we have to hold an additional reference */
>>> +        blk_ref(acb->rwco.blk);
>>> +    }
>>>      acb->common.cb(acb->common.opaque, acb->rwco.ret);
>>> +    blk_dec_in_flight(acb->rwco.blk);
>>> +    if (qemu_get_current_aio_context() == qemu_get_aio_context()) {
>>> +        blk_unref(acb->rwco.blk);
>>> +    }
>>
>> Is this something that happens only for some specific callers?  That is,
>> which callers are sure that the callback is invoked from the main thread?
> 
> I can't seem to reproduce the problem I saw any more even when reverting
> the bdrv_ref/unref pair. If I remember correctly it was actually a
> nested aio_poll() that was running a block job completion or something
> like that - which would obviously only happen on the main thread because
> the job intentionally defers to the main thread.
> 
> The only reason I made this conditional is that I think bdrv_unref()
> still isn't safe outside the main thread, is it?

Yes, making it conditional is correct, but it is quite fishy even with
the conditional.

As you mention, you could have a nested aio_poll() in the main thread,
for example invoked from a bottom half, but in that case I'd rather
track the caller that is creating the bottom half and see if it lacks a
bdrv_ref/bdrv_unref (or perhaps it's even higher in the tree that is
missing).

Paolo

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/17] aio-wait: Increase num_waiters even in home thread
  2018-09-13 17:21     ` Kevin Wolf
@ 2018-09-14 15:14       ` Paolo Bonzini
  0 siblings, 0 replies; 47+ messages in thread
From: Paolo Bonzini @ 2018-09-14 15:14 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, famz, slp, jsnow, qemu-devel

On 13/09/2018 19:21, Kevin Wolf wrote:
> Am 13.09.2018 um 17:11 hat Paolo Bonzini geschrieben:
>> On 13/09/2018 14:52, Kevin Wolf wrote:
>>> Even if AIO_WAIT_WHILE() is called in the home context of the
>>> AioContext, we still want to allow the condition to change depending on
>>> other threads as long as they kick the AioWait. Specifically, block jobs
>>> can be running in an I/O thread and should then be able to kick a drain
>>> in the main loop context.
>>
>> I don't understand the scenario very well.  Why hasn't the main loop's
>> drain incremented num_waiters?
> 
> We are in this path (that didn't increase num_waiters before this patch)
> because drain, and therefore AIO_WAIT_WHILE(), was called from the main
> thread. But draining depends on a job in a different thread, so we need
> to be able to be kicked when that job finally is quiesced.

Ah, because AIO_WAIT_WHILE() is invoked with ctx ==
qemu_get_aio_context(), but the condition is triggered *outside* the
main context?  Tricky, but seems correct.  AIO_WAIT_WHILE() is not a
fast path anyway.

Paolo

> If I revert this, the test /bdrv-drain/blockjob/iothread/drain hangs.
> This is a block job that works on two nodes in two different contexts.
> 
> (I think I saw this with mirror, which doesn't take additional locks
> when it issues a request, so maybe there's a bit more wrong there... We
> clearly need more testing with iothreads, this series probably only
> scratches the surface.)
> 
>> Note I'm not against the patch---though I would hoist the
>> atomic_inc/atomic_dec outside the if, since it's done in both branches.
> 
> Ok.
> 
> Kevin
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 12/17] mirror: Fix potential use-after-free in active commit
  2018-09-13 20:55   ` Max Reitz
  2018-09-13 21:43     ` Max Reitz
@ 2018-09-14 16:25     ` Kevin Wolf
  1 sibling, 0 replies; 47+ messages in thread
From: Kevin Wolf @ 2018-09-14 16:25 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-block, famz, pbonzini, slp, jsnow, qemu-devel


Am 13.09.2018 um 22:55 hat Max Reitz geschrieben:
> On 13.09.18 14:52, Kevin Wolf wrote:
> > When starting an active commit job, other callbacks can run before
> > mirror_start_job() calls bdrv_ref() where needed and cause the nodes to
> > go away. Add another pair of bdrv_ref/unref() around it to protect
> > against this case.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block/mirror.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> 
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> 
> But...  How?

Tried to reproduce the other bug Paolo was concerned about (good there
is something like 'git reflog'!) and dug up this one instead.

So the scenario seems to be test_stream_commit_1 in qemu-iotests 030.

The backtrace when the BDS is deleted is the following:

(rr) bt
#0  0x00007faeaf6145ec in __memset_sse2_unaligned_erms () at /lib64/libc.so.6
#1  0x00007faeaf60414e in _int_free () at /lib64/libc.so.6
#2  0x00007faeaf6093be in free () at /lib64/libc.so.6
#3  0x00007faecaea6b4e in g_free () at /lib64/libglib-2.0.so.0
#4  0x000055c0c94f9d6c in bdrv_delete (bs=0x55c0cbe69240) at block.c:3590
#5  0x000055c0c94fbe82 in bdrv_unref (bs=0x55c0cbe69240) at block.c:4638
#6  0x000055c0c94f6761 in bdrv_root_unref_child (child=0x55c0cbe57af0) at block.c:2188
#7  0x000055c0c94fe239 in block_job_remove_all_bdrv (job=0x55c0ccf9e220) at blockjob.c:200
#8  0x000055c0c94fdf1f in block_job_free (job=0x55c0ccf9e220) at blockjob.c:94
#9  0x000055c0c94ffe0d in job_unref (job=0x55c0ccf9e220) at job.c:368
#10 0x000055c0c95007cd in job_do_dismiss (job=0x55c0ccf9e220) at job.c:641
#11 0x000055c0c95008d9 in job_conclude (job=0x55c0ccf9e220) at job.c:667
#12 0x000055c0c9500b66 in job_finalize_single (job=0x55c0ccf9e220) at job.c:735
#13 0x000055c0c94ff4fa in job_txn_apply (txn=0x55c0cbe19eb0, fn=0x55c0c9500a62 <job_finalize_single>, lock=true) at job.c:151
#14 0x000055c0c9500e92 in job_do_finalize (job=0x55c0ccf9e220) at job.c:822
#15 0x000055c0c9501040 in job_completed_txn_success (job=0x55c0ccf9e220) at job.c:872
#16 0x000055c0c950115c in job_completed (job=0x55c0ccf9e220, ret=0, error=0x0) at job.c:892
#17 0x000055c0c92b572c in stream_complete (job=0x55c0ccf9e220, opaque=0x55c0cc050bc0) at block/stream.c:96
#18 0x000055c0c950142b in job_defer_to_main_loop_bh (opaque=0x55c0cbe38a50) at job.c:981
#19 0x000055c0c96171fc in aio_bh_call (bh=0x55c0cbe19f20) at util/async.c:90
#20 0x000055c0c9617294 in aio_bh_poll (ctx=0x55c0cbd7b8d0) at util/async.c:118
#21 0x000055c0c961c150 in aio_poll (ctx=0x55c0cbd7b8d0, blocking=true) at util/aio-posix.c:690
#22 0x000055c0c957246c in bdrv_flush (bs=0x55c0cbe4b250) at block/io.c:2693
#23 0x000055c0c94f8e46 in bdrv_reopen_prepare (reopen_state=0x55c0cbef4d68, queue=0x55c0cbeab9f0, errp=0x7ffd65617050) at block.c:3206
#24 0x000055c0c94f883a in bdrv_reopen_multiple (ctx=0x55c0cbd7b8d0, bs_queue=0x55c0cbeab9f0, errp=0x7ffd656170c8) at block.c:3032
#25 0x000055c0c94f8a00 in bdrv_reopen (bs=0x55c0cbe2c250, bdrv_flags=8194, errp=0x7ffd65617220) at block.c:3075
#26 0x000055c0c956a23f in commit_active_start (job_id=0x0, bs=0x55c0cbd905b0, base=0x55c0cbe2c250, creation_flags=0, speed=0, on_error=BLOCKDEV_ON_ERROR_REPORT, filter_node_name=0x0, cb=0x0, opaque=0x0, auto_complete=false, errp=0x7ffd65617220) at block/mirror.c:1687
#27 0x000055c0c927437c in qmp_block_commit (has_job_id=false, job_id=0x0, device=0x55c0cbef4d20 "drive0", has_base_node=false, base_node=0x0, has_base=true, base=0x55c0cbe38a00 "/home/kwolf/source/qemu/tests/qemu-iotests/scratch/img-3.img", has_top_node=false, top_node=0x0, has_top=false, top=0x0, has_backing_file=false, backing_file=0x0, has_speed=false, speed=0, has_filter_node_name=false, filter_node_name=0x0, errp=0x7ffd656172d8) at blockdev.c:3325
#28 0x000055c0c928bb08 in qmp_marshal_block_commit (args=<optimized out>, ret=<optimized out>, errp=0x7ffd656173b8) at qapi/qapi-commands-block-core.c:409
#29 0x000055c0c960beef in do_qmp_dispatch (errp=0x7ffd656173b0, allow_oob=false, request=<optimized out>, cmds=0x55c0c9f56f80 <qmp_commands>) at qapi/qmp-dispatch.c:129
#30 0x000055c0c960beef in qmp_dispatch (cmds=0x55c0c9f56f80 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:171
#31 0x000055c0c9111623 in monitor_qmp_dispatch (mon=0x55c0cbdbb930, req=0x55c0ccf90e00, id=0x0) at /home/kwolf/source/qemu/monitor.c:4168
#32 0x000055c0c911191f in monitor_qmp_bh_dispatcher (data=0x0) at /home/kwolf/source/qemu/monitor.c:4237
#33 0x000055c0c96171fc in aio_bh_call (bh=0x55c0cbccc840) at util/async.c:90
#34 0x000055c0c9617294 in aio_bh_poll (ctx=0x55c0cbccc550) at util/async.c:118
#35 0x000055c0c961b723 in aio_dispatch (ctx=0x55c0cbccc550) at util/aio-posix.c:436
#36 0x000055c0c961762f in aio_ctx_dispatch (source=0x55c0cbccc550, callback=0x0, user_data=0x0) at util/async.c:261
#37 0x00007faecaea1257 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#38 0x000055c0c961a2aa in glib_pollfds_poll () at util/main-loop.c:215
#39 0x000055c0c961a2aa in os_host_main_loop_wait (timeout=0) at util/main-loop.c:238
#40 0x000055c0c961a2aa in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#41 0x000055c0c92805e1 in main_loop () at vl.c:1866
#42 0x000055c0c9287eca in main (argc=18, argv=0x7ffd65617958, envp=0x7ffd656179f0) at vl.c:4647
(rr) p bs.node_name 
$17 = "node1", '\000' <repeats 26 times>

Obviously, this exact backtrace shouldn't even be possible like this
because job_defer_to_main_loop_bh() shouldn't be called inside
bdrv_flush(), but when draining the subtree in bdrv_reopen(). This is
only fixed in "blockjob: Lie better in child_job_drained_poll()".

It probably doesn't make a difference, though, where exactly during
bdrv_reopen() the node is deleted. We'll crash anyway if we don't hold
the extra reference.

Kevin
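
So the protection boils down to something like the following in
commit_active_start() (a hedged sketch; the real argument lists are
elided and the _sketch suffix is there to make clear this is not the
actual block/mirror.c hunk):

    static void commit_active_start_sketch(BlockDriverState *base)
    {
        bdrv_ref(base);    /* keep 'base' alive across nested callbacks */
        /* ... the bdrv_reopen() and mirror job setup calls go here; as
         * the backtrace above shows, they can run nested aio_poll()
         * callbacks that delete nodes nobody holds a reference to ... */
        bdrv_unref(base);  /* the running job holds its own references now */
    }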


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback
  2018-09-14 15:12       ` Paolo Bonzini
@ 2018-09-14 17:14         ` Kevin Wolf
  2018-09-14 17:38           ` Paolo Bonzini
  0 siblings, 1 reply; 47+ messages in thread
From: Kevin Wolf @ 2018-09-14 17:14 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-block, mreitz, famz, slp, jsnow, qemu-devel

Am 14.09.2018 um 17:12 hat Paolo Bonzini geschrieben:
> On 13/09/2018 18:59, Kevin Wolf wrote:
> > Am 13.09.2018 um 17:10 hat Paolo Bonzini geschrieben:
> >> On 13/09/2018 14:52, Kevin Wolf wrote:
> >>> +    if (qemu_get_current_aio_context() == qemu_get_aio_context()) {
> >>> +        /* If we are in the main thread, the callback is allowed to unref
> >>> +         * the BlockBackend, so we have to hold an additional reference */
> >>> +        blk_ref(acb->rwco.blk);
> >>> +    }
> >>>      acb->common.cb(acb->common.opaque, acb->rwco.ret);
> >>> +    blk_dec_in_flight(acb->rwco.blk);
> >>> +    if (qemu_get_current_aio_context() == qemu_get_aio_context()) {
> >>> +        blk_unref(acb->rwco.blk);
> >>> +    }
> >>
> >> Is this something that happens only for some specific callers?  That is,
> >> which callers are sure that the callback is invoked from the main thread?
> > 
> > I can't seem to reproduce the problem I saw any more even when reverting
> > the bdrv_ref/unref pair. If I remember correctly it was actually a
> > nested aio_poll() that was running a block job completion or something
> > like that - which would obviously only happen on the main thread because
> > the job intentionally defers to the main thread.
> > 
> > The only reason I made this conditional is that I think bdrv_unref()
> > still isn't safe outside the main thread, is it?
> 
> Yes, making it conditional is correct, but it is quite fishy even with
> the conditional.
> 
> As you mention, you could have a nested aio_poll() in the main thread,
> for example invoked from a bottom half, but in that case I'd rather
> track the caller that is creating the bottom half and see if it lacks a
> bdrv_ref/bdrv_unref (or perhaps it's even higher in the tree that is
> missing).

I went back to the commit where I first added the patch (it already
contained the ref/unref pair) and tried if I could reproduce a bug with
the pair removed. I couldn't.

I'm starting to think that maybe I was just overly cautious with the
ref/unref. I may have confused the nested aio_poll() crash with a
different situation. I've dealt with so many crashes and hangs while
working on this series that it's quite possible.

Kevin

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback
  2018-09-14 17:14         ` Kevin Wolf
@ 2018-09-14 17:38           ` Paolo Bonzini
  0 siblings, 0 replies; 47+ messages in thread
From: Paolo Bonzini @ 2018-09-14 17:38 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, famz, slp, jsnow, qemu-devel

On 14/09/2018 19:14, Kevin Wolf wrote:
>> As you mention, you could have a nested aio_poll() in the main thread,
>> for example invoked from a bottom half, but in that case I'd rather
>> track the caller that is creating the bottom half and see if it lacks a
>> bdrv_ref/bdrv_unref (or perhaps it's even higher in the tree that is
>> missing).
> I went back to the commit where I first added the patch (it already
> contained the ref/unref pair) and tried if I could reproduce a bug with
> the pair removed. I couldn't.
> 
> I'm starting to think that maybe I was just overly cautious with the
> ref/unref. I may have confused the nested aio_poll() crash with a
> different situation. I've dealt with so many crashes and hangs while
> working on this series that it's quite possible.

Are you going to drop the patch then?

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2018-09-14 17:39 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-13 12:52 [Qemu-devel] [PATCH v2 00/17] Fix some jobs/drain/aio_poll related hangs Kevin Wolf
2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 01/17] job: Fix missing locking due to mismerge Kevin Wolf
2018-09-13 13:56   ` Max Reitz
2018-09-13 17:38   ` John Snow
2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 02/17] blockjob: Wake up BDS when job becomes idle Kevin Wolf
2018-09-13 14:31   ` Max Reitz
2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 03/17] aio-wait: Increase num_waiters even in home thread Kevin Wolf
2018-09-13 15:11   ` Paolo Bonzini
2018-09-13 17:21     ` Kevin Wolf
2018-09-14 15:14       ` Paolo Bonzini
2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 04/17] test-bdrv-drain: Drain with block jobs in an I/O thread Kevin Wolf
2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 05/17] test-blockjob: Acquire AioContext around job_cancel_sync() Kevin Wolf
2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 06/17] job: Use AIO_WAIT_WHILE() in job_finish_sync() Kevin Wolf
2018-09-13 14:45   ` Max Reitz
2018-09-13 15:15   ` Paolo Bonzini
2018-09-13 17:39     ` Kevin Wolf
2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 07/17] test-bdrv-drain: Test AIO_WAIT_WHILE() in completion callback Kevin Wolf
2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 08/17] block: Add missing locking in bdrv_co_drain_bh_cb() Kevin Wolf
2018-09-13 14:58   ` Max Reitz
2018-09-13 15:17   ` Paolo Bonzini
2018-09-13 17:36     ` Kevin Wolf
2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 09/17] block-backend: Add .drained_poll callback Kevin Wolf
2018-09-13 15:01   ` Max Reitz
2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 10/17] block-backend: Fix potential double blk_delete() Kevin Wolf
2018-09-13 15:19   ` Paolo Bonzini
2018-09-13 19:50   ` Max Reitz
2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback Kevin Wolf
2018-09-13 15:10   ` Paolo Bonzini
2018-09-13 16:59     ` Kevin Wolf
2018-09-14  7:47       ` Fam Zheng
2018-09-14 15:12       ` Paolo Bonzini
2018-09-14 17:14         ` Kevin Wolf
2018-09-14 17:38           ` Paolo Bonzini
2018-09-13 20:50   ` Max Reitz
2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 12/17] mirror: Fix potential use-after-free in active commit Kevin Wolf
2018-09-13 20:55   ` Max Reitz
2018-09-13 21:43     ` Max Reitz
2018-09-14 16:25     ` Kevin Wolf
2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 13/17] blockjob: Lie better in child_job_drained_poll() Kevin Wolf
2018-09-13 21:52   ` Max Reitz
2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 14/17] block: Remove aio_poll() in bdrv_drain_poll variants Kevin Wolf
2018-09-13 21:55   ` Max Reitz
2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 15/17] test-bdrv-drain: Test nested poll in bdrv_drain_poll_top_level() Kevin Wolf
2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 16/17] job: Avoid deadlocks in job_completed_txn_abort() Kevin Wolf
2018-09-13 22:01   ` Max Reitz
2018-09-13 12:52 ` [Qemu-devel] [PATCH v2 17/17] test-bdrv-drain: AIO_WAIT_WHILE() in job .commit/.abort Kevin Wolf
2018-09-13 22:05   ` Max Reitz
