All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH for-2.10 00/10] Preparation for block job mutex
@ 2017-03-23 17:39 Paolo Bonzini
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 01/10] blockjob: remove unnecessary check Paolo Bonzini
                   ` (9 more replies)
  0 siblings, 10 replies; 44+ messages in thread
From: Paolo Bonzini @ 2017-03-23 17:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: jcody, jsnow

These are a bunch of cleanups and patches extracted from the AioContext
lock removal series.  They are independent and can be applied/reviewed
separately.  The flip side is that several of the changes seem to be a
bit gratuitous without the big change from AioContext lock to a specific
block job mutex for all block job data structures.

While the patch set seems very large, a lot of it just moving the code
around to avoid forward references; this is so that close functions will
have similar locking rules.

Paolo

Paolo Bonzini (10):
  blockjob: remove unnecessary check
  blockjob: remove iostatus_reset callback
  blockjob: introduce block_job_fail
  blockjob: introduce block_job_pause/resume_all
  blockjob: separate monitor and blockjob APIs
  blockjob: move iostatus reset inside block_job_user_resume
  blockjob: strengthen a bit test-blockjob-txn
  blockjob: introduce block_job_cancel_async
  blockjob: reorganize block_job_completed_txn_abort
  blockjob: use deferred_to_main_loop to indicate the coroutine has
    ended

 block/backup.c               |   2 +-
 block/commit.c               |   2 +-
 block/io.c                   |  18 +-
 block/mirror.c               |   2 +-
 blockdev.c                   |   1 -
 blockjob.c                   | 559 ++++++++++++++++++++++++-------------------
 include/block/blockjob.h     |  14 +-
 include/block/blockjob_int.h |  21 +-
 tests/test-blockjob-txn.c    |   7 +-
 tests/test-blockjob.c        |  10 +-
 10 files changed, 334 insertions(+), 302 deletions(-)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH 01/10] blockjob: remove unnecessary check
  2017-03-23 17:39 [Qemu-devel] [PATCH for-2.10 00/10] Preparation for block job mutex Paolo Bonzini
@ 2017-03-23 17:39 ` Paolo Bonzini
  2017-04-07 22:30   ` John Snow
                     ` (2 more replies)
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 02/10] blockjob: remove iostatus_reset callback Paolo Bonzini
                   ` (8 subsequent siblings)
  9 siblings, 3 replies; 44+ messages in thread
From: Paolo Bonzini @ 2017-03-23 17:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: jcody, jsnow

!job is always checked prior to the call, drop it from here.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockjob.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/blockjob.c b/blockjob.c
index 9b619f385..a9fb624 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -480,7 +480,7 @@ static bool block_job_should_pause(BlockJob *job)
 
 bool block_job_user_paused(BlockJob *job)
 {
-    return job ? job->user_paused : 0;
+    return job->user_paused;
 }
 
 void coroutine_fn block_job_pause_point(BlockJob *job)
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH 02/10] blockjob: remove iostatus_reset callback
  2017-03-23 17:39 [Qemu-devel] [PATCH for-2.10 00/10] Preparation for block job mutex Paolo Bonzini
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 01/10] blockjob: remove unnecessary check Paolo Bonzini
@ 2017-03-23 17:39 ` Paolo Bonzini
  2017-04-07 23:12   ` John Snow
  2017-04-10  9:27   ` Stefan Hajnoczi
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 03/10] blockjob: introduce block_job_fail Paolo Bonzini
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 44+ messages in thread
From: Paolo Bonzini @ 2017-03-23 17:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: jcody, jsnow

This is unused since commit 66a0fae ("blockjob: Don't touch BDS iostatus",
2016-05-19).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockjob.c                   | 3 ---
 include/block/blockjob_int.h | 3 ---
 2 files changed, 6 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index a9fb624..24d1e12 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -555,9 +555,6 @@ bool block_job_is_cancelled(BlockJob *job)
 void block_job_iostatus_reset(BlockJob *job)
 {
     job->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
-    if (job->driver->iostatus_reset) {
-        job->driver->iostatus_reset(job);
-    }
 }
 
 static int block_job_finish_sync(BlockJob *job,
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index 3f86cc5..bfcc5d1 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -44,9 +44,6 @@ struct BlockJobDriver {
     /** Optional callback for job types that support setting a speed limit */
     void (*set_speed)(BlockJob *job, int64_t speed, Error **errp);
 
-    /** Optional callback for job types that need to forward I/O status reset */
-    void (*iostatus_reset)(BlockJob *job);
-
     /** Mandatory: Entrypoint for the Coroutine. */
     CoroutineEntry *start;
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH 03/10] blockjob: introduce block_job_fail
  2017-03-23 17:39 [Qemu-devel] [PATCH for-2.10 00/10] Preparation for block job mutex Paolo Bonzini
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 01/10] blockjob: remove unnecessary check Paolo Bonzini
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 02/10] blockjob: remove iostatus_reset callback Paolo Bonzini
@ 2017-03-23 17:39 ` Paolo Bonzini
  2017-04-07 23:24   ` John Snow
  2017-04-10  9:22   ` Stefan Hajnoczi
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 04/10] blockjob: introduce block_job_pause/resume_all Paolo Bonzini
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 44+ messages in thread
From: Paolo Bonzini @ 2017-03-23 17:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: jcody, jsnow

Outside blockjob.c, block_job_unref is only used when a block job fails
to start, and block_job_ref is not used at all.  The reference counting
thus is pretty well hidden.  Introduce a separate function to be used
by block jobs; because block_job_ref and block_job_unref now become
static, move them earlier in blockjob.c.

Later on, block_job_fail will also have different locking than
block_job_unref.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/backup.c               |  2 +-
 block/commit.c               |  2 +-
 block/mirror.c               |  2 +-
 blockjob.c                   | 47 ++++++++++++++++++++++++++------------------
 include/block/blockjob_int.h | 15 +++-----------
 tests/test-blockjob.c        | 10 +++++-----
 6 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index a4fb288..f284fd5 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -692,7 +692,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     }
     if (job) {
         backup_clean(&job->common);
-        block_job_unref(&job->common);
+        block_job_fail(&job->common);
     }
 
     return NULL;
diff --git a/block/commit.c b/block/commit.c
index 2832482..9b9ea39 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -424,7 +424,7 @@ fail:
     if (commit_top_bs) {
         bdrv_set_backing_hd(overlay_bs, top, &error_abort);
     }
-    block_job_unref(&s->common);
+    block_job_fail(&s->common);
 }
 
 
diff --git a/block/mirror.c b/block/mirror.c
index ca4baa5..5cb2134 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1243,7 +1243,7 @@ fail:
     if (s) {
         g_free(s->replaces);
         blk_unref(s->target);
-        block_job_unref(&s->common);
+        block_job_fail(&s->common);
     }
 
     bdrv_child_try_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL,
diff --git a/blockjob.c b/blockjob.c
index 24d1e12..1c63d15 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -106,6 +106,32 @@ BlockJob *block_job_get(const char *id)
     return NULL;
 }
 
+static void block_job_ref(BlockJob *job)
+{
+    ++job->refcnt;
+}
+
+static void block_job_attached_aio_context(AioContext *new_context,
+                                           void *opaque);
+static void block_job_detach_aio_context(void *opaque);
+
+static void block_job_unref(BlockJob *job)
+{
+    if (--job->refcnt == 0) {
+        BlockDriverState *bs = blk_bs(job->blk);
+        bs->job = NULL;
+        block_job_remove_all_bdrv(job);
+        blk_remove_aio_context_notifier(job->blk,
+                                        block_job_attached_aio_context,
+                                        block_job_detach_aio_context, job);
+        blk_unref(job->blk);
+        error_free(job->blocker);
+        g_free(job->id);
+        QLIST_REMOVE(job, job_list);
+        g_free(job);
+    }
+}
+
 static void block_job_attached_aio_context(AioContext *new_context,
                                            void *opaque)
 {
@@ -293,26 +319,9 @@ void block_job_start(BlockJob *job)
     qemu_coroutine_enter(job->co);
 }
 
-void block_job_ref(BlockJob *job)
-{
-    ++job->refcnt;
-}
-
-void block_job_unref(BlockJob *job)
+void block_job_fail(BlockJob *job)
 {
-    if (--job->refcnt == 0) {
-        BlockDriverState *bs = blk_bs(job->blk);
-        bs->job = NULL;
-        block_job_remove_all_bdrv(job);
-        blk_remove_aio_context_notifier(job->blk,
-                                        block_job_attached_aio_context,
-                                        block_job_detach_aio_context, job);
-        blk_unref(job->blk);
-        error_free(job->blocker);
-        g_free(job->id);
-        QLIST_REMOVE(job, job_list);
-        g_free(job);
-    }
+    block_job_unref(job);
 }
 
 static void block_job_completed_single(BlockJob *job)
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index bfcc5d1..97ffc43 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -156,21 +156,12 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);
 void block_job_yield(BlockJob *job);
 
 /**
- * block_job_ref:
+ * block_job_fail:
  * @bs: The block device.
  *
- * Grab a reference to the block job. Should be paired with block_job_unref.
+ * The block job could not be started, free it.
  */
-void block_job_ref(BlockJob *job);
-
-/**
- * block_job_unref:
- * @bs: The block device.
- *
- * Release reference to the block job and release resources if it is the last
- * reference.
- */
-void block_job_unref(BlockJob *job);
+void block_job_fail(BlockJob *job);
 
 /**
  * block_job_completed:
diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c
index 740e740..35bbbbc 100644
--- a/tests/test-blockjob.c
+++ b/tests/test-blockjob.c
@@ -116,11 +116,11 @@ static void test_job_ids(void)
     job[1] = do_test_id(blk[1], "id0", false);
 
     /* But once job[0] finishes we can reuse its ID */
-    block_job_unref(job[0]);
+    block_job_fail(job[0]);
     job[1] = do_test_id(blk[1], "id0", true);
 
     /* No job ID specified, defaults to the backend name ('drive1') */
-    block_job_unref(job[1]);
+    block_job_fail(job[1]);
     job[1] = do_test_id(blk[1], NULL, true);
 
     /* Duplicate job ID */
@@ -133,9 +133,9 @@ static void test_job_ids(void)
     /* This one is valid */
     job[2] = do_test_id(blk[2], "id_2", true);
 
-    block_job_unref(job[0]);
-    block_job_unref(job[1]);
-    block_job_unref(job[2]);
+    block_job_fail(job[0]);
+    block_job_fail(job[1]);
+    block_job_fail(job[2]);
 
     destroy_blk(blk[0]);
     destroy_blk(blk[1]);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH 04/10] blockjob: introduce block_job_pause/resume_all
  2017-03-23 17:39 [Qemu-devel] [PATCH for-2.10 00/10] Preparation for block job mutex Paolo Bonzini
                   ` (2 preceding siblings ...)
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 03/10] blockjob: introduce block_job_fail Paolo Bonzini
@ 2017-03-23 17:39 ` Paolo Bonzini
  2017-04-07 23:37   ` John Snow
  2017-04-10  9:26   ` Stefan Hajnoczi
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 05/10] blockjob: separate monitor and blockjob APIs Paolo Bonzini
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 44+ messages in thread
From: Paolo Bonzini @ 2017-03-23 17:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: jcody, jsnow

Remove use of block_job_pause/resume from outside blockjob.c, thus
making them static.  Again, one reason to do this is that
block_job_pause/resume will have different locking rules compared
to everything else that block.c and block/io.c use.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/io.c               |  18 +-------
 blockjob.c               | 114 ++++++++++++++++++++++++++++-------------------
 include/block/blockjob.h |  14 +++---
 3 files changed, 77 insertions(+), 69 deletions(-)

diff --git a/block/io.c b/block/io.c
index 2709a70..1cc9318 100644
--- a/block/io.c
+++ b/block/io.c
@@ -284,16 +284,9 @@ void bdrv_drain_all_begin(void)
     bool waited = true;
     BlockDriverState *bs;
     BdrvNextIterator it;
-    BlockJob *job = NULL;
     GSList *aio_ctxs = NULL, *ctx;
 
-    while ((job = block_job_next(job))) {
-        AioContext *aio_context = blk_get_aio_context(job->blk);
-
-        aio_context_acquire(aio_context);
-        block_job_pause(job);
-        aio_context_release(aio_context);
-    }
+    block_job_pause_all();
 
     for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
         AioContext *aio_context = bdrv_get_aio_context(bs);
@@ -337,7 +330,6 @@ void bdrv_drain_all_end(void)
 {
     BlockDriverState *bs;
     BdrvNextIterator it;
-    BlockJob *job = NULL;
 
     for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
         AioContext *aio_context = bdrv_get_aio_context(bs);
@@ -348,13 +340,7 @@ void bdrv_drain_all_end(void)
         aio_context_release(aio_context);
     }
 
-    while ((job = block_job_next(job))) {
-        AioContext *aio_context = blk_get_aio_context(job->blk);
-
-        aio_context_acquire(aio_context);
-        block_job_resume(job);
-        aio_context_release(aio_context);
-    }
+    block_job_resume_all();
 }
 
 void bdrv_drain_all(void)
diff --git a/blockjob.c b/blockjob.c
index 1c63d15..caca718 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -55,36 +55,6 @@ struct BlockJobTxn {
 
 static QLIST_HEAD(, BlockJob) block_jobs = QLIST_HEAD_INITIALIZER(block_jobs);
 
-static char *child_job_get_parent_desc(BdrvChild *c)
-{
-    BlockJob *job = c->opaque;
-    return g_strdup_printf("%s job '%s'",
-                           BlockJobType_lookup[job->driver->job_type],
-                           job->id);
-}
-
-static const BdrvChildRole child_job = {
-    .get_parent_desc    = child_job_get_parent_desc,
-    .stay_at_node       = true,
-};
-
-static void block_job_drained_begin(void *opaque)
-{
-    BlockJob *job = opaque;
-    block_job_pause(job);
-}
-
-static void block_job_drained_end(void *opaque)
-{
-    BlockJob *job = opaque;
-    block_job_resume(job);
-}
-
-static const BlockDevOps block_job_dev_ops = {
-    .drained_begin = block_job_drained_begin,
-    .drained_end = block_job_drained_end,
-};
-
 BlockJob *block_job_next(BlockJob *job)
 {
     if (!job) {
@@ -106,6 +76,21 @@ BlockJob *block_job_get(const char *id)
     return NULL;
 }
 
+static void block_job_pause(BlockJob *job)
+{
+    job->pause_count++;
+}
+
+static void block_job_resume(BlockJob *job)
+{
+    assert(job->pause_count > 0);
+    job->pause_count--;
+    if (job->pause_count) {
+        return;
+    }
+    block_job_enter(job);
+}
+
 static void block_job_ref(BlockJob *job)
 {
     ++job->refcnt;
@@ -171,6 +156,36 @@ static void block_job_detach_aio_context(void *opaque)
     block_job_unref(job);
 }
 
+static char *child_job_get_parent_desc(BdrvChild *c)
+{
+    BlockJob *job = c->opaque;
+    return g_strdup_printf("%s job '%s'",
+                           BlockJobType_lookup[job->driver->job_type],
+                           job->id);
+}
+
+static const BdrvChildRole child_job = {
+    .get_parent_desc    = child_job_get_parent_desc,
+    .stay_at_node       = true,
+};
+
+static void block_job_drained_begin(void *opaque)
+{
+    BlockJob *job = opaque;
+    block_job_pause(job);
+}
+
+static void block_job_drained_end(void *opaque)
+{
+    BlockJob *job = opaque;
+    block_job_resume(job);
+}
+
+static const BlockDevOps block_job_dev_ops = {
+    .drained_begin = block_job_drained_begin,
+    .drained_end = block_job_drained_end,
+};
+
 void block_job_remove_all_bdrv(BlockJob *job)
 {
     GSList *l;
@@ -471,11 +486,6 @@ void block_job_complete(BlockJob *job, Error **errp)
     job->driver->complete(job, errp);
 }
 
-void block_job_pause(BlockJob *job)
-{
-    job->pause_count++;
-}
-
 void block_job_user_pause(BlockJob *job)
 {
     job->user_paused = true;
@@ -520,16 +530,6 @@ void coroutine_fn block_job_pause_point(BlockJob *job)
     }
 }
 
-void block_job_resume(BlockJob *job)
-{
-    assert(job->pause_count > 0);
-    job->pause_count--;
-    if (job->pause_count) {
-        return;
-    }
-    block_job_enter(job);
-}
-
 void block_job_user_resume(BlockJob *job)
 {
     if (job && job->user_paused && job->pause_count > 0) {
@@ -723,6 +723,30 @@ static void block_job_event_completed(BlockJob *job, const char *msg)
                                         &error_abort);
 }
 
+void block_job_pause_all(void)
+{
+    BlockJob *job = NULL;
+    while ((job = block_job_next(job))) {
+        AioContext *aio_context = blk_get_aio_context(job->blk);
+
+        aio_context_acquire(aio_context);
+        block_job_pause(job);
+        aio_context_release(aio_context);
+    }
+}
+
+void block_job_resume_all(void)
+{
+    BlockJob *job = NULL;
+    while ((job = block_job_next(job))) {
+        AioContext *aio_context = blk_get_aio_context(job->blk);
+
+        aio_context_acquire(aio_context);
+        block_job_resume(job);
+        aio_context_release(aio_context);
+    }
+}
+
 void block_job_event_ready(BlockJob *job)
 {
     job->ready = true;
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 9e906f7..aac87cd 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -235,12 +235,11 @@ void block_job_complete(BlockJob *job, Error **errp);
 BlockJobInfo *block_job_query(BlockJob *job, Error **errp);
 
 /**
- * block_job_pause:
- * @job: The job to be paused.
+ * block_job_pause_all:
  *
- * Asynchronously pause the specified job.
+ * Asynchronously pause all jobs.
  */
-void block_job_pause(BlockJob *job);
+void block_job_pause_all(void);
 
 /**
  * block_job_user_pause:
@@ -260,12 +259,11 @@ void block_job_user_pause(BlockJob *job);
 bool block_job_user_paused(BlockJob *job);
 
 /**
- * block_job_resume:
- * @job: The job to be resumed.
+ * block_job_resume_all:
  *
- * Resume the specified job.  Must be paired with a preceding block_job_pause.
+ * Resume all block jobs.  Must be paired with a preceding block_job_pause_all.
  */
-void block_job_resume(BlockJob *job);
+void block_job_resume_all(void);
 
 /**
  * block_job_user_resume:
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH 05/10] blockjob: separate monitor and blockjob APIs
  2017-03-23 17:39 [Qemu-devel] [PATCH for-2.10 00/10] Preparation for block job mutex Paolo Bonzini
                   ` (3 preceding siblings ...)
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 04/10] blockjob: introduce block_job_pause/resume_all Paolo Bonzini
@ 2017-03-23 17:39 ` Paolo Bonzini
  2017-04-08  0:03   ` John Snow
  2017-04-10  9:30   ` Stefan Hajnoczi
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 06/10] blockjob: move iostatus reset inside block_job_user_resume Paolo Bonzini
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 44+ messages in thread
From: Paolo Bonzini @ 2017-03-23 17:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: jcody, jsnow

We already have different locking policies for APIs called by the monitor
and the block job.  Monitor APIs need consistency across block_job_get
and the actual operation (e.g. block_job_set_speed), so currently there
are explicit aio_context_acquire/release calls in blockdev.c.

When a block job needs to do something instead it doesn't care about locking,
because the whole coroutine runs under the AioContext lock.  When moving
away from the AioContext lock, the monitor will have to call new
block_job_lock/unlock APIs, while blockjob APIs will take care of this
for the users.

In preparation for that, keep all the blockjob APIs together in the
blockjob.c file.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockjob.c | 206 +++++++++++++++++++++++++++++++------------------------------
 1 file changed, 105 insertions(+), 101 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index caca718..c5f1d19 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -334,11 +334,6 @@ void block_job_start(BlockJob *job)
     qemu_coroutine_enter(job->co);
 }
 
-void block_job_fail(BlockJob *job)
-{
-    block_job_unref(job);
-}
-
 static void block_job_completed_single(BlockJob *job)
 {
     if (!job->ret) {
@@ -440,21 +435,6 @@ static void block_job_completed_txn_success(BlockJob *job)
     }
 }
 
-void block_job_completed(BlockJob *job, int ret)
-{
-    assert(blk_bs(job->blk)->job == job);
-    assert(!job->completed);
-    job->completed = true;
-    job->ret = ret;
-    if (!job->txn) {
-        block_job_completed_single(job);
-    } else if (ret < 0 || block_job_is_cancelled(job)) {
-        block_job_completed_txn_abort(job);
-    } else {
-        block_job_completed_txn_success(job);
-    }
-}
-
 void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
 {
     Error *local_err = NULL;
@@ -492,44 +472,11 @@ void block_job_user_pause(BlockJob *job)
     block_job_pause(job);
 }
 
-static bool block_job_should_pause(BlockJob *job)
-{
-    return job->pause_count > 0;
-}
-
 bool block_job_user_paused(BlockJob *job)
 {
     return job->user_paused;
 }
 
-void coroutine_fn block_job_pause_point(BlockJob *job)
-{
-    assert(job && block_job_started(job));
-
-    if (!block_job_should_pause(job)) {
-        return;
-    }
-    if (block_job_is_cancelled(job)) {
-        return;
-    }
-
-    if (job->driver->pause) {
-        job->driver->pause(job);
-    }
-
-    if (block_job_should_pause(job) && !block_job_is_cancelled(job)) {
-        job->paused = true;
-        job->busy = false;
-        qemu_coroutine_yield(); /* wait for block_job_resume() */
-        job->busy = true;
-        job->paused = false;
-    }
-
-    if (job->driver->resume) {
-        job->driver->resume(job);
-    }
-}
-
 void block_job_user_resume(BlockJob *job)
 {
     if (job && job->user_paused && job->pause_count > 0) {
@@ -538,13 +485,6 @@ void block_job_user_resume(BlockJob *job)
     }
 }
 
-void block_job_enter(BlockJob *job)
-{
-    if (job->co && !job->busy) {
-        qemu_coroutine_enter(job->co);
-    }
-}
-
 void block_job_cancel(BlockJob *job)
 {
     if (block_job_started(job)) {
@@ -556,11 +496,6 @@ void block_job_cancel(BlockJob *job)
     }
 }
 
-bool block_job_is_cancelled(BlockJob *job)
-{
-    return job->cancelled;
-}
-
 void block_job_iostatus_reset(BlockJob *job)
 {
     job->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
@@ -628,42 +563,6 @@ int block_job_complete_sync(BlockJob *job, Error **errp)
     return block_job_finish_sync(job, &block_job_complete, errp);
 }
 
-void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
-{
-    assert(job->busy);
-
-    /* Check cancellation *before* setting busy = false, too!  */
-    if (block_job_is_cancelled(job)) {
-        return;
-    }
-
-    job->busy = false;
-    if (!block_job_should_pause(job)) {
-        co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
-    }
-    job->busy = true;
-
-    block_job_pause_point(job);
-}
-
-void block_job_yield(BlockJob *job)
-{
-    assert(job->busy);
-
-    /* Check cancellation *before* setting busy = false, too!  */
-    if (block_job_is_cancelled(job)) {
-        return;
-    }
-
-    job->busy = false;
-    if (!block_job_should_pause(job)) {
-        qemu_coroutine_yield();
-    }
-    job->busy = true;
-
-    block_job_pause_point(job);
-}
-
 BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
 {
     BlockJobInfo *info;
@@ -723,6 +622,10 @@ static void block_job_event_completed(BlockJob *job, const char *msg)
                                         &error_abort);
 }
 
+/*
+ * API for block job drivers and the block layer.
+ */
+
 void block_job_pause_all(void)
 {
     BlockJob *job = NULL;
@@ -735,6 +638,59 @@ void block_job_pause_all(void)
     }
 }
 
+void block_job_fail(BlockJob *job)
+{
+    block_job_unref(job);
+}
+
+void block_job_completed(BlockJob *job, int ret)
+{
+    assert(blk_bs(job->blk)->job == job);
+    assert(!job->completed);
+    job->completed = true;
+    job->ret = ret;
+    if (!job->txn) {
+        block_job_completed_single(job);
+    } else if (ret < 0 || block_job_is_cancelled(job)) {
+        block_job_completed_txn_abort(job);
+    } else {
+        block_job_completed_txn_success(job);
+    }
+}
+
+static bool block_job_should_pause(BlockJob *job)
+{
+    return job->pause_count > 0;
+}
+
+void coroutine_fn block_job_pause_point(BlockJob *job)
+{
+    assert(job && block_job_started(job));
+
+    if (!block_job_should_pause(job)) {
+        return;
+    }
+    if (block_job_is_cancelled(job)) {
+        return;
+    }
+
+    if (job->driver->pause) {
+        job->driver->pause(job);
+    }
+
+    if (block_job_should_pause(job) && !block_job_is_cancelled(job)) {
+        job->paused = true;
+        job->busy = false;
+        qemu_coroutine_yield(); /* wait for block_job_resume() */
+        job->busy = true;
+        job->paused = false;
+    }
+
+    if (job->driver->resume) {
+        job->driver->resume(job);
+    }
+}
+
 void block_job_resume_all(void)
 {
     BlockJob *job = NULL;
@@ -747,6 +703,54 @@ void block_job_resume_all(void)
     }
 }
 
+void block_job_enter(BlockJob *job)
+{
+    if (job->co && !job->busy) {
+        qemu_coroutine_enter(job->co);
+    }
+}
+
+bool block_job_is_cancelled(BlockJob *job)
+{
+    return job->cancelled;
+}
+
+void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
+{
+    assert(job->busy);
+
+    /* Check cancellation *before* setting busy = false, too!  */
+    if (block_job_is_cancelled(job)) {
+        return;
+    }
+
+    job->busy = false;
+    if (!block_job_should_pause(job)) {
+        co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
+    }
+    job->busy = true;
+
+    block_job_pause_point(job);
+}
+
+void block_job_yield(BlockJob *job)
+{
+    assert(job->busy);
+
+    /* Check cancellation *before* setting busy = false, too!  */
+    if (block_job_is_cancelled(job)) {
+        return;
+    }
+
+    job->busy = false;
+    if (!block_job_should_pause(job)) {
+        qemu_coroutine_yield();
+    }
+    job->busy = true;
+
+    block_job_pause_point(job);
+}
+
 void block_job_event_ready(BlockJob *job)
 {
     job->ready = true;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH 06/10] blockjob: move iostatus reset inside block_job_user_resume
  2017-03-23 17:39 [Qemu-devel] [PATCH for-2.10 00/10] Preparation for block job mutex Paolo Bonzini
                   ` (4 preceding siblings ...)
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 05/10] blockjob: separate monitor and blockjob APIs Paolo Bonzini
@ 2017-03-23 17:39 ` Paolo Bonzini
  2017-04-08  0:28   ` John Snow
  2017-04-10  9:33   ` Stefan Hajnoczi
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 07/10] blockjob: strengthen a bit test-blockjob-txn Paolo Bonzini
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 44+ messages in thread
From: Paolo Bonzini @ 2017-03-23 17:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: jcody, jsnow

Outside blockjob.c, the block_job_iostatus_reset function is used once
in the monitor and once in BlockBackend.  When we introduce the block
job mutex, block_job_iostatus_reset's client is going to be the block
layer (for which blockjob.c will take the block job mutex) rather than
the monitor (which will take the block job mutex by itself).

The monitor's call to block_job_iostatus_reset from the monitor comes
just before the sole call to block_job_user_resume, so reset the
iostatus directly from block_job_iostatus_reset.  This will avoid
the need to introduce separate block_job_iostatus_reset and
block_job_iostatus_reset_locked APIs.

After making this change, move the function together with the others
that were moved in the previous patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockdev.c |  1 -
 blockjob.c | 11 ++++++-----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index c5b2c2c..fbc376b 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3726,7 +3726,6 @@ void qmp_block_job_resume(const char *device, Error **errp)
     }
 
     trace_qmp_block_job_resume(job);
-    block_job_iostatus_reset(job);
     block_job_user_resume(job);
     aio_context_release(aio_context);
 }
diff --git a/blockjob.c b/blockjob.c
index c5f1d19..c9cb5b1 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -481,6 +481,7 @@ void block_job_user_resume(BlockJob *job)
 {
     if (job && job->user_paused && job->pause_count > 0) {
         job->user_paused = false;
+        block_job_iostatus_reset(job);
         block_job_resume(job);
     }
 }
@@ -496,11 +497,6 @@ void block_job_cancel(BlockJob *job)
     }
 }
 
-void block_job_iostatus_reset(BlockJob *job)
-{
-    job->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
-}
-
 static int block_job_finish_sync(BlockJob *job,
                                  void (*finish)(BlockJob *, Error **errp),
                                  Error **errp)
@@ -751,6 +747,11 @@ void block_job_yield(BlockJob *job)
     block_job_pause_point(job);
 }
 
+void block_job_iostatus_reset(BlockJob *job)
+{
+    job->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
+}
+
 void block_job_event_ready(BlockJob *job)
 {
     job->ready = true;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH 07/10] blockjob: strengthen a bit test-blockjob-txn
  2017-03-23 17:39 [Qemu-devel] [PATCH for-2.10 00/10] Preparation for block job mutex Paolo Bonzini
                   ` (5 preceding siblings ...)
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 06/10] blockjob: move iostatus reset inside block_job_user_resume Paolo Bonzini
@ 2017-03-23 17:39 ` Paolo Bonzini
  2017-04-08  0:37   ` John Snow
  2017-04-10  9:35   ` Stefan Hajnoczi
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 08/10] blockjob: introduce block_job_cancel_async Paolo Bonzini
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 44+ messages in thread
From: Paolo Bonzini @ 2017-03-23 17:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: jcody, jsnow

Unlike test-blockjob-txn, QMP releases the reference to the transaction
before the jobs finish.  Thus, while working on the next patch,
qemu-iotest 124 showed a failure that the unit tests did not have.
Make the unit test just a little nastier, so that it fails too.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tests/test-blockjob-txn.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tests/test-blockjob-txn.c b/tests/test-blockjob-txn.c
index 4ccbda1..bfc2aaa 100644
--- a/tests/test-blockjob-txn.c
+++ b/tests/test-blockjob-txn.c
@@ -165,6 +165,11 @@ static void test_pair_jobs(int expected1, int expected2)
     job2 = test_block_job_start(2, true, expected2, &result2);
     block_job_txn_add_job(txn, job2);
 
+    /* Release our reference now to trigger as many nice
+     * use-after-free bugs as possible.
+     */
+    block_job_txn_unref(txn);
+
     if (expected1 == -ECANCELED) {
         block_job_cancel(job1);
     }
@@ -185,8 +190,6 @@ static void test_pair_jobs(int expected1, int expected2)
 
     g_assert_cmpint(result1, ==, expected1);
     g_assert_cmpint(result2, ==, expected2);
-
-    block_job_txn_unref(txn);
 }
 
 static void test_pair_jobs_success(void)
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH 08/10] blockjob: introduce block_job_cancel_async
  2017-03-23 17:39 [Qemu-devel] [PATCH for-2.10 00/10] Preparation for block job mutex Paolo Bonzini
                   ` (6 preceding siblings ...)
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 07/10] blockjob: strengthen a bit test-blockjob-txn Paolo Bonzini
@ 2017-03-23 17:39 ` Paolo Bonzini
  2017-04-08  1:13   ` John Snow
  2017-04-10  9:36   ` Stefan Hajnoczi
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 09/10] blockjob: reorganize block_job_completed_txn_abort Paolo Bonzini
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 10/10] blockjob: use deferred_to_main_loop to indicate the coroutine has ended Paolo Bonzini
  9 siblings, 2 replies; 44+ messages in thread
From: Paolo Bonzini @ 2017-03-23 17:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: jcody, jsnow

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockjob.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index c9cb5b1..093962b 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -373,6 +373,12 @@ static void block_job_completed_single(BlockJob *job)
     block_job_unref(job);
 }
 
+static void block_job_cancel_async(BlockJob *job)
+{
+    job->cancelled = true;
+    block_job_iostatus_reset(job);
+}
+
 static void block_job_completed_txn_abort(BlockJob *job)
 {
     AioContext *ctx;
@@ -397,7 +403,7 @@ static void block_job_completed_txn_abort(BlockJob *job)
              * them; this job, however, may or may not be cancelled, depending
              * on the caller, so leave it. */
             if (other_job != job) {
-                other_job->cancelled = true;
+                block_job_cancel_async(other_job);
             }
             continue;
         }
@@ -489,8 +495,7 @@ void block_job_user_resume(BlockJob *job)
 void block_job_cancel(BlockJob *job)
 {
     if (block_job_started(job)) {
-        job->cancelled = true;
-        block_job_iostatus_reset(job);
+        block_job_cancel_async(job);
         block_job_enter(job);
     } else {
         block_job_completed(job, -ECANCELED);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH 09/10] blockjob: reorganize block_job_completed_txn_abort
  2017-03-23 17:39 [Qemu-devel] [PATCH for-2.10 00/10] Preparation for block job mutex Paolo Bonzini
                   ` (7 preceding siblings ...)
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 08/10] blockjob: introduce block_job_cancel_async Paolo Bonzini
@ 2017-03-23 17:39 ` Paolo Bonzini
  2017-04-08  1:15   ` John Snow
                     ` (2 more replies)
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 10/10] blockjob: use deferred_to_main_loop to indicate the coroutine has ended Paolo Bonzini
  9 siblings, 3 replies; 44+ messages in thread
From: Paolo Bonzini @ 2017-03-23 17:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: jcody, jsnow

This splits the part that touches job states from the part that invokes
callbacks.  It will be a bit simpler to understand once job states will
be protected by a different mutex than the AioContext lock.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockjob.c | 165 ++++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 88 insertions(+), 77 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 093962b..3fa2885 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -76,6 +76,39 @@ BlockJob *block_job_get(const char *id)
     return NULL;
 }
 
+BlockJobTxn *block_job_txn_new(void)
+{
+    BlockJobTxn *txn = g_new0(BlockJobTxn, 1);
+    QLIST_INIT(&txn->jobs);
+    txn->refcnt = 1;
+    return txn;
+}
+
+static void block_job_txn_ref(BlockJobTxn *txn)
+{
+    txn->refcnt++;
+}
+
+void block_job_txn_unref(BlockJobTxn *txn)
+{
+    if (txn && --txn->refcnt == 0) {
+        g_free(txn);
+    }
+}
+
+void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
+{
+    if (!txn) {
+        return;
+    }
+
+    assert(!job->txn);
+    job->txn = txn;
+
+    QLIST_INSERT_HEAD(&txn->jobs, job, txn_list);
+    block_job_txn_ref(txn);
+}
+
 static void block_job_pause(BlockJob *job)
 {
     job->pause_count++;
@@ -336,6 +369,8 @@ void block_job_start(BlockJob *job)
 
 static void block_job_completed_single(BlockJob *job)
 {
+    assert(job->completed);
+
     if (!job->ret) {
         if (job->driver->commit) {
             job->driver->commit(job);
@@ -376,14 +411,49 @@ static void block_job_completed_single(BlockJob *job)
 static void block_job_cancel_async(BlockJob *job)
 {
     job->cancelled = true;
-    block_job_iostatus_reset(job);
+    if (!job->completed) {
+        block_job_iostatus_reset(job);
+    }
+}
+
+static int block_job_finish_sync(BlockJob *job,
+                                 void (*finish)(BlockJob *, Error **errp),
+                                 Error **errp)
+{
+    Error *local_err = NULL;
+    int ret;
+
+    assert(blk_bs(job->blk)->job == job);
+
+    block_job_ref(job);
+
+    if (finish) {
+        finish(job, &local_err);
+    }
+    if (local_err) {
+        error_propagate(errp, local_err);
+        block_job_unref(job);
+        return -EBUSY;
+    }
+    /* block_job_drain calls block_job_enter, and it should be enough to
+     * induce progress until the job completes or moves to the main thread.
+    */
+    while (!job->deferred_to_main_loop && !job->completed) {
+        block_job_drain(job);
+    }
+    while (!job->completed) {
+        aio_poll(qemu_get_aio_context(), true);
+    }
+    ret = (job->cancelled && job->ret == 0) ? -ECANCELED : job->ret;
+    block_job_unref(job);
+    return ret;
 }
 
 static void block_job_completed_txn_abort(BlockJob *job)
 {
     AioContext *ctx;
     BlockJobTxn *txn = job->txn;
-    BlockJob *other_job, *next;
+    BlockJob *other_job;
 
     if (txn->aborting) {
         /*
@@ -392,29 +462,34 @@ static void block_job_completed_txn_abort(BlockJob *job)
         return;
     }
     txn->aborting = true;
+    block_job_txn_ref(txn);
+
     /* We are the first failed job. Cancel other jobs. */
     QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
         ctx = blk_get_aio_context(other_job->blk);
         aio_context_acquire(ctx);
     }
+
+    /* Other jobs are "effectively" cancelled by us, set the status for
+     * them; this job, however, may or may not be cancelled, depending
+     * on the caller, so leave it. */
     QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
-        if (other_job == job || other_job->completed) {
-            /* Other jobs are "effectively" cancelled by us, set the status for
-             * them; this job, however, may or may not be cancelled, depending
-             * on the caller, so leave it. */
-            if (other_job != job) {
-                block_job_cancel_async(other_job);
-            }
-            continue;
+        if (other_job != job) {
+            block_job_cancel_async(other_job);
         }
-        block_job_cancel_sync(other_job);
-        assert(other_job->completed);
     }
-    QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
+    while (!QLIST_EMPTY(&txn->jobs)) {
+        other_job = QLIST_FIRST(&txn->jobs);
         ctx = blk_get_aio_context(other_job->blk);
+        if (!other_job->completed) {
+            assert(other_job->cancelled);
+            block_job_finish_sync(other_job, NULL, NULL);
+        }
         block_job_completed_single(other_job);
         aio_context_release(ctx);
     }
+
+    block_job_txn_unref(txn);
 }
 
 static void block_job_completed_txn_success(BlockJob *job)
@@ -502,37 +577,6 @@ void block_job_cancel(BlockJob *job)
     }
 }
 
-static int block_job_finish_sync(BlockJob *job,
-                                 void (*finish)(BlockJob *, Error **errp),
-                                 Error **errp)
-{
-    Error *local_err = NULL;
-    int ret;
-
-    assert(blk_bs(job->blk)->job == job);
-
-    block_job_ref(job);
-
-    finish(job, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        block_job_unref(job);
-        return -EBUSY;
-    }
-    /* block_job_drain calls block_job_enter, and it should be enough to
-     * induce progress until the job completes or moves to the main thread.
-    */
-    while (!job->deferred_to_main_loop && !job->completed) {
-        block_job_drain(job);
-    }
-    while (!job->completed) {
-        aio_poll(qemu_get_aio_context(), true);
-    }
-    ret = (job->cancelled && job->ret == 0) ? -ECANCELED : job->ret;
-    block_job_unref(job);
-    return ret;
-}
-
 /* A wrapper around block_job_cancel() taking an Error ** parameter so it may be
  * used with block_job_finish_sync() without the need for (rather nasty)
  * function pointer casts there. */
@@ -856,36 +900,3 @@ void block_job_defer_to_main_loop(BlockJob *job,
     aio_bh_schedule_oneshot(qemu_get_aio_context(),
                             block_job_defer_to_main_loop_bh, data);
 }
-
-BlockJobTxn *block_job_txn_new(void)
-{
-    BlockJobTxn *txn = g_new0(BlockJobTxn, 1);
-    QLIST_INIT(&txn->jobs);
-    txn->refcnt = 1;
-    return txn;
-}
-
-static void block_job_txn_ref(BlockJobTxn *txn)
-{
-    txn->refcnt++;
-}
-
-void block_job_txn_unref(BlockJobTxn *txn)
-{
-    if (txn && --txn->refcnt == 0) {
-        g_free(txn);
-    }
-}
-
-void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
-{
-    if (!txn) {
-        return;
-    }
-
-    assert(!job->txn);
-    job->txn = txn;
-
-    QLIST_INSERT_HEAD(&txn->jobs, job, txn_list);
-    block_job_txn_ref(txn);
-}
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH 10/10] blockjob: use deferred_to_main_loop to indicate the coroutine has ended
  2017-03-23 17:39 [Qemu-devel] [PATCH for-2.10 00/10] Preparation for block job mutex Paolo Bonzini
                   ` (8 preceding siblings ...)
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 09/10] blockjob: reorganize block_job_completed_txn_abort Paolo Bonzini
@ 2017-03-23 17:39 ` Paolo Bonzini
  2017-04-08  1:32   ` John Snow
  2017-04-10  9:46   ` Stefan Hajnoczi
  9 siblings, 2 replies; 44+ messages in thread
From: Paolo Bonzini @ 2017-03-23 17:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: jcody, jsnow

All block jobs are using block_job_defer_to_main_loop as the final
step just before the coroutine terminates.  At this point,
block_job_enter should do nothing, but currently it restarts
the freed coroutine.

Now, the job->co states should probably be changed to an enum
(e.g. BEFORE_START, STARTED, YIELDED, COMPLETED) subsuming
block_job_started, job->deferred_to_main_loop and job->busy.
For now, this patch eliminates the problematic reenter by
removing the reset of job->deferred_to_main_loop (which served
no purpose, as far as I could see) and checking the flag in
block_job_enter.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockjob.c                   | 10 ++++++++--
 include/block/blockjob_int.h |  3 ++-
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 3fa2885..2d80dae 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -750,7 +750,14 @@ void block_job_resume_all(void)
 
 void block_job_enter(BlockJob *job)
 {
-    if (job->co && !job->busy) {
+    if (!block_job_started(job)) {
+        return;
+    }
+    if (job->deferred_to_main_loop) {
+        return;
+    }
+
+    if (!job->busy) {
         qemu_coroutine_enter(job->co);
     }
 }
@@ -874,7 +881,6 @@ static void block_job_defer_to_main_loop_bh(void *opaque)
         aio_context_acquire(aio_context);
     }
 
-    data->job->deferred_to_main_loop = false;
     data->fn(data->job, data->opaque);
 
     if (aio_context != data->aio_context) {
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index 97ffc43..4d287ba 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -227,7 +227,8 @@ typedef void BlockJobDeferToMainLoopFn(BlockJob *job, void *opaque);
  * @fn: The function to run in the main loop
  * @opaque: The opaque value that is passed to @fn
  *
- * Execute a given function in the main loop with the BlockDriverState
+ * This function must be called by the main job coroutine just before it
+ * returns.  @fn is executed in the main loop with the BlockDriverState
  * AioContext acquired.  Block jobs must call bdrv_unref(), bdrv_close(), and
  * anything that uses bdrv_drain_all() in the main loop.
  *
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] blockjob: remove unnecessary check
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 01/10] blockjob: remove unnecessary check Paolo Bonzini
@ 2017-04-07 22:30   ` John Snow
  2017-04-07 22:54   ` Philippe Mathieu-Daudé
  2017-04-10  9:27   ` Stefan Hajnoczi
  2 siblings, 0 replies; 44+ messages in thread
From: John Snow @ 2017-04-07 22:30 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: jcody



On 03/23/2017 01:39 PM, Paolo Bonzini wrote:
> !job is always checked prior to the call, drop it from here.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockjob.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/blockjob.c b/blockjob.c
> index 9b619f385..a9fb624 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -480,7 +480,7 @@ static bool block_job_should_pause(BlockJob *job)
>  
>  bool block_job_user_paused(BlockJob *job)
>  {
> -    return job ? job->user_paused : 0;
> +    return job->user_paused;
>  }
>  
>  void coroutine_fn block_job_pause_point(BlockJob *job)
> 

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] blockjob: remove unnecessary check
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 01/10] blockjob: remove unnecessary check Paolo Bonzini
  2017-04-07 22:30   ` John Snow
@ 2017-04-07 22:54   ` Philippe Mathieu-Daudé
  2017-04-07 23:16     ` John Snow
  2017-04-10  9:27   ` Stefan Hajnoczi
  2 siblings, 1 reply; 44+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-04-07 22:54 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: Jeff Cody, John Snow

Hi Paolo,

On 03/23/2017 02:39 PM, Paolo Bonzini wrote:
> !job is always checked prior to the call, drop it from here.

I agree with you this is currently true, *but* block_job_user_paused() 
is exported in "block/blockjob.h" so any future access (external to 
blockdev.c) could eventually happen with job == NULL.
I'd  NACK this patch for for this reason, but I checked and there is no 
access to this function outside of blockdev.c, so I think the best is to 
make block_job_user_paused() static (removing the public declaration) 
and safely drop the !job check, what do you think?

>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockjob.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/blockjob.c b/blockjob.c
> index 9b619f385..a9fb624 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -480,7 +480,7 @@ static bool block_job_should_pause(BlockJob *job)
>
>  bool block_job_user_paused(BlockJob *job)
>  {
> -    return job ? job->user_paused : 0;
> +    return job->user_paused;
>  }
>
>  void coroutine_fn block_job_pause_point(BlockJob *job)
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 02/10] blockjob: remove iostatus_reset callback
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 02/10] blockjob: remove iostatus_reset callback Paolo Bonzini
@ 2017-04-07 23:12   ` John Snow
  2017-04-08  0:18     ` Philippe Mathieu-Daudé
  2017-04-10  9:27   ` Stefan Hajnoczi
  1 sibling, 1 reply; 44+ messages in thread
From: John Snow @ 2017-04-07 23:12 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: jcody



On 03/23/2017 01:39 PM, Paolo Bonzini wrote:
> This is unused since commit 66a0fae ("blockjob: Don't touch BDS iostatus",
> 2016-05-19).
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockjob.c                   | 3 ---
>  include/block/blockjob_int.h | 3 ---
>  2 files changed, 6 deletions(-)
> 
> diff --git a/blockjob.c b/blockjob.c
> index a9fb624..24d1e12 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -555,9 +555,6 @@ bool block_job_is_cancelled(BlockJob *job)
>  void block_job_iostatus_reset(BlockJob *job)
>  {
>      job->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
> -    if (job->driver->iostatus_reset) {
> -        job->driver->iostatus_reset(job);
> -    }
>  }
>  
>  static int block_job_finish_sync(BlockJob *job,
> diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
> index 3f86cc5..bfcc5d1 100644
> --- a/include/block/blockjob_int.h
> +++ b/include/block/blockjob_int.h
> @@ -44,9 +44,6 @@ struct BlockJobDriver {
>      /** Optional callback for job types that support setting a speed limit */
>      void (*set_speed)(BlockJob *job, int64_t speed, Error **errp);
>  
> -    /** Optional callback for job types that need to forward I/O status reset */
> -    void (*iostatus_reset)(BlockJob *job);
> -
>      /** Mandatory: Entrypoint for the Coroutine. */
>      CoroutineEntry *start;
>  
> 

Good find.

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] blockjob: remove unnecessary check
  2017-04-07 22:54   ` Philippe Mathieu-Daudé
@ 2017-04-07 23:16     ` John Snow
  0 siblings, 0 replies; 44+ messages in thread
From: John Snow @ 2017-04-07 23:16 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Paolo Bonzini, qemu-devel; +Cc: Jeff Cody



On 04/07/2017 06:54 PM, Philippe Mathieu-Daudé wrote:
> Hi Paolo,
> 
> On 03/23/2017 02:39 PM, Paolo Bonzini wrote:
>> !job is always checked prior to the call, drop it from here.
> 
> I agree with you this is currently true, *but* block_job_user_paused()
> is exported in "block/blockjob.h" so any future access (external to
> blockdev.c) could eventually happen with job == NULL.
> I'd  NACK this patch for for this reason, but I checked and there is no
> access to this function outside of blockdev.c, so I think the best is to
> make block_job_user_paused() static (removing the public declaration)
> and safely drop the !job check, what do you think?
> 

Sure, but I would consider this a strict improvement as asking for the
paused status of NULL should be an error, not zero.

Anyway, this function exists almost entirely for the sake of blockdev,
so deleting it out of the public jobs interface isn't a very nice thing
to do.

The "proper" thing might be to add *errp, but that's a lot of paint for
a really tiny shed.

"IMO, etc." I've already spent more time on this email than it'd take to
code that, so whichever way the wind blows is fine with me.

--js

>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  blockjob.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/blockjob.c b/blockjob.c
>> index 9b619f385..a9fb624 100644
>> --- a/blockjob.c
>> +++ b/blockjob.c
>> @@ -480,7 +480,7 @@ static bool block_job_should_pause(BlockJob *job)
>>
>>  bool block_job_user_paused(BlockJob *job)
>>  {
>> -    return job ? job->user_paused : 0;
>> +    return job->user_paused;
>>  }
>>
>>  void coroutine_fn block_job_pause_point(BlockJob *job)
>>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 03/10] blockjob: introduce block_job_fail
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 03/10] blockjob: introduce block_job_fail Paolo Bonzini
@ 2017-04-07 23:24   ` John Snow
  2017-04-10  9:22   ` Stefan Hajnoczi
  1 sibling, 0 replies; 44+ messages in thread
From: John Snow @ 2017-04-07 23:24 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: jcody



On 03/23/2017 01:39 PM, Paolo Bonzini wrote:
> Outside blockjob.c, block_job_unref is only used when a block job fails
> to start, and block_job_ref is not used at all.  The reference counting
> thus is pretty well hidden.  Introduce a separate function to be used
> by block jobs; because block_job_ref and block_job_unref now become
> static, move them earlier in blockjob.c.
> 

Early into the series, "I guess so."

It makes it a lot less obvious and more vague as to what exactly is
happening here, so I like it less.

> Later on, block_job_fail will also have different locking than
> block_job_unref.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/backup.c               |  2 +-
>  block/commit.c               |  2 +-
>  block/mirror.c               |  2 +-
>  blockjob.c                   | 47 ++++++++++++++++++++++++++------------------
>  include/block/blockjob_int.h | 15 +++-----------
>  tests/test-blockjob.c        | 10 +++++-----
>  6 files changed, 39 insertions(+), 39 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index a4fb288..f284fd5 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -692,7 +692,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>      }
>      if (job) {
>          backup_clean(&job->common);
> -        block_job_unref(&job->common);
> +        block_job_fail(&job->common);
>      }
>  
>      return NULL;
> diff --git a/block/commit.c b/block/commit.c
> index 2832482..9b9ea39 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -424,7 +424,7 @@ fail:
>      if (commit_top_bs) {
>          bdrv_set_backing_hd(overlay_bs, top, &error_abort);
>      }
> -    block_job_unref(&s->common);
> +    block_job_fail(&s->common);
>  }
>  
>  
> diff --git a/block/mirror.c b/block/mirror.c
> index ca4baa5..5cb2134 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1243,7 +1243,7 @@ fail:
>      if (s) {
>          g_free(s->replaces);
>          blk_unref(s->target);
> -        block_job_unref(&s->common);
> +        block_job_fail(&s->common);
>      }
>  
>      bdrv_child_try_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL,
> diff --git a/blockjob.c b/blockjob.c
> index 24d1e12..1c63d15 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -106,6 +106,32 @@ BlockJob *block_job_get(const char *id)
>      return NULL;
>  }
>  
> +static void block_job_ref(BlockJob *job)
> +{
> +    ++job->refcnt;
> +}
> +
> +static void block_job_attached_aio_context(AioContext *new_context,
> +                                           void *opaque);
> +static void block_job_detach_aio_context(void *opaque);
> +

^ Some tagalongs to the party?

> +static void block_job_unref(BlockJob *job)
> +{
> +    if (--job->refcnt == 0) {
> +        BlockDriverState *bs = blk_bs(job->blk);
> +        bs->job = NULL;
> +        block_job_remove_all_bdrv(job);
> +        blk_remove_aio_context_notifier(job->blk,
> +                                        block_job_attached_aio_context,
> +                                        block_job_detach_aio_context, job);
> +        blk_unref(job->blk);
> +        error_free(job->blocker);
> +        g_free(job->id);
> +        QLIST_REMOVE(job, job_list);
> +        g_free(job);
> +    }
> +}
> +
>  static void block_job_attached_aio_context(AioContext *new_context,
>                                             void *opaque)
>  {
> @@ -293,26 +319,9 @@ void block_job_start(BlockJob *job)
>      qemu_coroutine_enter(job->co);
>  }
>  
> -void block_job_ref(BlockJob *job)
> -{
> -    ++job->refcnt;
> -}
> -
> -void block_job_unref(BlockJob *job)
> +void block_job_fail(BlockJob *job)
>  {
> -    if (--job->refcnt == 0) {
> -        BlockDriverState *bs = blk_bs(job->blk);
> -        bs->job = NULL;
> -        block_job_remove_all_bdrv(job);
> -        blk_remove_aio_context_notifier(job->blk,
> -                                        block_job_attached_aio_context,
> -                                        block_job_detach_aio_context, job);
> -        blk_unref(job->blk);
> -        error_free(job->blocker);
> -        g_free(job->id);
> -        QLIST_REMOVE(job, job_list);
> -        g_free(job);
> -    }
> +    block_job_unref(job);
>  }
>  
>  static void block_job_completed_single(BlockJob *job)
> diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
> index bfcc5d1..97ffc43 100644
> --- a/include/block/blockjob_int.h
> +++ b/include/block/blockjob_int.h
> @@ -156,21 +156,12 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);
>  void block_job_yield(BlockJob *job);
>  
>  /**
> - * block_job_ref:
> + * block_job_fail:
>   * @bs: The block device.
>   *
> - * Grab a reference to the block job. Should be paired with block_job_unref.
> + * The block job could not be started, free it.
>   */
> -void block_job_ref(BlockJob *job);
> -
> -/**
> - * block_job_unref:
> - * @bs: The block device.
> - *
> - * Release reference to the block job and release resources if it is the last
> - * reference.
> - */
> -void block_job_unref(BlockJob *job);
> +void block_job_fail(BlockJob *job);
>  
>  /**
>   * block_job_completed:
> diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c
> index 740e740..35bbbbc 100644
> --- a/tests/test-blockjob.c
> +++ b/tests/test-blockjob.c
> @@ -116,11 +116,11 @@ static void test_job_ids(void)
>      job[1] = do_test_id(blk[1], "id0", false);
>  
>      /* But once job[0] finishes we can reuse its ID */
> -    block_job_unref(job[0]);
> +    block_job_fail(job[0]);
>      job[1] = do_test_id(blk[1], "id0", true);
>  
>      /* No job ID specified, defaults to the backend name ('drive1') */
> -    block_job_unref(job[1]);
> +    block_job_fail(job[1]);
>      job[1] = do_test_id(blk[1], NULL, true);
>  
>      /* Duplicate job ID */
> @@ -133,9 +133,9 @@ static void test_job_ids(void)
>      /* This one is valid */
>      job[2] = do_test_id(blk[2], "id_2", true);
>  
> -    block_job_unref(job[0]);
> -    block_job_unref(job[1]);
> -    block_job_unref(job[2]);
> +    block_job_fail(job[0]);
> +    block_job_fail(job[1]);
> +    block_job_fail(job[2]);
>  
>      destroy_blk(blk[0]);
>      destroy_blk(blk[1]);
> 

Mechanically OK, of course. Allow me to discover my appetite for it
further into your series.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 04/10] blockjob: introduce block_job_pause/resume_all
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 04/10] blockjob: introduce block_job_pause/resume_all Paolo Bonzini
@ 2017-04-07 23:37   ` John Snow
  2017-04-10  9:26   ` Stefan Hajnoczi
  1 sibling, 0 replies; 44+ messages in thread
From: John Snow @ 2017-04-07 23:37 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: jcody



On 03/23/2017 01:39 PM, Paolo Bonzini wrote:
> Remove use of block_job_pause/resume from outside blockjob.c, thus
> making them static.  Again, one reason to do this is that
> block_job_pause/resume will have different locking rules compared
> to everything else that block.c and block/io.c use.
> 

Ominous

> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/io.c               |  18 +-------
>  blockjob.c               | 114 ++++++++++++++++++++++++++++-------------------
>  include/block/blockjob.h |  14 +++---
>  3 files changed, 77 insertions(+), 69 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index 2709a70..1cc9318 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -284,16 +284,9 @@ void bdrv_drain_all_begin(void)
>      bool waited = true;
>      BlockDriverState *bs;
>      BdrvNextIterator it;
> -    BlockJob *job = NULL;
>      GSList *aio_ctxs = NULL, *ctx;
>  
> -    while ((job = block_job_next(job))) {
> -        AioContext *aio_context = blk_get_aio_context(job->blk);
> -
> -        aio_context_acquire(aio_context);
> -        block_job_pause(job);
> -        aio_context_release(aio_context);
> -    }
> +    block_job_pause_all();
>  

Cool. Well, maybe cool. I think it would be cool if we didn't have to
explicitly pause jobs here at all. This is better at least, though.

>      for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
>          AioContext *aio_context = bdrv_get_aio_context(bs);
> @@ -337,7 +330,6 @@ void bdrv_drain_all_end(void)
>  {
>      BlockDriverState *bs;
>      BdrvNextIterator it;
> -    BlockJob *job = NULL;
>  
>      for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
>          AioContext *aio_context = bdrv_get_aio_context(bs);
> @@ -348,13 +340,7 @@ void bdrv_drain_all_end(void)
>          aio_context_release(aio_context);
>      }
>  
> -    while ((job = block_job_next(job))) {
> -        AioContext *aio_context = blk_get_aio_context(job->blk);
> -
> -        aio_context_acquire(aio_context);
> -        block_job_resume(job);
> -        aio_context_release(aio_context);
> -    }
> +    block_job_resume_all();
>  }
>  
>  void bdrv_drain_all(void)
> diff --git a/blockjob.c b/blockjob.c
> index 1c63d15..caca718 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -55,36 +55,6 @@ struct BlockJobTxn {
>  
>  static QLIST_HEAD(, BlockJob) block_jobs = QLIST_HEAD_INITIALIZER(block_jobs);
>  
> -static char *child_job_get_parent_desc(BdrvChild *c)
> -{
> -    BlockJob *job = c->opaque;
> -    return g_strdup_printf("%s job '%s'",
> -                           BlockJobType_lookup[job->driver->job_type],
> -                           job->id);
> -}
> -
> -static const BdrvChildRole child_job = {
> -    .get_parent_desc    = child_job_get_parent_desc,
> -    .stay_at_node       = true,
> -};
> -
> -static void block_job_drained_begin(void *opaque)
> -{
> -    BlockJob *job = opaque;
> -    block_job_pause(job);
> -}
> -
> -static void block_job_drained_end(void *opaque)
> -{
> -    BlockJob *job = opaque;
> -    block_job_resume(job);
> -}
> -
> -static const BlockDevOps block_job_dev_ops = {
> -    .drained_begin = block_job_drained_begin,
> -    .drained_end = block_job_drained_end,
> -};
> -
>  BlockJob *block_job_next(BlockJob *job)
>  {
>      if (!job) {
> @@ -106,6 +76,21 @@ BlockJob *block_job_get(const char *id)
>      return NULL;
>  }
>  
> +static void block_job_pause(BlockJob *job)
> +{
> +    job->pause_count++;
> +}
> +
> +static void block_job_resume(BlockJob *job)
> +{
> +    assert(job->pause_count > 0);
> +    job->pause_count--;
> +    if (job->pause_count) {
> +        return;
> +    }
> +    block_job_enter(job);
> +}
> +
>  static void block_job_ref(BlockJob *job)
>  {
>      ++job->refcnt;
> @@ -171,6 +156,36 @@ static void block_job_detach_aio_context(void *opaque)
>      block_job_unref(job);
>  }
>  
> +static char *child_job_get_parent_desc(BdrvChild *c)
> +{
> +    BlockJob *job = c->opaque;
> +    return g_strdup_printf("%s job '%s'",
> +                           BlockJobType_lookup[job->driver->job_type],
> +                           job->id);
> +}
> +
> +static const BdrvChildRole child_job = {
> +    .get_parent_desc    = child_job_get_parent_desc,
> +    .stay_at_node       = true,
> +};
> +
> +static void block_job_drained_begin(void *opaque)
> +{
> +    BlockJob *job = opaque;
> +    block_job_pause(job);
> +}
> +
> +static void block_job_drained_end(void *opaque)
> +{
> +    BlockJob *job = opaque;
> +    block_job_resume(job);
> +}
> +
> +static const BlockDevOps block_job_dev_ops = {
> +    .drained_begin = block_job_drained_begin,
> +    .drained_end = block_job_drained_end,
> +};
> +

Movement should probably be its own patch, but I'm not a cop or nothin'

>  void block_job_remove_all_bdrv(BlockJob *job)
>  {
>      GSList *l;
> @@ -471,11 +486,6 @@ void block_job_complete(BlockJob *job, Error **errp)
>      job->driver->complete(job, errp);
>  }
>  
> -void block_job_pause(BlockJob *job)
> -{
> -    job->pause_count++;
> -}
> -
>  void block_job_user_pause(BlockJob *job)
>  {
>      job->user_paused = true;
> @@ -520,16 +530,6 @@ void coroutine_fn block_job_pause_point(BlockJob *job)
>      }
>  }
>  
> -void block_job_resume(BlockJob *job)
> -{
> -    assert(job->pause_count > 0);
> -    job->pause_count--;
> -    if (job->pause_count) {
> -        return;
> -    }
> -    block_job_enter(job);
> -}
> -
>  void block_job_user_resume(BlockJob *job)
>  {
>      if (job && job->user_paused && job->pause_count > 0) {
> @@ -723,6 +723,30 @@ static void block_job_event_completed(BlockJob *job, const char *msg)
>                                          &error_abort);
>  }
>  
> +void block_job_pause_all(void)
> +{
> +    BlockJob *job = NULL;
> +    while ((job = block_job_next(job))) {
> +        AioContext *aio_context = blk_get_aio_context(job->blk);
> +
> +        aio_context_acquire(aio_context);
> +        block_job_pause(job);
> +        aio_context_release(aio_context);
> +    }
> +}
> +
> +void block_job_resume_all(void)
> +{
> +    BlockJob *job = NULL;
> +    while ((job = block_job_next(job))) {
> +        AioContext *aio_context = blk_get_aio_context(job->blk);
> +
> +        aio_context_acquire(aio_context);
> +        block_job_resume(job);
> +        aio_context_release(aio_context);
> +    }
> +}
> +
>  void block_job_event_ready(BlockJob *job)
>  {
>      job->ready = true;
> diff --git a/include/block/blockjob.h b/include/block/blockjob.h
> index 9e906f7..aac87cd 100644
> --- a/include/block/blockjob.h
> +++ b/include/block/blockjob.h
> @@ -235,12 +235,11 @@ void block_job_complete(BlockJob *job, Error **errp);
>  BlockJobInfo *block_job_query(BlockJob *job, Error **errp);
>  
>  /**
> - * block_job_pause:
> - * @job: The job to be paused.
> + * block_job_pause_all:
>   *
> - * Asynchronously pause the specified job.
> + * Asynchronously pause all jobs.
>   */
> -void block_job_pause(BlockJob *job);
> +void block_job_pause_all(void);
>  
>  /**
>   * block_job_user_pause:
> @@ -260,12 +259,11 @@ void block_job_user_pause(BlockJob *job);
>  bool block_job_user_paused(BlockJob *job);
>  
>  /**
> - * block_job_resume:
> - * @job: The job to be resumed.
> + * block_job_resume_all:
>   *
> - * Resume the specified job.  Must be paired with a preceding block_job_pause.
> + * Resume all block jobs.  Must be paired with a preceding block_job_pause_all.
>   */
> -void block_job_resume(BlockJob *job);
> +void block_job_resume_all(void);
>  
>  /**
>   * block_job_user_resume:
> 

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 05/10] blockjob: separate monitor and blockjob APIs
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 05/10] blockjob: separate monitor and blockjob APIs Paolo Bonzini
@ 2017-04-08  0:03   ` John Snow
  2017-04-08  9:52     ` Paolo Bonzini
  2017-04-10  9:30   ` Stefan Hajnoczi
  1 sibling, 1 reply; 44+ messages in thread
From: John Snow @ 2017-04-08  0:03 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: jcody



On 03/23/2017 01:39 PM, Paolo Bonzini wrote:
> We already have different locking policies for APIs called by the monitor
> and the block job.  Monitor APIs need consistency across block_job_get
> and the actual operation (e.g. block_job_set_speed), so currently there
> are explicit aio_context_acquire/release calls in blockdev.c.
> 
> When a block job needs to do something instead it doesn't care about locking,
> because the whole coroutine runs under the AioContext lock.  When moving
> away from the AioContext lock, the monitor will have to call new
> block_job_lock/unlock APIs, while blockjob APIs will take care of this
> for the users.
> 
> In preparation for that, keep all the blockjob APIs together in the
> blockjob.c file.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockjob.c | 206 +++++++++++++++++++++++++++++++------------------------------

Did you guys order... like, fifty million pizzas?

Hooray for
$ diff -u <(sed -n 's/^-//p' patch) <(sed -n 's/^\+//p' patch)

Looks clean, though it may be useful to do a few more things;

- Demarcate what you think is the monitor API in this file
- Organize blockjob.h to match to serve as a useful reference.

>  1 file changed, 105 insertions(+), 101 deletions(-)
> 
> diff --git a/blockjob.c b/blockjob.c
> index caca718..c5f1d19 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -334,11 +334,6 @@ void block_job_start(BlockJob *job)
>      qemu_coroutine_enter(job->co);
>  }
>  
> -void block_job_fail(BlockJob *job)
> -{
> -    block_job_unref(job);
> -}
> -
>  static void block_job_completed_single(BlockJob *job)
>  {
>      if (!job->ret) {
> @@ -440,21 +435,6 @@ static void block_job_completed_txn_success(BlockJob *job)
>      }
>  }
>  
> -void block_job_completed(BlockJob *job, int ret)
> -{
> -    assert(blk_bs(job->blk)->job == job);
> -    assert(!job->completed);
> -    job->completed = true;
> -    job->ret = ret;
> -    if (!job->txn) {
> -        block_job_completed_single(job);
> -    } else if (ret < 0 || block_job_is_cancelled(job)) {
> -        block_job_completed_txn_abort(job);
> -    } else {
> -        block_job_completed_txn_success(job);
> -    }
> -}
> -
>  void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
>  {
>      Error *local_err = NULL;
> @@ -492,44 +472,11 @@ void block_job_user_pause(BlockJob *job)
>      block_job_pause(job);
>  }
>  
> -static bool block_job_should_pause(BlockJob *job)
> -{
> -    return job->pause_count > 0;
> -}
> -
>  bool block_job_user_paused(BlockJob *job)
>  {
>      return job->user_paused;
>  }
>  
> -void coroutine_fn block_job_pause_point(BlockJob *job)
> -{
> -    assert(job && block_job_started(job));
> -
> -    if (!block_job_should_pause(job)) {
> -        return;
> -    }
> -    if (block_job_is_cancelled(job)) {
> -        return;
> -    }
> -
> -    if (job->driver->pause) {
> -        job->driver->pause(job);
> -    }
> -
> -    if (block_job_should_pause(job) && !block_job_is_cancelled(job)) {
> -        job->paused = true;
> -        job->busy = false;
> -        qemu_coroutine_yield(); /* wait for block_job_resume() */
> -        job->busy = true;
> -        job->paused = false;
> -    }
> -
> -    if (job->driver->resume) {
> -        job->driver->resume(job);
> -    }
> -}
> -
>  void block_job_user_resume(BlockJob *job)
>  {
>      if (job && job->user_paused && job->pause_count > 0) {
> @@ -538,13 +485,6 @@ void block_job_user_resume(BlockJob *job)
>      }
>  }
>  
> -void block_job_enter(BlockJob *job)
> -{
> -    if (job->co && !job->busy) {
> -        qemu_coroutine_enter(job->co);
> -    }
> -}
> -
>  void block_job_cancel(BlockJob *job)
>  {
>      if (block_job_started(job)) {
> @@ -556,11 +496,6 @@ void block_job_cancel(BlockJob *job)
>      }
>  }
>  
> -bool block_job_is_cancelled(BlockJob *job)
> -{
> -    return job->cancelled;
> -}
> -
>  void block_job_iostatus_reset(BlockJob *job)
>  {
>      job->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
> @@ -628,42 +563,6 @@ int block_job_complete_sync(BlockJob *job, Error **errp)
>      return block_job_finish_sync(job, &block_job_complete, errp);
>  }
>  
> -void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
> -{
> -    assert(job->busy);
> -
> -    /* Check cancellation *before* setting busy = false, too!  */
> -    if (block_job_is_cancelled(job)) {
> -        return;
> -    }
> -
> -    job->busy = false;
> -    if (!block_job_should_pause(job)) {
> -        co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
> -    }
> -    job->busy = true;
> -
> -    block_job_pause_point(job);
> -}
> -
> -void block_job_yield(BlockJob *job)
> -{
> -    assert(job->busy);
> -
> -    /* Check cancellation *before* setting busy = false, too!  */
> -    if (block_job_is_cancelled(job)) {
> -        return;
> -    }
> -
> -    job->busy = false;
> -    if (!block_job_should_pause(job)) {
> -        qemu_coroutine_yield();
> -    }
> -    job->busy = true;
> -
> -    block_job_pause_point(job);
> -}
> -
>  BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
>  {
>      BlockJobInfo *info;
> @@ -723,6 +622,10 @@ static void block_job_event_completed(BlockJob *job, const char *msg)
>                                          &error_abort);
>  }
>  
> +/*
> + * API for block job drivers and the block layer.
> + */
> +
>  void block_job_pause_all(void)
>  {
>      BlockJob *job = NULL;
> @@ -735,6 +638,59 @@ void block_job_pause_all(void)
>      }
>  }
>  
> +void block_job_fail(BlockJob *job)
> +{
> +    block_job_unref(job);
> +}
> +
> +void block_job_completed(BlockJob *job, int ret)
> +{
> +    assert(blk_bs(job->blk)->job == job);
> +    assert(!job->completed);
> +    job->completed = true;
> +    job->ret = ret;
> +    if (!job->txn) {
> +        block_job_completed_single(job);
> +    } else if (ret < 0 || block_job_is_cancelled(job)) {
> +        block_job_completed_txn_abort(job);
> +    } else {
> +        block_job_completed_txn_success(job);
> +    }
> +}
> +
> +static bool block_job_should_pause(BlockJob *job)
> +{
> +    return job->pause_count > 0;
> +}
> +
> +void coroutine_fn block_job_pause_point(BlockJob *job)
> +{
> +    assert(job && block_job_started(job));
> +
> +    if (!block_job_should_pause(job)) {
> +        return;
> +    }
> +    if (block_job_is_cancelled(job)) {
> +        return;
> +    }
> +
> +    if (job->driver->pause) {
> +        job->driver->pause(job);
> +    }
> +
> +    if (block_job_should_pause(job) && !block_job_is_cancelled(job)) {
> +        job->paused = true;
> +        job->busy = false;
> +        qemu_coroutine_yield(); /* wait for block_job_resume() */
> +        job->busy = true;
> +        job->paused = false;
> +    }
> +
> +    if (job->driver->resume) {
> +        job->driver->resume(job);
> +    }
> +}
> +
>  void block_job_resume_all(void)
>  {
>      BlockJob *job = NULL;
> @@ -747,6 +703,54 @@ void block_job_resume_all(void)
>      }
>  }
>  
> +void block_job_enter(BlockJob *job)
> +{
> +    if (job->co && !job->busy) {
> +        qemu_coroutine_enter(job->co);
> +    }
> +}
> +
> +bool block_job_is_cancelled(BlockJob *job)
> +{
> +    return job->cancelled;
> +}
> +
> +void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
> +{
> +    assert(job->busy);
> +
> +    /* Check cancellation *before* setting busy = false, too!  */
> +    if (block_job_is_cancelled(job)) {
> +        return;
> +    }
> +
> +    job->busy = false;
> +    if (!block_job_should_pause(job)) {
> +        co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
> +    }
> +    job->busy = true;
> +
> +    block_job_pause_point(job);
> +}
> +
> +void block_job_yield(BlockJob *job)
> +{
> +    assert(job->busy);
> +
> +    /* Check cancellation *before* setting busy = false, too!  */
> +    if (block_job_is_cancelled(job)) {
> +        return;
> +    }
> +
> +    job->busy = false;
> +    if (!block_job_should_pause(job)) {
> +        qemu_coroutine_yield();
> +    }
> +    job->busy = true;
> +
> +    block_job_pause_point(job);
> +}
> +
>  void block_job_event_ready(BlockJob *job)
>  {
>      job->ready = true;
> 

-- 
—js

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 02/10] blockjob: remove iostatus_reset callback
  2017-04-07 23:12   ` John Snow
@ 2017-04-08  0:18     ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 44+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-04-08  0:18 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: John Snow, Jeff Cody

On 04/07/2017 08:12 PM, John Snow wrote:
> On 03/23/2017 01:39 PM, Paolo Bonzini wrote:
>> This is unused since commit 66a0fae ("blockjob: Don't touch BDS iostatus",
>> 2016-05-19).
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  blockjob.c                   | 3 ---
>>  include/block/blockjob_int.h | 3 ---
>>  2 files changed, 6 deletions(-)
>>
>> diff --git a/blockjob.c b/blockjob.c
>> index a9fb624..24d1e12 100644
>> --- a/blockjob.c
>> +++ b/blockjob.c
>> @@ -555,9 +555,6 @@ bool block_job_is_cancelled(BlockJob *job)
>>  void block_job_iostatus_reset(BlockJob *job)
>>  {
>>      job->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
>> -    if (job->driver->iostatus_reset) {
>> -        job->driver->iostatus_reset(job);
>> -    }
>>  }
>>
>>  static int block_job_finish_sync(BlockJob *job,
>> diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
>> index 3f86cc5..bfcc5d1 100644
>> --- a/include/block/blockjob_int.h
>> +++ b/include/block/blockjob_int.h
>> @@ -44,9 +44,6 @@ struct BlockJobDriver {
>>      /** Optional callback for job types that support setting a speed limit */
>>      void (*set_speed)(BlockJob *job, int64_t speed, Error **errp);
>>
>> -    /** Optional callback for job types that need to forward I/O status reset */
>> -    void (*iostatus_reset)(BlockJob *job);
>> -
>>      /** Mandatory: Entrypoint for the Coroutine. */
>>      CoroutineEntry *start;
>>
>>
>
> Good find.
>
> Reviewed-by: John Snow <jsnow@redhat.com>

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 06/10] blockjob: move iostatus reset inside block_job_user_resume
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 06/10] blockjob: move iostatus reset inside block_job_user_resume Paolo Bonzini
@ 2017-04-08  0:28   ` John Snow
  2017-04-10  9:33   ` Stefan Hajnoczi
  1 sibling, 0 replies; 44+ messages in thread
From: John Snow @ 2017-04-08  0:28 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: jcody



On 03/23/2017 01:39 PM, Paolo Bonzini wrote:
> Outside blockjob.c, the block_job_iostatus_reset function is used once
> in the monitor and once in BlockBackend.  When we introduce the block
> job mutex, block_job_iostatus_reset's client is going to be the block
> layer (for which blockjob.c will take the block job mutex) rather than
> the monitor (which will take the block job mutex by itself).
> 

Ah, sure. So we leave the block-backend call alone, here. We're moving
it from "public API" to "specifically block-layer API" to help
facilitate different mutex semantics. OK.

> The monitor's call to block_job_iostatus_reset from the monitor comes
> just before the sole call to block_job_user_resume, so reset the
> iostatus directly from block_job_iostatus_reset.  This will avoid
> the need to introduce separate block_job_iostatus_reset and
> block_job_iostatus_reset_locked APIs.
> 
> After making this change, move the function together with the others
> that were moved in the previous patch.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: John Snow <jsnow@redhat.com>

> ---
>  blockdev.c |  1 -
>  blockjob.c | 11 ++++++-----
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index c5b2c2c..fbc376b 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3726,7 +3726,6 @@ void qmp_block_job_resume(const char *device, Error **errp)
>      }
>  
>      trace_qmp_block_job_resume(job);
> -    block_job_iostatus_reset(job);
>      block_job_user_resume(job);
>      aio_context_release(aio_context);
>  }
> diff --git a/blockjob.c b/blockjob.c
> index c5f1d19..c9cb5b1 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -481,6 +481,7 @@ void block_job_user_resume(BlockJob *job)
>  {
>      if (job && job->user_paused && job->pause_count > 0) {
>          job->user_paused = false;
> +        block_job_iostatus_reset(job);
>          block_job_resume(job);
>      }
>  }
> @@ -496,11 +497,6 @@ void block_job_cancel(BlockJob *job)
>      }
>  }
>  
> -void block_job_iostatus_reset(BlockJob *job)
> -{
> -    job->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
> -}
> -
>  static int block_job_finish_sync(BlockJob *job,
>                                   void (*finish)(BlockJob *, Error **errp),
>                                   Error **errp)
> @@ -751,6 +747,11 @@ void block_job_yield(BlockJob *job)
>      block_job_pause_point(job);
>  }
>  
> +void block_job_iostatus_reset(BlockJob *job)
> +{
> +    job->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
> +}
> +
>  void block_job_event_ready(BlockJob *job)
>  {
>      job->ready = true;
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 07/10] blockjob: strengthen a bit test-blockjob-txn
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 07/10] blockjob: strengthen a bit test-blockjob-txn Paolo Bonzini
@ 2017-04-08  0:37   ` John Snow
  2017-04-10  9:35   ` Stefan Hajnoczi
  1 sibling, 0 replies; 44+ messages in thread
From: John Snow @ 2017-04-08  0:37 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: jcody



On 03/23/2017 01:39 PM, Paolo Bonzini wrote:
> Unlike test-blockjob-txn, QMP releases the reference to the transaction
> before the jobs finish.  Thus, while working on the next patch,
> qemu-iotest 124 showed a failure that the unit tests did not have.
> Make the unit test just a little nastier, so that it fails too.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  tests/test-blockjob-txn.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/test-blockjob-txn.c b/tests/test-blockjob-txn.c
> index 4ccbda1..bfc2aaa 100644
> --- a/tests/test-blockjob-txn.c
> +++ b/tests/test-blockjob-txn.c
> @@ -165,6 +165,11 @@ static void test_pair_jobs(int expected1, int expected2)
>      job2 = test_block_job_start(2, true, expected2, &result2);
>      block_job_txn_add_job(txn, job2);
>  

^ Oh, this might cause you grief too. Should be add-add-start-start, not
start-add-start-add. Fam sent a patch fixing this recently.

> +    /* Release our reference now to trigger as many nice
> +     * use-after-free bugs as possible.
> +     */
> +    block_job_txn_unref(txn);
> +

This is fine, though.

>      if (expected1 == -ECANCELED) {
>          block_job_cancel(job1);
>      }
> @@ -185,8 +190,6 @@ static void test_pair_jobs(int expected1, int expected2)
>  
>      g_assert_cmpint(result1, ==, expected1);
>      g_assert_cmpint(result2, ==, expected2);
> -
> -    block_job_txn_unref(txn);
>  }
>  
>  static void test_pair_jobs_success(void)
> 

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 08/10] blockjob: introduce block_job_cancel_async
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 08/10] blockjob: introduce block_job_cancel_async Paolo Bonzini
@ 2017-04-08  1:13   ` John Snow
  2017-04-08  9:54     ` Paolo Bonzini
  2017-04-10  9:36   ` Stefan Hajnoczi
  1 sibling, 1 reply; 44+ messages in thread
From: John Snow @ 2017-04-08  1:13 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: jcody, Kashyap Chamarthy



On 03/23/2017 01:39 PM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

What was the bad design that required you to fix the previous test? :)

> ---
>  blockjob.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/blockjob.c b/blockjob.c
> index c9cb5b1..093962b 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -373,6 +373,12 @@ static void block_job_completed_single(BlockJob *job)
>      block_job_unref(job);
>  }
>  
> +static void block_job_cancel_async(BlockJob *job)
> +{
> +    job->cancelled = true;
> +    block_job_iostatus_reset(job);
> +}
> +
>  static void block_job_completed_txn_abort(BlockJob *job)
>  {
>      AioContext *ctx;
> @@ -397,7 +403,7 @@ static void block_job_completed_txn_abort(BlockJob *job)
>               * them; this job, however, may or may not be cancelled, depending
>               * on the caller, so leave it. */
>              if (other_job != job) {
> -                other_job->cancelled = true;
> +                block_job_cancel_async(other_job);

Adds an ioreset here, which I think is probably fine...

>              }
>              continue;
>          }
> @@ -489,8 +495,7 @@ void block_job_user_resume(BlockJob *job)
>  void block_job_cancel(BlockJob *job)
>  {
>      if (block_job_started(job)) {
> -        job->cancelled = true;
> -        block_job_iostatus_reset(job);
> +        block_job_cancel_async(job);
>          block_job_enter(job);
>      } else {
>          block_job_completed(job, -ECANCELED);
> 

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 09/10] blockjob: reorganize block_job_completed_txn_abort
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 09/10] blockjob: reorganize block_job_completed_txn_abort Paolo Bonzini
@ 2017-04-08  1:15   ` John Snow
  2017-04-08 10:03     ` Paolo Bonzini
  2017-04-10  9:44   ` Stefan Hajnoczi
  2017-04-10 19:17   ` John Snow
  2 siblings, 1 reply; 44+ messages in thread
From: John Snow @ 2017-04-08  1:15 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: jcody



On 03/23/2017 01:39 PM, Paolo Bonzini wrote:
> This splits the part that touches job states from the part that invokes
> callbacks.  It will be a bit simpler to understand once job states will
> be protected by a different mutex than the AioContext lock.
> 

Maybe easier to review then, too :)

> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockjob.c | 165 ++++++++++++++++++++++++++++++++-----------------------------
>  1 file changed, 88 insertions(+), 77 deletions(-)
> 
> diff --git a/blockjob.c b/blockjob.c
> index 093962b..3fa2885 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -76,6 +76,39 @@ BlockJob *block_job_get(const char *id)
>      return NULL;
>  }
>  
> +BlockJobTxn *block_job_txn_new(void)
> +{
> +    BlockJobTxn *txn = g_new0(BlockJobTxn, 1);
> +    QLIST_INIT(&txn->jobs);
> +    txn->refcnt = 1;
> +    return txn;
> +}
> +
> +static void block_job_txn_ref(BlockJobTxn *txn)
> +{
> +    txn->refcnt++;
> +}
> +
> +void block_job_txn_unref(BlockJobTxn *txn)
> +{
> +    if (txn && --txn->refcnt == 0) {
> +        g_free(txn);
> +    }
> +}
> +
> +void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
> +{
> +    if (!txn) {
> +        return;
> +    }
> +
> +    assert(!job->txn);
> +    job->txn = txn;
> +
> +    QLIST_INSERT_HEAD(&txn->jobs, job, txn_list);
> +    block_job_txn_ref(txn);
> +}
> +
>  static void block_job_pause(BlockJob *job)
>  {
>      job->pause_count++;
> @@ -336,6 +369,8 @@ void block_job_start(BlockJob *job)
>  
>  static void block_job_completed_single(BlockJob *job)
>  {
> +    assert(job->completed);
> +
>      if (!job->ret) {
>          if (job->driver->commit) {
>              job->driver->commit(job);
> @@ -376,14 +411,49 @@ static void block_job_completed_single(BlockJob *job)
>  static void block_job_cancel_async(BlockJob *job)
>  {
>      job->cancelled = true;
> -    block_job_iostatus_reset(job);
> +    if (!job->completed) {
> +        block_job_iostatus_reset(job);
> +    }
> +}
> +
> +static int block_job_finish_sync(BlockJob *job,
> +                                 void (*finish)(BlockJob *, Error **errp),
> +                                 Error **errp)
> +{
> +    Error *local_err = NULL;
> +    int ret;
> +
> +    assert(blk_bs(job->blk)->job == job);
> +
> +    block_job_ref(job);
> +
> +    if (finish) {
> +        finish(job, &local_err);
> +    }
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        block_job_unref(job);
> +        return -EBUSY;
> +    }
> +    /* block_job_drain calls block_job_enter, and it should be enough to
> +     * induce progress until the job completes or moves to the main thread.
> +    */
> +    while (!job->deferred_to_main_loop && !job->completed) {
> +        block_job_drain(job);
> +    }
> +    while (!job->completed) {
> +        aio_poll(qemu_get_aio_context(), true);
> +    }
> +    ret = (job->cancelled && job->ret == 0) ? -ECANCELED : job->ret;
> +    block_job_unref(job);
> +    return ret;
>  }
>  
>  static void block_job_completed_txn_abort(BlockJob *job)
>  {
>      AioContext *ctx;
>      BlockJobTxn *txn = job->txn;
> -    BlockJob *other_job, *next;
> +    BlockJob *other_job;
>  
>      if (txn->aborting) {
>          /*
> @@ -392,29 +462,34 @@ static void block_job_completed_txn_abort(BlockJob *job)
>          return;
>      }
>      txn->aborting = true;
> +    block_job_txn_ref(txn);
> +
>      /* We are the first failed job. Cancel other jobs. */
>      QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
>          ctx = blk_get_aio_context(other_job->blk);
>          aio_context_acquire(ctx);
>      }
> +
> +    /* Other jobs are "effectively" cancelled by us, set the status for
> +     * them; this job, however, may or may not be cancelled, depending
> +     * on the caller, so leave it. */
>      QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
> -        if (other_job == job || other_job->completed) {
> -            /* Other jobs are "effectively" cancelled by us, set the status for
> -             * them; this job, however, may or may not be cancelled, depending
> -             * on the caller, so leave it. */
> -            if (other_job != job) {
> -                block_job_cancel_async(other_job);
> -            }
> -            continue;
> +        if (other_job != job) {
> +            block_job_cancel_async(other_job);
>          }
> -        block_job_cancel_sync(other_job);
> -        assert(other_job->completed);
>      }
> -    QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
> +    while (!QLIST_EMPTY(&txn->jobs)) {
> +        other_job = QLIST_FIRST(&txn->jobs);
>          ctx = blk_get_aio_context(other_job->blk);
> +        if (!other_job->completed) {
> +            assert(other_job->cancelled);
> +            block_job_finish_sync(other_job, NULL, NULL);
> +        }
>          block_job_completed_single(other_job);
>          aio_context_release(ctx);
>      }
> +
> +    block_job_txn_unref(txn);
>  }
>  
>  static void block_job_completed_txn_success(BlockJob *job)
> @@ -502,37 +577,6 @@ void block_job_cancel(BlockJob *job)
>      }
>  }
>  
> -static int block_job_finish_sync(BlockJob *job,
> -                                 void (*finish)(BlockJob *, Error **errp),
> -                                 Error **errp)
> -{
> -    Error *local_err = NULL;
> -    int ret;
> -
> -    assert(blk_bs(job->blk)->job == job);
> -
> -    block_job_ref(job);
> -
> -    finish(job, &local_err);
> -    if (local_err) {
> -        error_propagate(errp, local_err);
> -        block_job_unref(job);
> -        return -EBUSY;
> -    }
> -    /* block_job_drain calls block_job_enter, and it should be enough to
> -     * induce progress until the job completes or moves to the main thread.
> -    */
> -    while (!job->deferred_to_main_loop && !job->completed) {
> -        block_job_drain(job);
> -    }
> -    while (!job->completed) {
> -        aio_poll(qemu_get_aio_context(), true);
> -    }
> -    ret = (job->cancelled && job->ret == 0) ? -ECANCELED : job->ret;
> -    block_job_unref(job);
> -    return ret;
> -}
> -
>  /* A wrapper around block_job_cancel() taking an Error ** parameter so it may be
>   * used with block_job_finish_sync() without the need for (rather nasty)
>   * function pointer casts there. */
> @@ -856,36 +900,3 @@ void block_job_defer_to_main_loop(BlockJob *job,
>      aio_bh_schedule_oneshot(qemu_get_aio_context(),
>                              block_job_defer_to_main_loop_bh, data);
>  }
> -
> -BlockJobTxn *block_job_txn_new(void)
> -{
> -    BlockJobTxn *txn = g_new0(BlockJobTxn, 1);
> -    QLIST_INIT(&txn->jobs);
> -    txn->refcnt = 1;
> -    return txn;
> -}
> -
> -static void block_job_txn_ref(BlockJobTxn *txn)
> -{
> -    txn->refcnt++;
> -}
> -
> -void block_job_txn_unref(BlockJobTxn *txn)
> -{
> -    if (txn && --txn->refcnt == 0) {
> -        g_free(txn);
> -    }
> -}
> -
> -void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
> -{
> -    if (!txn) {
> -        return;
> -    }
> -
> -    assert(!job->txn);
> -    job->txn = txn;
> -
> -    QLIST_INSERT_HEAD(&txn->jobs, job, txn_list);
> -    block_job_txn_ref(txn);
> -}
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 10/10] blockjob: use deferred_to_main_loop to indicate the coroutine has ended
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 10/10] blockjob: use deferred_to_main_loop to indicate the coroutine has ended Paolo Bonzini
@ 2017-04-08  1:32   ` John Snow
  2017-04-08  9:56     ` Paolo Bonzini
  2017-04-10  9:46   ` Stefan Hajnoczi
  1 sibling, 1 reply; 44+ messages in thread
From: John Snow @ 2017-04-08  1:32 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: jcody



On 03/23/2017 01:39 PM, Paolo Bonzini wrote:
> All block jobs are using block_job_defer_to_main_loop as the final
> step just before the coroutine terminates.  At this point,
> block_job_enter should do nothing, but currently it restarts
> the freed coroutine.
> 
> Now, the job->co states should probably be changed to an enum
> (e.g. BEFORE_START, STARTED, YIELDED, COMPLETED) subsuming

Yes, I'd love to formalize the FSM for jobs.

> block_job_started, job->deferred_to_main_loop and job->busy.
> For now, this patch eliminates the problematic reenter by
> removing the reset of job->deferred_to_main_loop (which served
> no purpose, as far as I could see) and checking the flag in
> block_job_enter.

Not sure -- the original commit 794f01414 makes it seem like it should
stay so that the correct AIO context can be acquired. Probably a race as
jobs don't often stay long once they've deferred to the main loop, but I
think the reset is harmless as you say.

> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockjob.c                   | 10 ++++++++--
>  include/block/blockjob_int.h |  3 ++-
>  2 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/blockjob.c b/blockjob.c
> index 3fa2885..2d80dae 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -750,7 +750,14 @@ void block_job_resume_all(void)
>  
>  void block_job_enter(BlockJob *job)
>  {
> -    if (job->co && !job->busy) {
> +    if (!block_job_started(job)) {
> +        return;
> +    }
> +    if (job->deferred_to_main_loop) {
> +        return;
> +    }
> +
> +    if (!job->busy) {
>          qemu_coroutine_enter(job->co);
>      }
>  }
> @@ -874,7 +881,6 @@ static void block_job_defer_to_main_loop_bh(void *opaque)
>          aio_context_acquire(aio_context);
>      }
>  
> -    data->job->deferred_to_main_loop = false;
>      data->fn(data->job, data->opaque);
>  
>      if (aio_context != data->aio_context) {
> diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
> index 97ffc43..4d287ba 100644
> --- a/include/block/blockjob_int.h
> +++ b/include/block/blockjob_int.h
> @@ -227,7 +227,8 @@ typedef void BlockJobDeferToMainLoopFn(BlockJob *job, void *opaque);
>   * @fn: The function to run in the main loop
>   * @opaque: The opaque value that is passed to @fn
>   *
> - * Execute a given function in the main loop with the BlockDriverState
> + * This function must be called by the main job coroutine just before it
> + * returns.  @fn is executed in the main loop with the BlockDriverState
>   * AioContext acquired.  Block jobs must call bdrv_unref(), bdrv_close(), and
>   * anything that uses bdrv_drain_all() in the main loop.
>   *
> 

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 05/10] blockjob: separate monitor and blockjob APIs
  2017-04-08  0:03   ` John Snow
@ 2017-04-08  9:52     ` Paolo Bonzini
  2017-04-10 16:05       ` John Snow
  0 siblings, 1 reply; 44+ messages in thread
From: Paolo Bonzini @ 2017-04-08  9:52 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: jcody



On 08/04/2017 08:03, John Snow wrote:
> Looks clean, though it may be useful to do a few more things;
> 
> - Demarcate what you think is the monitor API in this file

It's already there:

+/*
+ * API for block job drivers and the block layer.
+ */
+

where everything before is for the monitor.

> - Organize blockjob.h to match to serve as a useful reference.

Hmm, yes.

Paolo

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 08/10] blockjob: introduce block_job_cancel_async
  2017-04-08  1:13   ` John Snow
@ 2017-04-08  9:54     ` Paolo Bonzini
  0 siblings, 0 replies; 44+ messages in thread
From: Paolo Bonzini @ 2017-04-08  9:54 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: jcody, Kashyap Chamarthy



On 08/04/2017 09:13, John Snow wrote:
> 
> 
> On 03/23/2017 01:39 PM, Paolo Bonzini wrote:
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> What was the bad design that required you to fix the previous test? :)

It was actually patch 9 that had the bug, I'll reorder to match the
commit message.

Paolo

>> ---
>>  blockjob.c | 11 ++++++++---
>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/blockjob.c b/blockjob.c
>> index c9cb5b1..093962b 100644
>> --- a/blockjob.c
>> +++ b/blockjob.c
>> @@ -373,6 +373,12 @@ static void block_job_completed_single(BlockJob *job)
>>      block_job_unref(job);
>>  }
>>  
>> +static void block_job_cancel_async(BlockJob *job)
>> +{
>> +    job->cancelled = true;
>> +    block_job_iostatus_reset(job);
>> +}
>> +
>>  static void block_job_completed_txn_abort(BlockJob *job)
>>  {
>>      AioContext *ctx;
>> @@ -397,7 +403,7 @@ static void block_job_completed_txn_abort(BlockJob *job)
>>               * them; this job, however, may or may not be cancelled, depending
>>               * on the caller, so leave it. */
>>              if (other_job != job) {
>> -                other_job->cancelled = true;
>> +                block_job_cancel_async(other_job);
> 
> Adds an ioreset here, which I think is probably fine...
> 
>>              }
>>              continue;
>>          }
>> @@ -489,8 +495,7 @@ void block_job_user_resume(BlockJob *job)
>>  void block_job_cancel(BlockJob *job)
>>  {
>>      if (block_job_started(job)) {
>> -        job->cancelled = true;
>> -        block_job_iostatus_reset(job);
>> +        block_job_cancel_async(job);
>>          block_job_enter(job);
>>      } else {
>>          block_job_completed(job, -ECANCELED);
>>
> 
> Reviewed-by: John Snow <jsnow@redhat.com>
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 10/10] blockjob: use deferred_to_main_loop to indicate the coroutine has ended
  2017-04-08  1:32   ` John Snow
@ 2017-04-08  9:56     ` Paolo Bonzini
  0 siblings, 0 replies; 44+ messages in thread
From: Paolo Bonzini @ 2017-04-08  9:56 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: jcody



On 08/04/2017 09:32, John Snow wrote:
> 
> 
> On 03/23/2017 01:39 PM, Paolo Bonzini wrote:
>> All block jobs are using block_job_defer_to_main_loop as the final
>> step just before the coroutine terminates.  At this point,
>> block_job_enter should do nothing, but currently it restarts
>> the freed coroutine.
>>
>> Now, the job->co states should probably be changed to an enum
>> (e.g. BEFORE_START, STARTED, YIELDED, COMPLETED) subsuming
> 
> Yes, I'd love to formalize the FSM for jobs.
> 
>> block_job_started, job->deferred_to_main_loop and job->busy.
>> For now, this patch eliminates the problematic reenter by
>> removing the reset of job->deferred_to_main_loop (which served
>> no purpose, as far as I could see) and checking the flag in
>> block_job_enter.
> 
> Not sure -- the original commit 794f01414 makes it seem like it should
> stay so that the correct AIO context can be acquired. Probably a race as
> jobs don't often stay long once they've deferred to the main loop, but I
> think the reset is harmless as you say.

You're right.  The difference is that now we pretty much expect the
deferred part to be at the end of the job:

    commit bae8196d9f97916de6323e70e3e374362ee16ec4
    Author: Paolo Bonzini <pbonzini@redhat.com>
    Date:   Thu Oct 27 12:48:50 2016 +0200

    blockjob: introduce .drain callback for jobs

    This is required to decouple block jobs from running in an
    AioContext.  With multiqueue block devices, a BlockDriverState
    does not really belong to a single AioContext.

    The solution is to first wait until all I/O operations are
    complete; then loop in the main thread for the block job to
    complete entirely.

so I'll improve the commit message to point to both 794f01414 and this one.

Paolo
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  blockjob.c                   | 10 ++++++++--
>>  include/block/blockjob_int.h |  3 ++-
>>  2 files changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/blockjob.c b/blockjob.c
>> index 3fa2885..2d80dae 100644
>> --- a/blockjob.c
>> +++ b/blockjob.c
>> @@ -750,7 +750,14 @@ void block_job_resume_all(void)
>>  
>>  void block_job_enter(BlockJob *job)
>>  {
>> -    if (job->co && !job->busy) {
>> +    if (!block_job_started(job)) {
>> +        return;
>> +    }
>> +    if (job->deferred_to_main_loop) {
>> +        return;
>> +    }
>> +
>> +    if (!job->busy) {
>>          qemu_coroutine_enter(job->co);
>>      }
>>  }
>> @@ -874,7 +881,6 @@ static void block_job_defer_to_main_loop_bh(void *opaque)
>>          aio_context_acquire(aio_context);
>>      }
>>  
>> -    data->job->deferred_to_main_loop = false;
>>      data->fn(data->job, data->opaque);
>>  
>>      if (aio_context != data->aio_context) {
>> diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
>> index 97ffc43..4d287ba 100644
>> --- a/include/block/blockjob_int.h
>> +++ b/include/block/blockjob_int.h
>> @@ -227,7 +227,8 @@ typedef void BlockJobDeferToMainLoopFn(BlockJob *job, void *opaque);
>>   * @fn: The function to run in the main loop
>>   * @opaque: The opaque value that is passed to @fn
>>   *
>> - * Execute a given function in the main loop with the BlockDriverState
>> + * This function must be called by the main job coroutine just before it
>> + * returns.  @fn is executed in the main loop with the BlockDriverState
>>   * AioContext acquired.  Block jobs must call bdrv_unref(), bdrv_close(), and
>>   * anything that uses bdrv_drain_all() in the main loop.
>>   *
>>
> 
> Reviewed-by: John Snow <jsnow@redhat.com>
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 09/10] blockjob: reorganize block_job_completed_txn_abort
  2017-04-08  1:15   ` John Snow
@ 2017-04-08 10:03     ` Paolo Bonzini
  0 siblings, 0 replies; 44+ messages in thread
From: Paolo Bonzini @ 2017-04-08 10:03 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: jcody



On 08/04/2017 09:15, John Snow wrote:
> 
> 
> On 03/23/2017 01:39 PM, Paolo Bonzini wrote:
>> This splits the part that touches job states from the part that invokes
>> callbacks.  It will be a bit simpler to understand once job states will
>> be protected by a different mutex than the AioContext lock.
>>
> 
> Maybe easier to review then, too :)

The idea is that the "touching job states" part (block_job_cancel_async)
doesn't release the mutex, while the "invokes callbacks" functions
(block_job_finish_sync, block_job_completed_single) part does.  It is
much simpler to understand if all block_job_cancel_asyncs are done first.

I should once more split code movement from changes, but I think this
patch is still reviewable on its own.  All the "introduce block job
mutex" does in block_job_completed_txn_abort is remove all
aio_context_acquire and release calls.

Paolo

>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  blockjob.c | 165 ++++++++++++++++++++++++++++++++-----------------------------
>>  1 file changed, 88 insertions(+), 77 deletions(-)
>>
>> diff --git a/blockjob.c b/blockjob.c
>> index 093962b..3fa2885 100644
>> --- a/blockjob.c
>> +++ b/blockjob.c
>> @@ -76,6 +76,39 @@ BlockJob *block_job_get(const char *id)
>>      return NULL;
>>  }
>>  
>> +BlockJobTxn *block_job_txn_new(void)
>> +{
>> +    BlockJobTxn *txn = g_new0(BlockJobTxn, 1);
>> +    QLIST_INIT(&txn->jobs);
>> +    txn->refcnt = 1;
>> +    return txn;
>> +}
>> +
>> +static void block_job_txn_ref(BlockJobTxn *txn)
>> +{
>> +    txn->refcnt++;
>> +}
>> +
>> +void block_job_txn_unref(BlockJobTxn *txn)
>> +{
>> +    if (txn && --txn->refcnt == 0) {
>> +        g_free(txn);
>> +    }
>> +}
>> +
>> +void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
>> +{
>> +    if (!txn) {
>> +        return;
>> +    }
>> +
>> +    assert(!job->txn);
>> +    job->txn = txn;
>> +
>> +    QLIST_INSERT_HEAD(&txn->jobs, job, txn_list);
>> +    block_job_txn_ref(txn);
>> +}
>> +
>>  static void block_job_pause(BlockJob *job)
>>  {
>>      job->pause_count++;
>> @@ -336,6 +369,8 @@ void block_job_start(BlockJob *job)
>>  
>>  static void block_job_completed_single(BlockJob *job)
>>  {
>> +    assert(job->completed);
>> +
>>      if (!job->ret) {
>>          if (job->driver->commit) {
>>              job->driver->commit(job);
>> @@ -376,14 +411,49 @@ static void block_job_completed_single(BlockJob *job)
>>  static void block_job_cancel_async(BlockJob *job)
>>  {
>>      job->cancelled = true;
>> -    block_job_iostatus_reset(job);
>> +    if (!job->completed) {
>> +        block_job_iostatus_reset(job);
>> +    }
>> +}
>> +
>> +static int block_job_finish_sync(BlockJob *job,
>> +                                 void (*finish)(BlockJob *, Error **errp),
>> +                                 Error **errp)
>> +{
>> +    Error *local_err = NULL;
>> +    int ret;
>> +
>> +    assert(blk_bs(job->blk)->job == job);
>> +
>> +    block_job_ref(job);
>> +
>> +    if (finish) {
>> +        finish(job, &local_err);
>> +    }
>> +    if (local_err) {
>> +        error_propagate(errp, local_err);
>> +        block_job_unref(job);
>> +        return -EBUSY;
>> +    }
>> +    /* block_job_drain calls block_job_enter, and it should be enough to
>> +     * induce progress until the job completes or moves to the main thread.
>> +    */
>> +    while (!job->deferred_to_main_loop && !job->completed) {
>> +        block_job_drain(job);
>> +    }
>> +    while (!job->completed) {
>> +        aio_poll(qemu_get_aio_context(), true);
>> +    }
>> +    ret = (job->cancelled && job->ret == 0) ? -ECANCELED : job->ret;
>> +    block_job_unref(job);
>> +    return ret;
>>  }
>>  
>>  static void block_job_completed_txn_abort(BlockJob *job)
>>  {
>>      AioContext *ctx;
>>      BlockJobTxn *txn = job->txn;
>> -    BlockJob *other_job, *next;
>> +    BlockJob *other_job;
>>  
>>      if (txn->aborting) {
>>          /*
>> @@ -392,29 +462,34 @@ static void block_job_completed_txn_abort(BlockJob *job)
>>          return;
>>      }
>>      txn->aborting = true;
>> +    block_job_txn_ref(txn);
>> +
>>      /* We are the first failed job. Cancel other jobs. */
>>      QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
>>          ctx = blk_get_aio_context(other_job->blk);
>>          aio_context_acquire(ctx);
>>      }
>> +
>> +    /* Other jobs are "effectively" cancelled by us, set the status for
>> +     * them; this job, however, may or may not be cancelled, depending
>> +     * on the caller, so leave it. */
>>      QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
>> -        if (other_job == job || other_job->completed) {
>> -            /* Other jobs are "effectively" cancelled by us, set the status for
>> -             * them; this job, however, may or may not be cancelled, depending
>> -             * on the caller, so leave it. */
>> -            if (other_job != job) {
>> -                block_job_cancel_async(other_job);
>> -            }
>> -            continue;
>> +        if (other_job != job) {
>> +            block_job_cancel_async(other_job);
>>          }
>> -        block_job_cancel_sync(other_job);
>> -        assert(other_job->completed);
>>      }
>> -    QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
>> +    while (!QLIST_EMPTY(&txn->jobs)) {
>> +        other_job = QLIST_FIRST(&txn->jobs);
>>          ctx = blk_get_aio_context(other_job->blk);
>> +        if (!other_job->completed) {
>> +            assert(other_job->cancelled);
>> +            block_job_finish_sync(other_job, NULL, NULL);
>> +        }
>>          block_job_completed_single(other_job);
>>          aio_context_release(ctx);
>>      }
>> +
>> +    block_job_txn_unref(txn);
>>  }
>>  
>>  static void block_job_completed_txn_success(BlockJob *job)
>> @@ -502,37 +577,6 @@ void block_job_cancel(BlockJob *job)
>>      }
>>  }
>>  
>> -static int block_job_finish_sync(BlockJob *job,
>> -                                 void (*finish)(BlockJob *, Error **errp),
>> -                                 Error **errp)
>> -{
>> -    Error *local_err = NULL;
>> -    int ret;
>> -
>> -    assert(blk_bs(job->blk)->job == job);
>> -
>> -    block_job_ref(job);
>> -
>> -    finish(job, &local_err);
>> -    if (local_err) {
>> -        error_propagate(errp, local_err);
>> -        block_job_unref(job);
>> -        return -EBUSY;
>> -    }
>> -    /* block_job_drain calls block_job_enter, and it should be enough to
>> -     * induce progress until the job completes or moves to the main thread.
>> -    */
>> -    while (!job->deferred_to_main_loop && !job->completed) {
>> -        block_job_drain(job);
>> -    }
>> -    while (!job->completed) {
>> -        aio_poll(qemu_get_aio_context(), true);
>> -    }
>> -    ret = (job->cancelled && job->ret == 0) ? -ECANCELED : job->ret;
>> -    block_job_unref(job);
>> -    return ret;
>> -}
>> -
>>  /* A wrapper around block_job_cancel() taking an Error ** parameter so it may be
>>   * used with block_job_finish_sync() without the need for (rather nasty)
>>   * function pointer casts there. */
>> @@ -856,36 +900,3 @@ void block_job_defer_to_main_loop(BlockJob *job,
>>      aio_bh_schedule_oneshot(qemu_get_aio_context(),
>>                              block_job_defer_to_main_loop_bh, data);
>>  }
>> -
>> -BlockJobTxn *block_job_txn_new(void)
>> -{
>> -    BlockJobTxn *txn = g_new0(BlockJobTxn, 1);
>> -    QLIST_INIT(&txn->jobs);
>> -    txn->refcnt = 1;
>> -    return txn;
>> -}
>> -
>> -static void block_job_txn_ref(BlockJobTxn *txn)
>> -{
>> -    txn->refcnt++;
>> -}
>> -
>> -void block_job_txn_unref(BlockJobTxn *txn)
>> -{
>> -    if (txn && --txn->refcnt == 0) {
>> -        g_free(txn);
>> -    }
>> -}
>> -
>> -void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
>> -{
>> -    if (!txn) {
>> -        return;
>> -    }
>> -
>> -    assert(!job->txn);
>> -    job->txn = txn;
>> -
>> -    QLIST_INSERT_HEAD(&txn->jobs, job, txn_list);
>> -    block_job_txn_ref(txn);
>> -}
>>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 03/10] blockjob: introduce block_job_fail
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 03/10] blockjob: introduce block_job_fail Paolo Bonzini
  2017-04-07 23:24   ` John Snow
@ 2017-04-10  9:22   ` Stefan Hajnoczi
  2017-04-11  4:00     ` Paolo Bonzini
  1 sibling, 1 reply; 44+ messages in thread
From: Stefan Hajnoczi @ 2017-04-10  9:22 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, jcody, jsnow

[-- Attachment #1: Type: text/plain, Size: 671 bytes --]

On Thu, Mar 23, 2017 at 06:39:21PM +0100, Paolo Bonzini wrote:
> Outside blockjob.c, block_job_unref is only used when a block job fails
> to start, and block_job_ref is not used at all.  The reference counting
> thus is pretty well hidden.  Introduce a separate function to be used
> by block jobs; because block_job_ref and block_job_unref now become
> static, move them earlier in blockjob.c.
> 
> Later on, block_job_fail will also have different locking than
> block_job_unref.

block_job_fail() sounds like it's *the* job failure API.
How about block_job_fail_early()?

It indicates the API is only for the beginning of the job's lifecycle.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 04/10] blockjob: introduce block_job_pause/resume_all
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 04/10] blockjob: introduce block_job_pause/resume_all Paolo Bonzini
  2017-04-07 23:37   ` John Snow
@ 2017-04-10  9:26   ` Stefan Hajnoczi
  1 sibling, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2017-04-10  9:26 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, jcody, jsnow

[-- Attachment #1: Type: text/plain, Size: 662 bytes --]

On Thu, Mar 23, 2017 at 06:39:22PM +0100, Paolo Bonzini wrote:
> Remove use of block_job_pause/resume from outside blockjob.c, thus
> making them static.  Again, one reason to do this is that
> block_job_pause/resume will have different locking rules compared
> to everything else that block.c and block/io.c use.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/io.c               |  18 +-------
>  blockjob.c               | 114 ++++++++++++++++++++++++++++-------------------
>  include/block/blockjob.h |  14 +++---
>  3 files changed, 77 insertions(+), 69 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] blockjob: remove unnecessary check
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 01/10] blockjob: remove unnecessary check Paolo Bonzini
  2017-04-07 22:30   ` John Snow
  2017-04-07 22:54   ` Philippe Mathieu-Daudé
@ 2017-04-10  9:27   ` Stefan Hajnoczi
  2 siblings, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2017-04-10  9:27 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, jcody, jsnow

[-- Attachment #1: Type: text/plain, Size: 319 bytes --]

On Thu, Mar 23, 2017 at 06:39:19PM +0100, Paolo Bonzini wrote:
> !job is always checked prior to the call, drop it from here.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockjob.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 02/10] blockjob: remove iostatus_reset callback
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 02/10] blockjob: remove iostatus_reset callback Paolo Bonzini
  2017-04-07 23:12   ` John Snow
@ 2017-04-10  9:27   ` Stefan Hajnoczi
  1 sibling, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2017-04-10  9:27 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, jcody, jsnow

[-- Attachment #1: Type: text/plain, Size: 395 bytes --]

On Thu, Mar 23, 2017 at 06:39:20PM +0100, Paolo Bonzini wrote:
> This is unused since commit 66a0fae ("blockjob: Don't touch BDS iostatus",
> 2016-05-19).
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockjob.c                   | 3 ---
>  include/block/blockjob_int.h | 3 ---
>  2 files changed, 6 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 05/10] blockjob: separate monitor and blockjob APIs
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 05/10] blockjob: separate monitor and blockjob APIs Paolo Bonzini
  2017-04-08  0:03   ` John Snow
@ 2017-04-10  9:30   ` Stefan Hajnoczi
  1 sibling, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2017-04-10  9:30 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, jcody, jsnow

[-- Attachment #1: Type: text/plain, Size: 1029 bytes --]

On Thu, Mar 23, 2017 at 06:39:23PM +0100, Paolo Bonzini wrote:
> We already have different locking policies for APIs called by the monitor
> and the block job.  Monitor APIs need consistency across block_job_get
> and the actual operation (e.g. block_job_set_speed), so currently there
> are explicit aio_context_acquire/release calls in blockdev.c.
> 
> When a block job needs to do something instead it doesn't care about locking,
> because the whole coroutine runs under the AioContext lock.  When moving
> away from the AioContext lock, the monitor will have to call new
> block_job_lock/unlock APIs, while blockjob APIs will take care of this
> for the users.
> 
> In preparation for that, keep all the blockjob APIs together in the
> blockjob.c file.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockjob.c | 206 +++++++++++++++++++++++++++++++------------------------------
>  1 file changed, 105 insertions(+), 101 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 06/10] blockjob: move iostatus reset inside block_job_user_resume
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 06/10] blockjob: move iostatus reset inside block_job_user_resume Paolo Bonzini
  2017-04-08  0:28   ` John Snow
@ 2017-04-10  9:33   ` Stefan Hajnoczi
  1 sibling, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2017-04-10  9:33 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, jcody, jsnow

[-- Attachment #1: Type: text/plain, Size: 1084 bytes --]

On Thu, Mar 23, 2017 at 06:39:24PM +0100, Paolo Bonzini wrote:
> Outside blockjob.c, the block_job_iostatus_reset function is used once
> in the monitor and once in BlockBackend.  When we introduce the block
> job mutex, block_job_iostatus_reset's client is going to be the block
> layer (for which blockjob.c will take the block job mutex) rather than
> the monitor (which will take the block job mutex by itself).
> 
> The monitor's call to block_job_iostatus_reset from the monitor comes
> just before the sole call to block_job_user_resume, so reset the
> iostatus directly from block_job_iostatus_reset.  This will avoid
> the need to introduce separate block_job_iostatus_reset and
> block_job_iostatus_reset_locked APIs.
> 
> After making this change, move the function together with the others
> that were moved in the previous patch.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockdev.c |  1 -
>  blockjob.c | 11 ++++++-----
>  2 files changed, 6 insertions(+), 6 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 07/10] blockjob: strengthen a bit test-blockjob-txn
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 07/10] blockjob: strengthen a bit test-blockjob-txn Paolo Bonzini
  2017-04-08  0:37   ` John Snow
@ 2017-04-10  9:35   ` Stefan Hajnoczi
  1 sibling, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2017-04-10  9:35 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, jcody, jsnow

[-- Attachment #1: Type: text/plain, Size: 556 bytes --]

On Thu, Mar 23, 2017 at 06:39:25PM +0100, Paolo Bonzini wrote:
> Unlike test-blockjob-txn, QMP releases the reference to the transaction
> before the jobs finish.  Thus, while working on the next patch,
> qemu-iotest 124 showed a failure that the unit tests did not have.
> Make the unit test just a little nastier, so that it fails too.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  tests/test-blockjob-txn.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 08/10] blockjob: introduce block_job_cancel_async
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 08/10] blockjob: introduce block_job_cancel_async Paolo Bonzini
  2017-04-08  1:13   ` John Snow
@ 2017-04-10  9:36   ` Stefan Hajnoczi
  1 sibling, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2017-04-10  9:36 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, jcody, jsnow

[-- Attachment #1: Type: text/plain, Size: 256 bytes --]

On Thu, Mar 23, 2017 at 06:39:26PM +0100, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockjob.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 09/10] blockjob: reorganize block_job_completed_txn_abort
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 09/10] blockjob: reorganize block_job_completed_txn_abort Paolo Bonzini
  2017-04-08  1:15   ` John Snow
@ 2017-04-10  9:44   ` Stefan Hajnoczi
  2017-04-10 19:17   ` John Snow
  2 siblings, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2017-04-10  9:44 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, jcody, jsnow

[-- Attachment #1: Type: text/plain, Size: 533 bytes --]

On Thu, Mar 23, 2017 at 06:39:27PM +0100, Paolo Bonzini wrote:
> This splits the part that touches job states from the part that invokes
> callbacks.  It will be a bit simpler to understand once job states will
> be protected by a different mutex than the AioContext lock.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockjob.c | 165 ++++++++++++++++++++++++++++++++-----------------------------
>  1 file changed, 88 insertions(+), 77 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 10/10] blockjob: use deferred_to_main_loop to indicate the coroutine has ended
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 10/10] blockjob: use deferred_to_main_loop to indicate the coroutine has ended Paolo Bonzini
  2017-04-08  1:32   ` John Snow
@ 2017-04-10  9:46   ` Stefan Hajnoczi
  1 sibling, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2017-04-10  9:46 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, jcody, jsnow

[-- Attachment #1: Type: text/plain, Size: 953 bytes --]

On Thu, Mar 23, 2017 at 06:39:28PM +0100, Paolo Bonzini wrote:
> All block jobs are using block_job_defer_to_main_loop as the final
> step just before the coroutine terminates.  At this point,
> block_job_enter should do nothing, but currently it restarts
> the freed coroutine.
> 
> Now, the job->co states should probably be changed to an enum
> (e.g. BEFORE_START, STARTED, YIELDED, COMPLETED) subsuming
> block_job_started, job->deferred_to_main_loop and job->busy.
> For now, this patch eliminates the problematic reenter by
> removing the reset of job->deferred_to_main_loop (which served
> no purpose, as far as I could see) and checking the flag in
> block_job_enter.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockjob.c                   | 10 ++++++++--
>  include/block/blockjob_int.h |  3 ++-
>  2 files changed, 10 insertions(+), 3 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 05/10] blockjob: separate monitor and blockjob APIs
  2017-04-08  9:52     ` Paolo Bonzini
@ 2017-04-10 16:05       ` John Snow
  2017-04-11  4:57         ` Paolo Bonzini
  0 siblings, 1 reply; 44+ messages in thread
From: John Snow @ 2017-04-10 16:05 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: jcody



On 04/08/2017 05:52 AM, Paolo Bonzini wrote:
> 
> 
> On 08/04/2017 08:03, John Snow wrote:
>> Looks clean, though it may be useful to do a few more things;
>>
>> - Demarcate what you think is the monitor API in this file
> 
> It's already there:
> 
> +/*
> + * API for block job drivers and the block layer.
> + */
> +
> 
> where everything before is for the monitor.
> 

I meant explicitly, with a comment at the top explaining the demarcation.

>> - Organize blockjob.h to match to serve as a useful reference.
> 
> Hmm, yes.
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 09/10] blockjob: reorganize block_job_completed_txn_abort
  2017-03-23 17:39 ` [Qemu-devel] [PATCH 09/10] blockjob: reorganize block_job_completed_txn_abort Paolo Bonzini
  2017-04-08  1:15   ` John Snow
  2017-04-10  9:44   ` Stefan Hajnoczi
@ 2017-04-10 19:17   ` John Snow
  2017-04-11  4:11     ` Paolo Bonzini
  2 siblings, 1 reply; 44+ messages in thread
From: John Snow @ 2017-04-10 19:17 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: jcody



On 03/23/2017 01:39 PM, Paolo Bonzini wrote:
> This splits the part that touches job states from the part that invokes
> callbacks.  It will be a bit simpler to understand once job states will
> be protected by a different mutex than the AioContext lock.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockjob.c | 165 ++++++++++++++++++++++++++++++++-----------------------------
>  1 file changed, 88 insertions(+), 77 deletions(-)
> 
> diff --git a/blockjob.c b/blockjob.c
> index 093962b..3fa2885 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -76,6 +76,39 @@ BlockJob *block_job_get(const char *id)
>      return NULL;
>  }
>  
> +BlockJobTxn *block_job_txn_new(void)
> +{
> +    BlockJobTxn *txn = g_new0(BlockJobTxn, 1);
> +    QLIST_INIT(&txn->jobs);
> +    txn->refcnt = 1;
> +    return txn;
> +}
> +
> +static void block_job_txn_ref(BlockJobTxn *txn)
> +{
> +    txn->refcnt++;
> +}
> +
> +void block_job_txn_unref(BlockJobTxn *txn)
> +{
> +    if (txn && --txn->refcnt == 0) {
> +        g_free(txn);
> +    }
> +}
> +
> +void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
> +{
> +    if (!txn) {
> +        return;
> +    }
> +
> +    assert(!job->txn);
> +    job->txn = txn;
> +
> +    QLIST_INSERT_HEAD(&txn->jobs, job, txn_list);
> +    block_job_txn_ref(txn);
> +}
> +

Pure movement; split it please?

>  static void block_job_pause(BlockJob *job)
>  {
>      job->pause_count++;
> @@ -336,6 +369,8 @@ void block_job_start(BlockJob *job)
>  
>  static void block_job_completed_single(BlockJob *job)
>  {
> +    assert(job->completed);
> +
>      if (!job->ret) {
>          if (job->driver->commit) {
>              job->driver->commit(job);
> @@ -376,14 +411,49 @@ static void block_job_completed_single(BlockJob *job)
>  static void block_job_cancel_async(BlockJob *job)
>  {
>      job->cancelled = true;
> -    block_job_iostatus_reset(job);
> +    if (!job->completed) {
> +        block_job_iostatus_reset(job);
> +    }
> +}
> +
> +static int block_job_finish_sync(BlockJob *job,
> +                                 void (*finish)(BlockJob *, Error **errp),
> +                                 Error **errp)
> +{
> +    Error *local_err = NULL;
> +    int ret;
> +
> +    assert(blk_bs(job->blk)->job == job);
> +
> +    block_job_ref(job);
> +
> +    if (finish) {
> +        finish(job, &local_err);
> +    }
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        block_job_unref(job);
> +        return -EBUSY;
> +    }
> +    /* block_job_drain calls block_job_enter, and it should be enough to
> +     * induce progress until the job completes or moves to the main thread.
> +    */
> +    while (!job->deferred_to_main_loop && !job->completed) {
> +        block_job_drain(job);
> +    }
> +    while (!job->completed) {
> +        aio_poll(qemu_get_aio_context(), true);
> +    }
> +    ret = (job->cancelled && job->ret == 0) ? -ECANCELED : job->ret;
> +    block_job_unref(job);
> +    return ret;
>  }

block_job_finish_sync is almost pure movement except for the if (finish)
that gets added around the call to finish(job, &local_err).

I guess this is for the new call where we invoke this with the callback
set as NULL, to avoid calling block_job_cancel_async twice.

>  
>  static void block_job_completed_txn_abort(BlockJob *job)
>  {
>      AioContext *ctx;
>      BlockJobTxn *txn = job->txn;
> -    BlockJob *other_job, *next;
> +    BlockJob *other_job;
>  
>      if (txn->aborting) {
>          /*
> @@ -392,29 +462,34 @@ static void block_job_completed_txn_abort(BlockJob *job)
>          return;
>      }
>      txn->aborting = true;
> +    block_job_txn_ref(txn);
> +
>      /* We are the first failed job. Cancel other jobs. */
>      QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
>          ctx = blk_get_aio_context(other_job->blk);
>          aio_context_acquire(ctx);
>      }
> +
> +    /* Other jobs are "effectively" cancelled by us, set the status for
> +     * them; this job, however, may or may not be cancelled, depending
> +     * on the caller, so leave it. */
>      QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
> -        if (other_job == job || other_job->completed) {
> -            /* Other jobs are "effectively" cancelled by us, set the status for
> -             * them; this job, however, may or may not be cancelled, depending
> -             * on the caller, so leave it. */
> -            if (other_job != job) {
> -                block_job_cancel_async(other_job);
> -            }
> -            continue;
> +        if (other_job != job) {
> +            block_job_cancel_async(other_job);
>          }
> -        block_job_cancel_sync(other_job);
> -        assert(other_job->completed);
>      }
> -    QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
> +    while (!QLIST_EMPTY(&txn->jobs)) {
> +        other_job = QLIST_FIRST(&txn->jobs);
>          ctx = blk_get_aio_context(other_job->blk);
> +        if (!other_job->completed) {
> +            assert(other_job->cancelled);
> +            block_job_finish_sync(other_job, NULL, NULL);
> +        }
>          block_job_completed_single(other_job);
>          aio_context_release(ctx);
>      }
> +
> +    block_job_txn_unref(txn);
>  }
>  

OK, so in a nutshell, here's what used to happen:

-Don't do anything to our own job.
-Other jobs that are completed get block_job_cancel_async.
-Other jobs that are not completed get block_job_cancel_sync.
-All jobs then get block_job_completed_single.

And here's what happens now:

- All other jobs get block_job_cancel_async (completed or not.)
- If the job isn't completed, assert it is canceled, then call
block_job_finish_sync.
- All jobs get block_job_completed_single.


Now, cancel_sync eventually does call block_job_cancel_async, so in
practice we were already calling block_job_cancel_async on all other
jobs anyway.

The only difference now is that some jobs may be in a canceled state but
still running, so you handle that with the block_job_finished_sync call
for any job that is still running.

So, it's basically the same between the two, it just takes a hot second
to see.

One thing that I wonder about a little is the push-down of whether or
not to reset iostatus falling to block_job_cancel_async; it seemed to me
as if txn_abort really had the best knowledge as to whether or not we
wanted to reset iostatus, but as it stands it doesn't really make a
difference.

ACK for now, because it's still not perfectly obvious to me how this
will wind up helping, though I do believe you :)

>  static void block_job_completed_txn_success(BlockJob *job)
> @@ -502,37 +577,6 @@ void block_job_cancel(BlockJob *job)
>      }
>  }
>  
> -static int block_job_finish_sync(BlockJob *job,
> -                                 void (*finish)(BlockJob *, Error **errp),
> -                                 Error **errp)
> -{
> -    Error *local_err = NULL;
> -    int ret;
> -
> -    assert(blk_bs(job->blk)->job == job);
> -
> -    block_job_ref(job);
> -
> -    finish(job, &local_err);
> -    if (local_err) {
> -        error_propagate(errp, local_err);
> -        block_job_unref(job);
> -        return -EBUSY;
> -    }
> -    /* block_job_drain calls block_job_enter, and it should be enough to
> -     * induce progress until the job completes or moves to the main thread.
> -    */
> -    while (!job->deferred_to_main_loop && !job->completed) {
> -        block_job_drain(job);
> -    }
> -    while (!job->completed) {
> -        aio_poll(qemu_get_aio_context(), true);
> -    }
> -    ret = (job->cancelled && job->ret == 0) ? -ECANCELED : job->ret;
> -    block_job_unref(job);
> -    return ret;
> -}
> -
>  /* A wrapper around block_job_cancel() taking an Error ** parameter so it may be
>   * used with block_job_finish_sync() without the need for (rather nasty)
>   * function pointer casts there. */
> @@ -856,36 +900,3 @@ void block_job_defer_to_main_loop(BlockJob *job,
>      aio_bh_schedule_oneshot(qemu_get_aio_context(),
>                              block_job_defer_to_main_loop_bh, data);
>  }

And everything following is pure movement.

> -
> -BlockJobTxn *block_job_txn_new(void)
> -{
> -    BlockJobTxn *txn = g_new0(BlockJobTxn, 1);
> -    QLIST_INIT(&txn->jobs);
> -    txn->refcnt = 1;
> -    return txn;
> -}
> -
> -static void block_job_txn_ref(BlockJobTxn *txn)
> -{
> -    txn->refcnt++;
> -}
> -
> -void block_job_txn_unref(BlockJobTxn *txn)
> -{
> -    if (txn && --txn->refcnt == 0) {
> -        g_free(txn);
> -    }
> -}
> -
> -void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
> -{
> -    if (!txn) {
> -        return;
> -    }
> -
> -    assert(!job->txn);
> -    job->txn = txn;
> -
> -    QLIST_INSERT_HEAD(&txn->jobs, job, txn_list);
> -    block_job_txn_ref(txn);
> -}
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 03/10] blockjob: introduce block_job_fail
  2017-04-10  9:22   ` Stefan Hajnoczi
@ 2017-04-11  4:00     ` Paolo Bonzini
  0 siblings, 0 replies; 44+ messages in thread
From: Paolo Bonzini @ 2017-04-11  4:00 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel, jcody, jsnow



On 10/04/2017 17:22, Stefan Hajnoczi wrote:
>>
>> Later on, block_job_fail will also have different locking than
>> block_job_unref.
> block_job_fail() sounds like it's *the* job failure API.
> How about block_job_fail_early()?
> 
> It indicates the API is only for the beginning of the job's lifecycle.

Can't disagree and I'll apply your suggestion.  Thanks,

Paolo

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 09/10] blockjob: reorganize block_job_completed_txn_abort
  2017-04-10 19:17   ` John Snow
@ 2017-04-11  4:11     ` Paolo Bonzini
  0 siblings, 0 replies; 44+ messages in thread
From: Paolo Bonzini @ 2017-04-11  4:11 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: jcody



On 11/04/2017 03:17, John Snow wrote:
> One thing that I wonder about a little is the push-down of whether or
> not to reset iostatus falling to block_job_cancel_async; it seemed to me
> as if txn_abort really had the best knowledge as to whether or not we
> wanted to reset iostatus, but as it stands it doesn't really make a
> difference.

Actually, there was no iostatus reset in txn_abort; it was done in
block_job_cancel only.  But the BLOCK_JOB_CANCELLED event doesn't
publish the iostatus actually, so I could have also done the opposite:
remove the reset in block_job_cancel---it shouldn't make a difference in
theory.

On the other hand, I like respecting the invariant then the coroutine
always runs with BLOCK_DEVICE_IO_STATUS_OK iostatus, so I think that
it's right for cancel_async to reset the iostatus.  Speaking about
iostatus and invariants, probably we should also set "job->user_paused =
false" in block_job_cancel_async.

> ACK for now, because it's still not perfectly obvious to me how this
> will wind up helping, though I do believe you :)

It just helps me making sure that the locking is correct later.  Maybe
it will help you too!

Paolo

>>  static void block_job_completed_txn_success(BlockJob *job)
>> @@ -502,37 +577,6 @@ void block_job_cancel(BlockJob *job)
>>      }
>>  }
>>  
>> -static int block_job_finish_sync(BlockJob *job,
>> -                                 void (*finish)(BlockJob *, Error **errp),
>> -                                 Error **errp)
>> -{
>> -    Error *local_err = NULL;
>> -    int ret;
>> -
>> -    assert(blk_bs(job->blk)->job == job);
>> -
>> -    block_job_ref(job);
>> -
>> -    finish(job, &local_err);
>> -    if (local_err) {
>> -        error_propagate(errp, local_err);
>> -        block_job_unref(job);
>> -        return -EBUSY;
>> -    }
>> -    /* block_job_drain calls block_job_enter, and it should be enough to
>> -     * induce progress until the job completes or moves to the main thread.
>> -    */
>> -    while (!job->deferred_to_main_loop && !job->completed) {
>> -        block_job_drain(job);
>> -    }
>> -    while (!job->completed) {
>> -        aio_poll(qemu_get_aio_context(), true);
>> -    }
>> -    ret = (job->cancelled && job->ret == 0) ? -ECANCELED : job->ret;
>> -    block_job_unref(job);
>> -    return ret;
>> -}
>> -
>>  /* A wrapper around block_job_cancel() taking an Error ** parameter so it may be
>>   * used with block_job_finish_sync() without the need for (rather nasty)
>>   * function pointer casts there. */
>> @@ -856,36 +900,3 @@ void block_job_defer_to_main_loop(BlockJob *job,
>>      aio_bh_schedule_oneshot(qemu_get_aio_context(),
>>                              block_job_defer_to_main_loop_bh, data);
>>  }
> 
> And everything following is pure movement.
> 
>> -
>> -BlockJobTxn *block_job_txn_new(void)
>> -{
>> -    BlockJobTxn *txn = g_new0(BlockJobTxn, 1);
>> -    QLIST_INIT(&txn->jobs);
>> -    txn->refcnt = 1;
>> -    return txn;
>> -}
>> -
>> -static void block_job_txn_ref(BlockJobTxn *txn)
>> -{
>> -    txn->refcnt++;
>> -}
>> -
>> -void block_job_txn_unref(BlockJobTxn *txn)
>> -{
>> -    if (txn && --txn->refcnt == 0) {
>> -        g_free(txn);
>> -    }
>> -}
>> -
>> -void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
>> -{
>> -    if (!txn) {
>> -        return;
>> -    }
>> -
>> -    assert(!job->txn);
>> -    job->txn = txn;
>> -
>> -    QLIST_INSERT_HEAD(&txn->jobs, job, txn_list);
>> -    block_job_txn_ref(txn);
>> -}
>>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 05/10] blockjob: separate monitor and blockjob APIs
  2017-04-10 16:05       ` John Snow
@ 2017-04-11  4:57         ` Paolo Bonzini
  2017-04-11 16:19           ` John Snow
  0 siblings, 1 reply; 44+ messages in thread
From: Paolo Bonzini @ 2017-04-11  4:57 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: jcody



On 11/04/2017 00:05, John Snow wrote:
> 
> 
> On 04/08/2017 05:52 AM, Paolo Bonzini wrote:
>>
>>
>> On 08/04/2017 08:03, John Snow wrote:
>>> Looks clean, though it may be useful to do a few more things;
>>>
>>> - Demarcate what you think is the monitor API in this file
>>
>> It's already there:
>>
>> +/*
>> + * API for block job drivers and the block layer.
>> + */
>> +
>>
>> where everything before is for the monitor.
>>
> 
> I meant explicitly, with a comment at the top explaining the demarcation.

Oh, sure.

>>> - Organize blockjob.h to match to serve as a useful reference.
>>
>> Hmm, yes.

As it turns out, no headers are necessary---but yours was a very good
remark still, because a couple mistakes in this series stood out when
checking.

The "API for block job drivers and the block layer" is already in
blockjob_int.h while the rest is in blockjob.h.  This is nice because it
provides some validation of the concept behind the patch, and also of
the locking policy I chose for the rest of the work.

But, there are two exceptions.  Both of them are introduced by this
series and they shouldn't be:

- blockjob_pause/resume_all should have its declaration in block_int.h,
so I've fixed patch 4 accordingly

- blockjob_create is in blockjob_int.h, but this patch should move it to
the second part of the file, too.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 05/10] blockjob: separate monitor and blockjob APIs
  2017-04-11  4:57         ` Paolo Bonzini
@ 2017-04-11 16:19           ` John Snow
  0 siblings, 0 replies; 44+ messages in thread
From: John Snow @ 2017-04-11 16:19 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: jcody



On 04/11/2017 12:57 AM, Paolo Bonzini wrote:
> 
> 
> On 11/04/2017 00:05, John Snow wrote:
>>
>>
>> On 04/08/2017 05:52 AM, Paolo Bonzini wrote:
>>>
>>>
>>> On 08/04/2017 08:03, John Snow wrote:
>>>> Looks clean, though it may be useful to do a few more things;
>>>>
>>>> - Demarcate what you think is the monitor API in this file
>>>
>>> It's already there:
>>>
>>> +/*
>>> + * API for block job drivers and the block layer.
>>> + */
>>> +
>>>
>>> where everything before is for the monitor.
>>>
>>
>> I meant explicitly, with a comment at the top explaining the demarcation.
> 
> Oh, sure.
> 
>>>> - Organize blockjob.h to match to serve as a useful reference.
>>>
>>> Hmm, yes.
> 
> As it turns out, no headers are necessary---but yours was a very good
> remark still, because a couple mistakes in this series stood out when
> checking.
> 
> The "API for block job drivers and the block layer" is already in
> blockjob_int.h while the rest is in blockjob.h.  This is nice because it
> provides some validation of the concept behind the patch, and also of
> the locking policy I chose for the rest of the work.
> 
> But, there are two exceptions.  Both of them are introduced by this
> series and they shouldn't be:
> 
> - blockjob_pause/resume_all should have its declaration in block_int.h,
> so I've fixed patch 4 accordingly
> 
> - blockjob_create is in blockjob_int.h, but this patch should move it to
> the second part of the file, too.
> 
> Thanks,
> 
> Paolo
> 

"You were wrong, but so was I, thanks for being wrong in a helpful way."

No problem.

--js

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2017-04-11 16:19 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-23 17:39 [Qemu-devel] [PATCH for-2.10 00/10] Preparation for block job mutex Paolo Bonzini
2017-03-23 17:39 ` [Qemu-devel] [PATCH 01/10] blockjob: remove unnecessary check Paolo Bonzini
2017-04-07 22:30   ` John Snow
2017-04-07 22:54   ` Philippe Mathieu-Daudé
2017-04-07 23:16     ` John Snow
2017-04-10  9:27   ` Stefan Hajnoczi
2017-03-23 17:39 ` [Qemu-devel] [PATCH 02/10] blockjob: remove iostatus_reset callback Paolo Bonzini
2017-04-07 23:12   ` John Snow
2017-04-08  0:18     ` Philippe Mathieu-Daudé
2017-04-10  9:27   ` Stefan Hajnoczi
2017-03-23 17:39 ` [Qemu-devel] [PATCH 03/10] blockjob: introduce block_job_fail Paolo Bonzini
2017-04-07 23:24   ` John Snow
2017-04-10  9:22   ` Stefan Hajnoczi
2017-04-11  4:00     ` Paolo Bonzini
2017-03-23 17:39 ` [Qemu-devel] [PATCH 04/10] blockjob: introduce block_job_pause/resume_all Paolo Bonzini
2017-04-07 23:37   ` John Snow
2017-04-10  9:26   ` Stefan Hajnoczi
2017-03-23 17:39 ` [Qemu-devel] [PATCH 05/10] blockjob: separate monitor and blockjob APIs Paolo Bonzini
2017-04-08  0:03   ` John Snow
2017-04-08  9:52     ` Paolo Bonzini
2017-04-10 16:05       ` John Snow
2017-04-11  4:57         ` Paolo Bonzini
2017-04-11 16:19           ` John Snow
2017-04-10  9:30   ` Stefan Hajnoczi
2017-03-23 17:39 ` [Qemu-devel] [PATCH 06/10] blockjob: move iostatus reset inside block_job_user_resume Paolo Bonzini
2017-04-08  0:28   ` John Snow
2017-04-10  9:33   ` Stefan Hajnoczi
2017-03-23 17:39 ` [Qemu-devel] [PATCH 07/10] blockjob: strengthen a bit test-blockjob-txn Paolo Bonzini
2017-04-08  0:37   ` John Snow
2017-04-10  9:35   ` Stefan Hajnoczi
2017-03-23 17:39 ` [Qemu-devel] [PATCH 08/10] blockjob: introduce block_job_cancel_async Paolo Bonzini
2017-04-08  1:13   ` John Snow
2017-04-08  9:54     ` Paolo Bonzini
2017-04-10  9:36   ` Stefan Hajnoczi
2017-03-23 17:39 ` [Qemu-devel] [PATCH 09/10] blockjob: reorganize block_job_completed_txn_abort Paolo Bonzini
2017-04-08  1:15   ` John Snow
2017-04-08 10:03     ` Paolo Bonzini
2017-04-10  9:44   ` Stefan Hajnoczi
2017-04-10 19:17   ` John Snow
2017-04-11  4:11     ` Paolo Bonzini
2017-03-23 17:39 ` [Qemu-devel] [PATCH 10/10] blockjob: use deferred_to_main_loop to indicate the coroutine has ended Paolo Bonzini
2017-04-08  1:32   ` John Snow
2017-04-08  9:56     ` Paolo Bonzini
2017-04-10  9:46   ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.