All of lore.kernel.org
* [Qemu-devel] [PATCH v2 00/11] blockjobs: Fix transactional race condition
@ 2016-09-30 22:00 John Snow
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 01/11] blockjob: fix dead pointer in txn list John Snow
                   ` (11 more replies)
  0 siblings, 12 replies; 51+ messages in thread
From: John Snow @ 2016-09-30 22:00 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, vsementsov, famz, stefanha, jcody, eblake, qemu-devel, John Snow

There are a few problems with transactional job completion right now.

First, if jobs complete so quickly that they finish before the remaining
jobs have had a chance to join the transaction, the completion mode can
leave its well-known state: the QLIST can become corrupted, and the
transactional jobs can complete in batches or phases instead of all
together.

Second, if two or more jobs defer to the main loop at roughly the same
time, it's possible for one job's cleanup to directly invoke the other
job's cleanup from within the same thread, leading to a situation that
will deadlock the entire transaction.

Thanks to Vladimir for pointing out these modes of failure.

This series also does a little digging into refactoring Jobs into public
and private interfaces. It's somewhat unrelated, but it was easier to
include it in this series than to separate it out and send it later.
The refactoring comprises patches 2-6; the actual fixes are in patches
1 and 7-10, and a new test to catch Vladimir's failure scenario is in
patch 11.

v2:
 - Lots of differences in patches 2-9.
 - Cancel should now work on an "unstarted" blockjob.
 - New refactoring patches.
 - Added "start" property for BlockJob Drivers.

________________________________________________________________________________

For convenience, this branch is available at:
https://github.com/jnsnow/qemu.git branch job-manual-start
https://github.com/jnsnow/qemu/tree/job-manual-start

This version is tagged job-manual-start-v2:
https://github.com/jnsnow/qemu/releases/tag/job-manual-start-v2

John Snow (10):
  blockjob: centralize QMP event emissions
  Blockjobs: Internalize user_pause logic
  blockjobs: Always use block_job_get_aio_context
  blockjobs: split interface into public/private
  blockjobs: fix documentation
  blockjob: add .clean property
  blockjob: add .start field
  blockjob: add block_job_start
  blockjob: refactor backup_start as backup_job_create
  iotests: add transactional failure race test

Vladimir Sementsov-Ogievskiy (1):
  blockjob: fix dead pointer in txn list

 block/backup.c               |  59 ++++---
 block/commit.c               |   6 +-
 block/io.c                   |   6 +-
 block/mirror.c               |   7 +-
 block/replication.c          |  13 +-
 block/stream.c               |   6 +-
 blockdev.c                   | 128 +++++++--------
 blockjob.c                   |  72 +++++++--
 include/block/block.h        |   3 +-
 include/block/block_int.h    |  29 ++--
 include/block/blockjob.h     | 345 +++-------------------------------------
 include/block/blockjob_int.h | 366 +++++++++++++++++++++++++++++++++++++++++++
 qemu-img.c                   |   4 +-
 tests/qemu-iotests/124       |  91 +++++++++++
 tests/qemu-iotests/124.out   |   4 +-
 tests/test-blockjob-txn.c    |  14 +-
 tests/test-blockjob.c        |   2 +-
 17 files changed, 688 insertions(+), 467 deletions(-)
 create mode 100644 include/block/blockjob_int.h

-- 
2.7.4

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH v2 01/11] blockjob: fix dead pointer in txn list
  2016-09-30 22:00 [Qemu-devel] [PATCH v2 00/11] blockjobs: Fix transactional race condition John Snow
@ 2016-09-30 22:00 ` John Snow
  2016-10-05 13:43   ` Kevin Wolf
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions John Snow
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 51+ messages in thread
From: John Snow @ 2016-09-30 22:00 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, vsementsov, famz, stefanha, jcody, eblake, qemu-devel, John Snow

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Though this path is not intended to be reached under normal
circumstances, if we do not gracefully deconstruct the transaction
QLIST, we may wind up with stale pointers in the list.

The rest of this series attempts to address the underlying issues,
but this should fix list inconsistencies.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Tested-by: John Snow <jsnow@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
[Rewrote commit message. --js]
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

---
 blockjob.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/blockjob.c b/blockjob.c
index a167f96..13e7134 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -220,6 +220,7 @@ static void block_job_completed_single(BlockJob *job)
     }
     job->cb(job->opaque, job->ret);
     if (job->txn) {
+        QLIST_REMOVE(job, txn_list);
         block_job_txn_unref(job->txn);
     }
     block_job_unref(job);
-- 
2.7.4


* [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions
  2016-09-30 22:00 [Qemu-devel] [PATCH v2 00/11] blockjobs: Fix transactional race condition John Snow
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 01/11] blockjob: fix dead pointer in txn list John Snow
@ 2016-09-30 22:00 ` John Snow
  2016-10-05 13:43   ` Kevin Wolf
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 03/11] Blockjobs: Internalize user_pause logic John Snow
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 51+ messages in thread
From: John Snow @ 2016-09-30 22:00 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, vsementsov, famz, stefanha, jcody, eblake, qemu-devel, John Snow

There's no reason to leave this to blockdev; we can do it in blockjobs
directly and get rid of an extra callback for most users.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 blockdev.c | 37 ++++++-------------------------------
 blockjob.c | 16 ++++++++++++++--
 2 files changed, 20 insertions(+), 33 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 29c6561..03200e7 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2957,31 +2957,6 @@ out:
     aio_context_release(aio_context);
 }
 
-static void block_job_cb(void *opaque, int ret)
-{
-    /* Note that this function may be executed from another AioContext besides
-     * the QEMU main loop.  If you need to access anything that assumes the
-     * QEMU global mutex, use a BH or introduce a mutex.
-     */
-
-    BlockDriverState *bs = opaque;
-    const char *msg = NULL;
-
-    trace_block_job_cb(bs, bs->job, ret);
-
-    assert(bs->job);
-
-    if (ret < 0) {
-        msg = strerror(-ret);
-    }
-
-    if (block_job_is_cancelled(bs->job)) {
-        block_job_event_cancelled(bs->job);
-    } else {
-        block_job_event_completed(bs->job, msg);
-    }
-}
-
 void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
                       bool has_base, const char *base,
                       bool has_backing_file, const char *backing_file,
@@ -3033,7 +3008,7 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
     base_name = has_backing_file ? backing_file : base_name;
 
     stream_start(has_job_id ? job_id : NULL, bs, base_bs, base_name,
-                 has_speed ? speed : 0, on_error, block_job_cb, bs, &local_err);
+                 has_speed ? speed : 0, on_error, NULL, bs, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         goto out;
@@ -3136,10 +3111,10 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
             goto out;
         }
         commit_active_start(has_job_id ? job_id : NULL, bs, base_bs, speed,
-                            on_error, block_job_cb, bs, &local_err, false);
+                            on_error, NULL, bs, &local_err, false);
     } else {
         commit_start(has_job_id ? job_id : NULL, bs, base_bs, top_bs, speed,
-                     on_error, block_job_cb, bs,
+                     on_error, NULL, bs,
                      has_backing_file ? backing_file : NULL, &local_err);
     }
     if (local_err != NULL) {
@@ -3260,7 +3235,7 @@ static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
 
     backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
                  bmap, backup->compress, backup->on_source_error,
-                 backup->on_target_error, block_job_cb, bs, txn, &local_err);
+                 backup->on_target_error, NULL, bs, txn, &local_err);
     bdrv_unref(target_bs);
     if (local_err != NULL) {
         error_propagate(errp, local_err);
@@ -3330,7 +3305,7 @@ void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
     }
     backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
                  NULL, backup->compress, backup->on_source_error,
-                 backup->on_target_error, block_job_cb, bs, txn, &local_err);
+                 backup->on_target_error, NULL, bs, txn, &local_err);
     if (local_err != NULL) {
         error_propagate(errp, local_err);
     }
@@ -3410,7 +3385,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
                  has_replaces ? replaces : NULL,
                  speed, granularity, buf_size, sync, backing_mode,
                  on_source_error, on_target_error, unmap,
-                 block_job_cb, bs, errp);
+                 NULL, bs, errp);
 }
 
 void qmp_drive_mirror(DriveMirror *arg, Error **errp)
diff --git a/blockjob.c b/blockjob.c
index 13e7134..6a300ba 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -124,7 +124,6 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     BlockBackend *blk;
     BlockJob *job;
 
-    assert(cb);
     if (bs->job) {
         error_setg(errp, QERR_DEVICE_IN_USE, bdrv_get_device_name(bs));
         return NULL;
@@ -218,7 +217,20 @@ static void block_job_completed_single(BlockJob *job)
             job->driver->abort(job);
         }
     }
-    job->cb(job->opaque, job->ret);
+
+    if (job->cb) {
+        job->cb(job->opaque, job->ret);
+    }
+    if (block_job_is_cancelled(job)) {
+        block_job_event_cancelled(job);
+    } else {
+        const char *msg = NULL;
+        if (job->ret < 0) {
+            msg = strerror(-job->ret);
+        }
+        block_job_event_completed(job, msg);
+    }
+
     if (job->txn) {
         QLIST_REMOVE(job, txn_list);
         block_job_txn_unref(job->txn);
-- 
2.7.4


* [Qemu-devel] [PATCH v2 03/11] Blockjobs: Internalize user_pause logic
  2016-09-30 22:00 [Qemu-devel] [PATCH v2 00/11] blockjobs: Fix transactional race condition John Snow
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 01/11] blockjob: fix dead pointer in txn list John Snow
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions John Snow
@ 2016-09-30 22:00 ` John Snow
  2016-10-04  0:57   ` Jeff Cody
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 04/11] blockjobs: Always use block_job_get_aio_context John Snow
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 51+ messages in thread
From: John Snow @ 2016-09-30 22:00 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, vsementsov, famz, stefanha, jcody, eblake, qemu-devel, John Snow

BlockJobs will begin hiding their state in preparation for coming
refactorings anyway, so let's internalize the user_pause mechanism
instead of leaving it to callers to manage correctly.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/io.c               |  2 +-
 blockdev.c               | 10 ++++------
 blockjob.c               | 16 ++++++++++++----
 include/block/blockjob.h | 11 ++++++++++-
 4 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/block/io.c b/block/io.c
index fdf7080..868b065 100644
--- a/block/io.c
+++ b/block/io.c
@@ -291,7 +291,7 @@ void bdrv_drain_all(void)
         AioContext *aio_context = blk_get_aio_context(job->blk);
 
         aio_context_acquire(aio_context);
-        block_job_pause(job);
+        block_job_pause(job, false);
         aio_context_release(aio_context);
     }
 
diff --git a/blockdev.c b/blockdev.c
index 03200e7..268452f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3629,7 +3629,7 @@ void qmp_block_job_cancel(const char *device,
         force = false;
     }
 
-    if (job->user_paused && !force) {
+    if (block_job_paused(job) && !force) {
         error_setg(errp, "The block job for device '%s' is currently paused",
                    device);
         goto out;
@@ -3646,13 +3646,12 @@ void qmp_block_job_pause(const char *device, Error **errp)
     AioContext *aio_context;
     BlockJob *job = find_block_job(device, &aio_context, errp);
 
-    if (!job || job->user_paused) {
+    if (!job || block_job_paused(job)) {
         return;
     }
 
-    job->user_paused = true;
     trace_qmp_block_job_pause(job);
-    block_job_pause(job);
+    block_job_pause(job, true);
     aio_context_release(aio_context);
 }
 
@@ -3661,11 +3660,10 @@ void qmp_block_job_resume(const char *device, Error **errp)
     AioContext *aio_context;
     BlockJob *job = find_block_job(device, &aio_context, errp);
 
-    if (!job || !job->user_paused) {
+    if (!job || !block_job_paused(job)) {
         return;
     }
 
-    job->user_paused = false;
     trace_qmp_block_job_resume(job);
     block_job_iostatus_reset(job);
     block_job_resume(job);
diff --git a/blockjob.c b/blockjob.c
index 6a300ba..2a35f50 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -104,7 +104,7 @@ static void block_job_detach_aio_context(void *opaque)
     /* In case the job terminates during aio_poll()... */
     block_job_ref(job);
 
-    block_job_pause(job);
+    block_job_pause(job, false);
 
     if (!job->paused) {
         /* If job is !job->busy this kicks it into the next pause point. */
@@ -343,9 +343,12 @@ void block_job_complete(BlockJob *job, Error **errp)
     job->driver->complete(job, errp);
 }
 
-void block_job_pause(BlockJob *job)
+void block_job_pause(BlockJob *job, bool user)
 {
     job->pause_count++;
+    if (user) {
+        job->user_paused = true;
+    }
 }
 
 static bool block_job_should_pause(BlockJob *job)
@@ -353,6 +356,11 @@ static bool block_job_should_pause(BlockJob *job)
     return job->pause_count > 0;
 }
 
+bool block_job_paused(BlockJob *job)
+{
+    return job ? job->user_paused : 0;
+}
+
 void coroutine_fn block_job_pause_point(BlockJob *job)
 {
     if (!block_job_should_pause(job)) {
@@ -386,6 +394,7 @@ void block_job_resume(BlockJob *job)
     if (job->pause_count) {
         return;
     }
+    job->user_paused = false;
     block_job_enter(job);
 }
 
@@ -592,8 +601,7 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
                                     action, &error_abort);
     if (action == BLOCK_ERROR_ACTION_STOP) {
         /* make the pause user visible, which will be resumed from QMP. */
-        job->user_paused = true;
-        block_job_pause(job);
+        block_job_pause(job, true);
         block_job_iostatus_set_err(job, error);
     }
     return action;
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 4ddb4ae..081f6c2 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -347,10 +347,19 @@ void coroutine_fn block_job_pause_point(BlockJob *job);
 /**
  * block_job_pause:
  * @job: The job to be paused.
+ * @user: Requested explicitly via user?
  *
  * Asynchronously pause the specified job.
  */
-void block_job_pause(BlockJob *job);
+void block_job_pause(BlockJob *job, bool user);
+
+/**
+ * block_job_paused:
+ * @job: The job to query.
+ *
+ * Returns true if the job is user-paused.
+ */
+bool block_job_paused(BlockJob *job);
 
 /**
  * block_job_resume:
-- 
2.7.4


* [Qemu-devel] [PATCH v2 04/11] blockjobs: Always use block_job_get_aio_context
  2016-09-30 22:00 [Qemu-devel] [PATCH v2 00/11] blockjobs: Fix transactional race condition John Snow
                   ` (2 preceding siblings ...)
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 03/11] Blockjobs: Internalize user_pause logic John Snow
@ 2016-09-30 22:00 ` John Snow
  2016-10-05 14:02   ` Kevin Wolf
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 05/11] blockjobs: split interface into public/private John Snow
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 51+ messages in thread
From: John Snow @ 2016-09-30 22:00 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, vsementsov, famz, stefanha, jcody, eblake, qemu-devel, John Snow

There are a few places where we fish the AioContext out of the job for
ourselves. Let's not do that, and use the helper instead.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/io.c               | 4 ++--
 blockdev.c               | 4 ++--
 blockjob.c               | 2 +-
 include/block/blockjob.h | 9 +++++++++
 qemu-img.c               | 2 +-
 5 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/block/io.c b/block/io.c
index 868b065..f26a503 100644
--- a/block/io.c
+++ b/block/io.c
@@ -288,7 +288,7 @@ void bdrv_drain_all(void)
     GSList *aio_ctxs = NULL, *ctx;
 
     while ((job = block_job_next(job))) {
-        AioContext *aio_context = blk_get_aio_context(job->blk);
+        AioContext *aio_context = block_job_get_aio_context(job);
 
         aio_context_acquire(aio_context);
         block_job_pause(job, false);
@@ -347,7 +347,7 @@ void bdrv_drain_all(void)
 
     job = NULL;
     while ((job = block_job_next(job))) {
-        AioContext *aio_context = blk_get_aio_context(job->blk);
+        AioContext *aio_context = block_job_get_aio_context(job);
 
         aio_context_acquire(aio_context);
         block_job_resume(job);
diff --git a/blockdev.c b/blockdev.c
index 268452f..0ac507f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3596,7 +3596,7 @@ static BlockJob *find_block_job(const char *id, AioContext **aio_context,
         return NULL;
     }
 
-    *aio_context = blk_get_aio_context(job->blk);
+    *aio_context = block_job_get_aio_context(job);
     aio_context_acquire(*aio_context);
 
     return job;
@@ -3956,7 +3956,7 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 
     for (job = block_job_next(NULL); job; job = block_job_next(job)) {
         BlockJobInfoList *elem = g_new0(BlockJobInfoList, 1);
-        AioContext *aio_context = blk_get_aio_context(job->blk);
+        AioContext *aio_context = block_job_get_aio_context(job);
 
         aio_context_acquire(aio_context);
         elem->value = block_job_query(job);
diff --git a/blockjob.c b/blockjob.c
index 2a35f50..073d9ce 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -78,7 +78,7 @@ BlockJob *block_job_get(const char *id)
  * block_job_defer_to_main_loop() where it runs in the QEMU main loop.  Code
  * that supports both cases uses this helper function.
  */
-static AioContext *block_job_get_aio_context(BlockJob *job)
+AioContext *block_job_get_aio_context(BlockJob *job)
 {
     return job->deferred_to_main_loop ?
            qemu_get_aio_context() :
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 081f6c2..6f28c73 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -513,4 +513,13 @@ void block_job_txn_unref(BlockJobTxn *txn);
  */
 void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job);
 
+/**
+ * block_job_get_aio_context:
+ * @job: Job to get the aio_context for
+ *
+ * Fetch the current context for the given BlockJob. May be the main loop if
+ * the job has already deferred to main for final cleanup.
+ */
+AioContext *block_job_get_aio_context(BlockJob *job);
+
 #endif
diff --git a/qemu-img.c b/qemu-img.c
index ceffefe..204fa9c 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -793,7 +793,7 @@ static void common_block_job_cb(void *opaque, int ret)
 
 static void run_block_job(BlockJob *job, Error **errp)
 {
-    AioContext *aio_context = blk_get_aio_context(job->blk);
+    AioContext *aio_context = block_job_get_aio_context(job);
 
     do {
         aio_poll(aio_context, true);
-- 
2.7.4


* [Qemu-devel] [PATCH v2 05/11] blockjobs: split interface into public/private
  2016-09-30 22:00 [Qemu-devel] [PATCH v2 00/11] blockjobs: Fix transactional race condition John Snow
                   ` (3 preceding siblings ...)
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 04/11] blockjobs: Always use block_job_get_aio_context John Snow
@ 2016-09-30 22:00 ` John Snow
  2016-10-05 14:17   ` Kevin Wolf
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 06/11] blockjobs: fix documentation John Snow
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 51+ messages in thread
From: John Snow @ 2016-09-30 22:00 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, vsementsov, famz, stefanha, jcody, eblake, qemu-devel, John Snow

To make it a little more obvious which functions are intended to be
public interface and which are intended to be for use only by jobs
themselves, split the interface into "public" and "private" files.

Convert blockjobs (e.g. block/backup) to using the private interface.
Leave blockdev and others on the public interface.

Give up and let qemu-img use the internal interface, though it doesn't
strictly need to be using it.

As a side-effect, hide the BlockJob and BlockJobDriver implementation
from most of the QEMU codebase.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/backup.c               |   2 +-
 block/commit.c               |   2 +-
 block/mirror.c               |   2 +-
 block/replication.c          |   2 +-
 block/stream.c               |   2 +-
 blockjob.c                   |   2 +-
 include/block/block.h        |   3 +-
 include/block/blockjob.h     | 325 +--------------------------------------
 include/block/blockjob_int.h | 355 +++++++++++++++++++++++++++++++++++++++++++
 qemu-img.c                   |   2 +-
 tests/test-blockjob-txn.c    |   2 +-
 tests/test-blockjob.c        |   2 +-
 12 files changed, 368 insertions(+), 333 deletions(-)
 create mode 100644 include/block/blockjob_int.h

diff --git a/block/backup.c b/block/backup.c
index 582bd0f..d667482 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -16,7 +16,7 @@
 #include "trace.h"
 #include "block/block.h"
 #include "block/block_int.h"
-#include "block/blockjob.h"
+#include "block/blockjob_int.h"
 #include "block/block_backup.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
diff --git a/block/commit.c b/block/commit.c
index 9f67a8b..fb5bede 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -15,7 +15,7 @@
 #include "qemu/osdep.h"
 #include "trace.h"
 #include "block/block_int.h"
-#include "block/blockjob.h"
+#include "block/blockjob_int.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/ratelimit.h"
diff --git a/block/mirror.c b/block/mirror.c
index f9d1fec..cc62fb0 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -13,7 +13,7 @@
 
 #include "qemu/osdep.h"
 #include "trace.h"
-#include "block/blockjob.h"
+#include "block/blockjob_int.h"
 #include "block/block_int.h"
 #include "sysemu/block-backend.h"
 #include "qapi/error.h"
diff --git a/block/replication.c b/block/replication.c
index 3bd1cf1..b604b93 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -15,7 +15,7 @@
 #include "qemu/osdep.h"
 #include "qemu-common.h"
 #include "block/nbd.h"
-#include "block/blockjob.h"
+#include "block/blockjob_int.h"
 #include "block/block_int.h"
 #include "block/block_backup.h"
 #include "sysemu/block-backend.h"
diff --git a/block/stream.c b/block/stream.c
index 3187481..71d0e7a 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -14,7 +14,7 @@
 #include "qemu/osdep.h"
 #include "trace.h"
 #include "block/block_int.h"
-#include "block/blockjob.h"
+#include "block/blockjob_int.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/ratelimit.h"
diff --git a/blockjob.c b/blockjob.c
index 073d9ce..09fb602 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -27,7 +27,7 @@
 #include "qemu-common.h"
 #include "trace.h"
 #include "block/block.h"
-#include "block/blockjob.h"
+#include "block/blockjob_int.h"
 #include "block/block_int.h"
 #include "sysemu/block-backend.h"
 #include "qapi/qmp/qerror.h"
diff --git a/include/block/block.h b/include/block/block.h
index e18233a..f556345 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -7,16 +7,15 @@
 #include "qemu/coroutine.h"
 #include "block/accounting.h"
 #include "block/dirty-bitmap.h"
+#include "block/blockjob.h"
 #include "qapi/qmp/qobject.h"
 #include "qapi-types.h"
 #include "qemu/hbitmap.h"
 
 /* block.c */
 typedef struct BlockDriver BlockDriver;
-typedef struct BlockJob BlockJob;
 typedef struct BdrvChild BdrvChild;
 typedef struct BdrvChildRole BdrvChildRole;
-typedef struct BlockJobTxn BlockJobTxn;
 
 typedef struct BlockDriverInfo {
     /* in bytes, 0 if irrelevant */
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 6f28c73..efae26b 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -28,177 +28,9 @@
 
 #include "block/block.h"
 
-/**
- * BlockJobDriver:
- *
- * A class type for block job driver.
- */
-typedef struct BlockJobDriver {
-    /** Derived BlockJob struct size */
-    size_t instance_size;
-
-    /** String describing the operation, part of query-block-jobs QMP API */
-    BlockJobType job_type;
-
-    /** Optional callback for job types that support setting a speed limit */
-    void (*set_speed)(BlockJob *job, int64_t speed, Error **errp);
-
-    /** Optional callback for job types that need to forward I/O status reset */
-    void (*iostatus_reset)(BlockJob *job);
-
-    /**
-     * Optional callback for job types whose completion must be triggered
-     * manually.
-     */
-    void (*complete)(BlockJob *job, Error **errp);
-
-    /**
-     * If the callback is not NULL, it will be invoked when all the jobs
-     * belonging to the same transaction complete; or upon this job's
-     * completion if it is not in a transaction. Skipped if NULL.
-     *
-     * All jobs will complete with a call to either .commit() or .abort() but
-     * never both.
-     */
-    void (*commit)(BlockJob *job);
-
-    /**
-     * If the callback is not NULL, it will be invoked when any job in the
-     * same transaction fails; or upon this job's failure (due to error or
-     * cancellation) if it is not in a transaction. Skipped if NULL.
-     *
-     * All jobs will complete with a call to either .commit() or .abort() but
-     * never both.
-     */
-    void (*abort)(BlockJob *job);
-
-    /**
-     * If the callback is not NULL, it will be invoked when the job transitions
-     * into the paused state.  Paused jobs must not perform any asynchronous
-     * I/O or event loop activity.  This callback is used to quiesce jobs.
-     */
-    void coroutine_fn (*pause)(BlockJob *job);
-
-    /**
-     * If the callback is not NULL, it will be invoked when the job transitions
-     * out of the paused state.  Any asynchronous I/O or event loop activity
-     * should be restarted from this callback.
-     */
-    void coroutine_fn (*resume)(BlockJob *job);
-
-    /*
-     * If the callback is not NULL, it will be invoked before the job is
-     * resumed in a new AioContext.  This is the place to move any resources
-     * besides job->blk to the new AioContext.
-     */
-    void (*attached_aio_context)(BlockJob *job, AioContext *new_context);
-} BlockJobDriver;
-
-/**
- * BlockJob:
- *
- * Long-running operation on a BlockDriverState.
- */
-struct BlockJob {
-    /** The job type, including the job vtable.  */
-    const BlockJobDriver *driver;
-
-    /** The block device on which the job is operating.  */
-    BlockBackend *blk;
-
-    /**
-     * The ID of the block job.
-     */
-    char *id;
-
-    /**
-     * The coroutine that executes the job.  If not NULL, it is
-     * reentered when busy is false and the job is cancelled.
-     */
-    Coroutine *co;
-
-    /**
-     * Set to true if the job should cancel itself.  The flag must
-     * always be tested just before toggling the busy flag from false
-     * to true.  After a job has been cancelled, it should only yield
-     * if #aio_poll will ("sooner or later") reenter the coroutine.
-     */
-    bool cancelled;
-
-    /**
-     * Counter for pause request. If non-zero, the block job is either paused,
-     * or if busy == true will pause itself as soon as possible.
-     */
-    int pause_count;
-
-    /**
-     * Set to true if the job is paused by user.  Can be unpaused with the
-     * block-job-resume QMP command.
-     */
-    bool user_paused;
-
-    /**
-     * Set to false by the job while the coroutine has yielded and may be
-     * re-entered by block_job_enter().  There may still be I/O or event loop
-     * activity pending.
-     */
-    bool busy;
-
-    /**
-     * Set to true by the job while it is in a quiescent state, where
-     * no I/O or event loop activity is pending.
-     */
-    bool paused;
-
-    /**
-     * Set to true when the job is ready to be completed.
-     */
-    bool ready;
-
-    /**
-     * Set to true when the job has deferred work to the main loop.
-     */
-    bool deferred_to_main_loop;
-
-    /** Element of the list of block jobs */
-    QLIST_ENTRY(BlockJob) job_list;
-
-    /** Status that is published by the query-block-jobs QMP API */
-    BlockDeviceIoStatus iostatus;
-
-    /** Offset that is published by the query-block-jobs QMP API */
-    int64_t offset;
-
-    /** Length that is published by the query-block-jobs QMP API */
-    int64_t len;
-
-    /** Speed that was set with @block_job_set_speed.  */
-    int64_t speed;
-
-    /** The completion function that will be called when the job completes.  */
-    BlockCompletionFunc *cb;
-
-    /** Block other operations when block job is running */
-    Error *blocker;
-
-    /** The opaque value that is passed to the completion function.  */
-    void *opaque;
-
-    /** Reference count of the block job */
-    int refcnt;
-
-    /* True if this job has reported completion by calling block_job_completed.
-     */
-    bool completed;
-
-    /* ret code passed to block_job_completed.
-     */
-    int ret;
-
-    /** Non-NULL if this job is part of a transaction */
-    BlockJobTxn *txn;
-    QLIST_ENTRY(BlockJob) txn_list;
-};
+typedef struct BlockJobDriver BlockJobDriver;
+typedef struct BlockJob BlockJob;
+typedef struct BlockJobTxn BlockJobTxn;
 
 /**
  * block_job_next:
@@ -222,76 +54,6 @@ BlockJob *block_job_next(BlockJob *job);
 BlockJob *block_job_get(const char *id);
 
 /**
- * block_job_create:
- * @job_id: The id of the newly-created job, or %NULL to have one
- * generated automatically.
- * @job_type: The class object for the newly-created job.
- * @bs: The block
- * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
- * @cb: Completion function for the job.
- * @opaque: Opaque pointer value passed to @cb.
- * @errp: Error object.
- *
- * Create a new long-running block device job and return it.  The job
- * will call @cb asynchronously when the job completes.  Note that
- * @bs may have been closed at the time the @cb it is called.  If
- * this is the case, the job may be reported as either cancelled or
- * completed.
- *
- * This function is not part of the public job interface; it should be
- * called from a wrapper that is specific to the job type.
- */
-void *block_job_create(const char *job_id, const BlockJobDriver *driver,
-                       BlockDriverState *bs, int64_t speed,
-                       BlockCompletionFunc *cb, void *opaque, Error **errp);
-
-/**
- * block_job_sleep_ns:
- * @job: The job that calls the function.
- * @clock: The clock to sleep on.
- * @ns: How many nanoseconds to stop for.
- *
- * Put the job to sleep (assuming that it wasn't canceled) for @ns
- * nanoseconds.  Canceling the job will interrupt the wait immediately.
- */
-void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);
-
-/**
- * block_job_yield:
- * @job: The job that calls the function.
- *
- * Yield the block job coroutine.
- */
-void block_job_yield(BlockJob *job);
-
-/**
- * block_job_ref:
- * @bs: The block device.
- *
- * Grab a reference to the block job. Should be paired with block_job_unref.
- */
-void block_job_ref(BlockJob *job);
-
-/**
- * block_job_unref:
- * @bs: The block device.
- *
- * Release reference to the block job and release resources if it is the last
- * reference.
- */
-void block_job_unref(BlockJob *job);
-
-/**
- * block_job_completed:
- * @job: The job being completed.
- * @ret: The status code.
- *
- * Call the completion function that was registered at creation time, and
- * free @job.
- */
-void block_job_completed(BlockJob *job, int ret);
-
-/**
  * block_job_set_speed:
  * @job: The job to set the speed for.
  * @speed: The new value
@@ -320,14 +82,6 @@ void block_job_cancel(BlockJob *job);
 void block_job_complete(BlockJob *job, Error **errp);
 
 /**
- * block_job_is_cancelled:
- * @job: The job being queried.
- *
- * Returns whether the job is scheduled for cancellation.
- */
-bool block_job_is_cancelled(BlockJob *job);
-
-/**
  * block_job_query:
  * @job: The job to get information about.
  *
@@ -336,15 +90,6 @@ bool block_job_is_cancelled(BlockJob *job);
 BlockJobInfo *block_job_query(BlockJob *job);
 
 /**
- * block_job_pause_point:
- * @job: The job that is ready to pause.
- *
- * Pause now if block_job_pause() has been called.  Block jobs that perform
- * lots of I/O must call this between requests so that the job can be paused.
- */
-void coroutine_fn block_job_pause_point(BlockJob *job);
-
-/**
  * block_job_pause:
  * @job: The job to be paused.
  * @user: Requested explicitly via user?
@@ -370,39 +115,6 @@ bool block_job_paused(BlockJob *job);
 void block_job_resume(BlockJob *job);
 
 /**
- * block_job_enter:
- * @job: The job to enter.
- *
- * Continue the specified job by entering the coroutine.
- */
-void block_job_enter(BlockJob *job);
-
-/**
- * block_job_event_cancelled:
- * @job: The job whose information is requested.
- *
- * Send a BLOCK_JOB_CANCELLED event for the specified job.
- */
-void block_job_event_cancelled(BlockJob *job);
-
-/**
- * block_job_ready:
- * @job: The job which is now ready to complete.
- * @msg: Error message. Only present on failure.
- *
- * Send a BLOCK_JOB_COMPLETED event for the specified job.
- */
-void block_job_event_completed(BlockJob *job, const char *msg);
-
-/**
- * block_job_ready:
- * @job: The job which is now ready to complete.
- *
- * Send a BLOCK_JOB_READY event for the specified job.
- */
-void block_job_event_ready(BlockJob *job);
-
-/**
  * block_job_cancel_sync:
  * @job: The job to be canceled.
  *
@@ -448,37 +160,6 @@ int block_job_complete_sync(BlockJob *job, Error **errp);
 void block_job_iostatus_reset(BlockJob *job);
 
 /**
- * block_job_error_action:
- * @job: The job to signal an error for.
- * @on_err: The error action setting.
- * @is_read: Whether the operation was a read.
- * @error: The error that was reported.
- *
- * Report an I/O error for a block job and possibly stop the VM.  Return the
- * action that was selected based on @on_err and @error.
- */
-BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
-                                        int is_read, int error);
-
-typedef void BlockJobDeferToMainLoopFn(BlockJob *job, void *opaque);
-
-/**
- * block_job_defer_to_main_loop:
- * @job: The job
- * @fn: The function to run in the main loop
- * @opaque: The opaque value that is passed to @fn
- *
- * Execute a given function in the main loop with the BlockDriverState
- * AioContext acquired.  Block jobs must call bdrv_unref(), bdrv_close(), and
- * anything that uses bdrv_drain_all() in the main loop.
- *
- * The @job AioContext is held while @fn executes.
- */
-void block_job_defer_to_main_loop(BlockJob *job,
-                                  BlockJobDeferToMainLoopFn *fn,
-                                  void *opaque);
-
-/**
  * block_job_txn_new:
  *
  * Allocate and return a new block job transaction.  Jobs can be added to the
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
new file mode 100644
index 0000000..0a2d41e
--- /dev/null
+++ b/include/block/blockjob_int.h
@@ -0,0 +1,355 @@
+/*
+ * Declarations for long-running block device operations
+ *
+ * Copyright (c) 2011 IBM Corp.
+ * Copyright (c) 2012 Red Hat, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef BLOCKJOB_INT_H
+#define BLOCKJOB_INT_H
+
+#include "block/blockjob.h"
+#include "block/block.h"
+
+/**
+ * BlockJobDriver:
+ *
+ * A class type for block job driver.
+ */
+struct BlockJobDriver {
+    /** Derived BlockJob struct size */
+    size_t instance_size;
+
+    /** String describing the operation, part of query-block-jobs QMP API */
+    BlockJobType job_type;
+
+    /** Optional callback for job types that support setting a speed limit */
+    void (*set_speed)(BlockJob *job, int64_t speed, Error **errp);
+
+    /** Optional callback for job types that need to forward I/O status reset */
+    void (*iostatus_reset)(BlockJob *job);
+
+    /**
+     * Optional callback for job types whose completion must be triggered
+     * manually.
+     */
+    void (*complete)(BlockJob *job, Error **errp);
+
+    /**
+     * If the callback is not NULL, it will be invoked when all the jobs
+     * belonging to the same transaction complete; or upon this job's
+     * completion if it is not in a transaction. Skipped if NULL.
+     *
+     * All jobs will complete with a call to either .commit() or .abort() but
+     * never both.
+     */
+    void (*commit)(BlockJob *job);
+
+    /**
+     * If the callback is not NULL, it will be invoked when any job in the
+     * same transaction fails; or upon this job's failure (due to error or
+     * cancellation) if it is not in a transaction. Skipped if NULL.
+     *
+     * All jobs will complete with a call to either .commit() or .abort() but
+     * never both.
+     */
+    void (*abort)(BlockJob *job);
+
+    /**
+     * If the callback is not NULL, it will be invoked when the job transitions
+     * into the paused state.  Paused jobs must not perform any asynchronous
+     * I/O or event loop activity.  This callback is used to quiesce jobs.
+     */
+    void coroutine_fn (*pause)(BlockJob *job);
+
+    /**
+     * If the callback is not NULL, it will be invoked when the job transitions
+     * out of the paused state.  Any asynchronous I/O or event loop activity
+     * should be restarted from this callback.
+     */
+    void coroutine_fn (*resume)(BlockJob *job);
+
+    /*
+     * If the callback is not NULL, it will be invoked before the job is
+     * resumed in a new AioContext.  This is the place to move any resources
+     * besides job->blk to the new AioContext.
+     */
+    void (*attached_aio_context)(BlockJob *job, AioContext *new_context);
+};
+
+/**
+ * BlockJob:
+ *
+ * Long-running operation on a BlockDriverState.
+ */
+struct BlockJob {
+    /** The job type, including the job vtable.  */
+    const BlockJobDriver *driver;
+
+    /** The block device on which the job is operating.  */
+    BlockBackend *blk;
+
+    /**
+     * The ID of the block job.
+     */
+    char *id;
+
+    /**
+     * The coroutine that executes the job.  If not NULL, it is
+     * reentered when busy is false and the job is cancelled.
+     */
+    Coroutine *co;
+
+    /**
+     * Set to true if the job should cancel itself.  The flag must
+     * always be tested just before toggling the busy flag from false
+     * to true.  After a job has been cancelled, it should only yield
+     * if #aio_poll will ("sooner or later") reenter the coroutine.
+     */
+    bool cancelled;
+
+    /**
+     * Counter for pause request. If non-zero, the block job is either paused,
+     * or if busy == true will pause itself as soon as possible.
+     */
+    int pause_count;
+
+    /**
+     * Set to true if the job is paused by user.  Can be unpaused with the
+     * block-job-resume QMP command.
+     */
+    bool user_paused;
+
+    /**
+     * Set to false by the job while the coroutine has yielded and may be
+     * re-entered by block_job_enter().  There may still be I/O or event loop
+     * activity pending.
+     */
+    bool busy;
+
+    /**
+     * Set to true by the job while it is in a quiescent state, where
+     * no I/O or event loop activity is pending.
+     */
+    bool paused;
+
+    /**
+     * Set to true when the job is ready to be completed.
+     */
+    bool ready;
+
+    /**
+     * Set to true when the job has deferred work to the main loop.
+     */
+    bool deferred_to_main_loop;
+
+    /** Element of the list of block jobs */
+    QLIST_ENTRY(BlockJob) job_list;
+
+    /** Status that is published by the query-block-jobs QMP API */
+    BlockDeviceIoStatus iostatus;
+
+    /** Offset that is published by the query-block-jobs QMP API */
+    int64_t offset;
+
+    /** Length that is published by the query-block-jobs QMP API */
+    int64_t len;
+
+    /** Speed that was set with @block_job_set_speed.  */
+    int64_t speed;
+
+    /** The completion function that will be called when the job completes.  */
+    BlockCompletionFunc *cb;
+
+    /** Block other operations when block job is running */
+    Error *blocker;
+
+    /** The opaque value that is passed to the completion function.  */
+    void *opaque;
+
+    /** Reference count of the block job */
+    int refcnt;
+
+    /* True if this job has reported completion by calling block_job_completed.
+     */
+    bool completed;
+
+    /* ret code passed to block_job_completed.
+     */
+    int ret;
+
+    /** Non-NULL if this job is part of a transaction */
+    BlockJobTxn *txn;
+    QLIST_ENTRY(BlockJob) txn_list;
+};
+
+/**
+ * block_job_create:
+ * @job_id: The id of the newly-created job, or %NULL to have one
+ * generated automatically.
+ * @job_type: The class object for the newly-created job.
+ * @bs: The block
+ * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
+ * @cb: Completion function for the job.
+ * @opaque: Opaque pointer value passed to @cb.
+ * @errp: Error object.
+ *
+ * Create a new long-running block device job and return it.  The job
+ * will call @cb asynchronously when the job completes.  Note that
+ * @bs may have been closed at the time the @cb it is called.  If
+ * this is the case, the job may be reported as either cancelled or
+ * completed.
+ *
+ * This function is not part of the public job interface; it should be
+ * called from a wrapper that is specific to the job type.
+ */
+void *block_job_create(const char *job_id, const BlockJobDriver *driver,
+                       BlockDriverState *bs, int64_t speed,
+                       BlockCompletionFunc *cb, void *opaque, Error **errp);
+
+/**
+ * block_job_sleep_ns:
+ * @job: The job that calls the function.
+ * @clock: The clock to sleep on.
+ * @ns: How many nanoseconds to stop for.
+ *
+ * Put the job to sleep (assuming that it wasn't canceled) for @ns
+ * nanoseconds.  Canceling the job will interrupt the wait immediately.
+ */
+void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);
+
+/**
+ * block_job_yield:
+ * @job: The job that calls the function.
+ *
+ * Yield the block job coroutine.
+ */
+void block_job_yield(BlockJob *job);
+
+/**
+ * block_job_ref:
+ * @bs: The block device.
+ *
+ * Grab a reference to the block job. Should be paired with block_job_unref.
+ */
+void block_job_ref(BlockJob *job);
+
+/**
+ * block_job_unref:
+ * @bs: The block device.
+ *
+ * Release reference to the block job and release resources if it is the last
+ * reference.
+ */
+void block_job_unref(BlockJob *job);
+
+/**
+ * block_job_completed:
+ * @job: The job being completed.
+ * @ret: The status code.
+ *
+ * Call the completion function that was registered at creation time, and
+ * free @job.
+ */
+void block_job_completed(BlockJob *job, int ret);
+
+/**
+ * block_job_is_cancelled:
+ * @job: The job being queried.
+ *
+ * Returns whether the job is scheduled for cancellation.
+ */
+bool block_job_is_cancelled(BlockJob *job);
+
+/**
+ * block_job_pause_point:
+ * @job: The job that is ready to pause.
+ *
+ * Pause now if block_job_pause() has been called.  Block jobs that perform
+ * lots of I/O must call this between requests so that the job can be paused.
+ */
+void coroutine_fn block_job_pause_point(BlockJob *job);
+
+/**
+ * block_job_enter:
+ * @job: The job to enter.
+ *
+ * Continue the specified job by entering the coroutine.
+ */
+void block_job_enter(BlockJob *job);
+
+/**
+ * block_job_event_cancelled:
+ * @job: The job whose information is requested.
+ *
+ * Send a BLOCK_JOB_CANCELLED event for the specified job.
+ */
+void block_job_event_cancelled(BlockJob *job);
+
+/**
+ * block_job_ready:
+ * @job: The job which is now ready to complete.
+ * @msg: Error message. Only present on failure.
+ *
+ * Send a BLOCK_JOB_COMPLETED event for the specified job.
+ */
+void block_job_event_completed(BlockJob *job, const char *msg);
+
+/**
+ * block_job_ready:
+ * @job: The job which is now ready to complete.
+ *
+ * Send a BLOCK_JOB_READY event for the specified job.
+ */
+void block_job_event_ready(BlockJob *job);
+
+/**
+ * block_job_error_action:
+ * @job: The job to signal an error for.
+ * @on_err: The error action setting.
+ * @is_read: Whether the operation was a read.
+ * @error: The error that was reported.
+ *
+ * Report an I/O error for a block job and possibly stop the VM.  Return the
+ * action that was selected based on @on_err and @error.
+ */
+BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
+                                        int is_read, int error);
+
+typedef void BlockJobDeferToMainLoopFn(BlockJob *job, void *opaque);
+
+/**
+ * block_job_defer_to_main_loop:
+ * @job: The job
+ * @fn: The function to run in the main loop
+ * @opaque: The opaque value that is passed to @fn
+ *
+ * Execute a given function in the main loop with the BlockDriverState
+ * AioContext acquired.  Block jobs must call bdrv_unref(), bdrv_close(), and
+ * anything that uses bdrv_drain_all() in the main loop.
+ *
+ * The @job AioContext is held while @fn executes.
+ */
+void block_job_defer_to_main_loop(BlockJob *job,
+                                  BlockJobDeferToMainLoopFn *fn,
+                                  void *opaque);
+
+#endif
diff --git a/qemu-img.c b/qemu-img.c
index 204fa9c..5b8c4c4 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -37,7 +37,7 @@
 #include "sysemu/sysemu.h"
 #include "sysemu/block-backend.h"
 #include "block/block_int.h"
-#include "block/blockjob.h"
+#include "block/blockjob_int.h"
 #include "block/qapi.h"
 #include "crypto/init.h"
 #include "trace/control.h"
diff --git a/tests/test-blockjob-txn.c b/tests/test-blockjob-txn.c
index d049cba..736e18f 100644
--- a/tests/test-blockjob-txn.c
+++ b/tests/test-blockjob-txn.c
@@ -13,7 +13,7 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qemu/main-loop.h"
-#include "block/blockjob.h"
+#include "block/blockjob_int.h"
 #include "sysemu/block-backend.h"
 
 typedef struct {
diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c
index 5b0e934..5b25dfe 100644
--- a/tests/test-blockjob.c
+++ b/tests/test-blockjob.c
@@ -13,7 +13,7 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qemu/main-loop.h"
-#include "block/blockjob.h"
+#include "block/blockjob_int.h"
 #include "sysemu/block-backend.h"
 
 static const BlockJobDriver test_block_job_driver = {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH v2 06/11] blockjobs: fix documentation
  2016-09-30 22:00 [Qemu-devel] [PATCH v2 00/11] blockjobs: Fix transactional race condition John Snow
                   ` (4 preceding siblings ...)
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 05/11] blockjobs: split interface into public/private John Snow
@ 2016-09-30 22:00 ` John Snow
  2016-10-05 15:03   ` Kevin Wolf
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 07/11] blockjob: add .clean property John Snow
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 51+ messages in thread
From: John Snow @ 2016-09-30 22:00 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, vsementsov, famz, stefanha, jcody, eblake, qemu-devel, John Snow

Wrong function names in documentation.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 include/block/blockjob_int.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index 0a2d41e..c6da7e4 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -305,7 +305,7 @@ void block_job_enter(BlockJob *job);
 void block_job_event_cancelled(BlockJob *job);
 
 /**
- * block_job_ready:
+ * block_job_event_completed:
  * @job: The job which is now ready to complete.
  * @msg: Error message. Only present on failure.
  *
@@ -314,8 +314,8 @@ void block_job_event_cancelled(BlockJob *job);
 void block_job_event_completed(BlockJob *job, const char *msg);
 
 /**
- * block_job_ready:
- * @job: The job which is now ready to complete.
+ * block_job_event_ready:
+ * @job: The job which is now ready to be completed.
  *
  * Send a BLOCK_JOB_READY event for the specified job.
  */
-- 
2.7.4


* [Qemu-devel] [PATCH v2 07/11] blockjob: add .clean property
  2016-09-30 22:00 [Qemu-devel] [PATCH v2 00/11] blockjobs: Fix transactional race condition John Snow
                   ` (5 preceding siblings ...)
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 06/11] blockjobs: fix documentation John Snow
@ 2016-09-30 22:00 ` John Snow
  2016-10-12 11:11   ` Vladimir Sementsov-Ogievskiy
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 08/11] blockjob: add .start field John Snow
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 51+ messages in thread
From: John Snow @ 2016-09-30 22:00 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, vsementsov, famz, stefanha, jcody, eblake, qemu-devel, John Snow

Cleaning up after we have deferred to the main thread but before the
transaction has converged can be dangerous and result in deadlocks
if the job cleanup invokes any BH polling loops.

A job may attempt to begin cleaning up, but may induce another job to
enter its cleanup routine. The second job, part of our same transaction,
will block waiting for the first job to finish, so neither job may now
make progress.

To rectify this, allow jobs to register a cleanup operation that will
always run, regardless of whether the job was in a transaction and
regardless of whether the transaction job group completed successfully.

Move sensitive cleanup to this callback instead; it is guaranteed to
run only after the transaction has converged, which removes the fragile
timing constraints from said cleanup.

Furthermore, in future patches these cleanup operations will be performed
regardless of whether or not we actually started the job. Therefore,
cleanup callbacks should essentially confine themselves to undoing create
operations, e.g. setup actions taken in what is now backup_run.
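The completion ordering this patch establishes can be sketched with a
toy model (all types and names below are illustrative stand-ins, not
QEMU's actual API): exactly one of .commit()/.abort() runs, and .clean()
always runs afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for BlockJobDriver/BlockJob; not QEMU's real types. */
typedef struct ToyJob ToyJob;
typedef struct {
    void (*commit)(ToyJob *job);
    void (*abort)(ToyJob *job);
    void (*clean)(ToyJob *job);
} ToyDriver;
struct ToyJob {
    const ToyDriver *driver;
    int ret;
    int commits, aborts, cleans; /* counters for illustration only */
};

/* Mirrors the ordering in block_job_completed_single() after this
 * patch: one of .commit()/.abort() based on ret, then .clean() runs
 * unconditionally, transaction or not. */
static void toy_completed_single(ToyJob *job)
{
    if (job->ret == 0) {
        if (job->driver->commit) {
            job->driver->commit(job);
        }
    } else {
        if (job->driver->abort) {
            job->driver->abort(job);
        }
    }
    if (job->driver->clean) {
        job->driver->clean(job);
    }
}

static void toy_commit(ToyJob *j) { j->commits++; }
static void toy_abort(ToyJob *j)  { j->aborts++; }
static void toy_clean(ToyJob *j)  { j->cleans++; }
```

Running both the success and failure paths through this model shows
.clean() firing in each case, which is the invariant the patch relies
on.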

Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/backup.c               | 11 ++++++++---
 blockjob.c                   |  5 +++--
 include/block/blockjob_int.h |  8 ++++++++
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index d667482..42ff4c0 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -242,6 +242,13 @@ static void backup_abort(BlockJob *job)
     }
 }
 
+static void backup_clean(BlockJob *job)
+{
+    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
+    assert(s->target);
+    blk_unref(s->target);
+}
+
 static void backup_attached_aio_context(BlockJob *job, AioContext *aio_context)
 {
     BackupBlockJob *s = container_of(job, BackupBlockJob, common);
@@ -306,6 +313,7 @@ static const BlockJobDriver backup_job_driver = {
     .set_speed              = backup_set_speed,
     .commit                 = backup_commit,
     .abort                  = backup_abort,
+    .clean                  = backup_clean,
     .attached_aio_context   = backup_attached_aio_context,
 };
 
@@ -327,11 +335,8 @@ typedef struct {
 
 static void backup_complete(BlockJob *job, void *opaque)
 {
-    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
     BackupCompleteData *data = opaque;
 
-    blk_unref(s->target);
-
     block_job_completed(job, data->ret);
     g_free(data);
 }
diff --git a/blockjob.c b/blockjob.c
index 09fb602..44cbf6c 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -217,7 +217,9 @@ static void block_job_completed_single(BlockJob *job)
             job->driver->abort(job);
         }
     }
-
+    if (job->driver->clean) {
+        job->driver->clean(job);
+    }
     if (job->cb) {
         job->cb(job->opaque, job->ret);
     }
@@ -230,7 +232,6 @@ static void block_job_completed_single(BlockJob *job)
         }
         block_job_event_completed(job, msg);
     }
-
     if (job->txn) {
         QLIST_REMOVE(job, txn_list);
         block_job_txn_unref(job->txn);
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index c6da7e4..b7aeaef 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -74,6 +74,14 @@ struct BlockJobDriver {
     void (*abort)(BlockJob *job);
 
     /**
+     * If the callback is not NULL, it will be invoked after a call to either
+     * .commit() or .abort(). Regardless of which callback is invoked after
+     * completion, .clean() will always be called, even if the job does not
+     * belong to a transaction group.
+     */
+    void (*clean)(BlockJob *job);
+
+    /**
      * If the callback is not NULL, it will be invoked when the job transitions
      * into the paused state.  Paused jobs must not perform any asynchronous
      * I/O or event loop activity.  This callback is used to quiesce jobs.
-- 
2.7.4


* [Qemu-devel] [PATCH v2 08/11] blockjob: add .start field
  2016-09-30 22:00 [Qemu-devel] [PATCH v2 00/11] blockjobs: Fix transactional race condition John Snow
                   ` (6 preceding siblings ...)
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 07/11] blockjob: add .clean property John Snow
@ 2016-09-30 22:00 ` John Snow
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 09/11] blockjob: add block_job_start John Snow
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 51+ messages in thread
From: John Snow @ 2016-09-30 22:00 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, vsementsov, famz, stefanha, jcody, eblake, qemu-devel, John Snow

Add an explicit start field to specify the entrypoint. We already have
ownership of the coroutine and manage its lifetime; let's take control
of its creation, too.

This will allow us to delay creation of the actual coroutine until we
know we'll actually start a BlockJob in block_job_start. This avoids
the sticky question of how to "un-create" a Coroutine that hasn't been
started yet.
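The shape of the change can be sketched with a toy driver table (names
here are illustrative, not QEMU's API): the driver carries its own
entrypoint, so generic code can create the coroutine from
job->driver->start instead of each job type hardcoding its run
function.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a driver-provided coroutine entrypoint. */
typedef void ToyEntry(void *opaque);

typedef struct {
    ToyEntry *start; /* analogous to BlockJobDriver.start */
} ToyDriver;

typedef struct {
    const ToyDriver *driver;
    ToyEntry *entry; /* stands in for the created coroutine */
} ToyJob;

/* Before: qemu_coroutine_create(backup_run, job) -- hardcoded.
 * After:  qemu_coroutine_create(job->driver->start, job). */
static void toy_create_co(ToyJob *job)
{
    job->entry = job->driver->start;
}

static int ran;
static void toy_run(void *opaque) { (void)opaque; ran = 1; }
```

With the entrypoint in the driver, generic code can defer coroutine
creation until a later block_job_start() call without knowing the
concrete job type.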

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/backup.c               | 23 ++++++++++++-----------
 block/commit.c               |  3 ++-
 block/mirror.c               |  4 +++-
 block/stream.c               |  3 ++-
 include/block/blockjob_int.h |  3 +++
 5 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 42ff4c0..58dfc58 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -307,16 +307,6 @@ void backup_cow_request_end(CowRequest *req)
     cow_request_end(req);
 }
 
-static const BlockJobDriver backup_job_driver = {
-    .instance_size          = sizeof(BackupBlockJob),
-    .job_type               = BLOCK_JOB_TYPE_BACKUP,
-    .set_speed              = backup_set_speed,
-    .commit                 = backup_commit,
-    .abort                  = backup_abort,
-    .clean                  = backup_clean,
-    .attached_aio_context   = backup_attached_aio_context,
-};
-
 static BlockErrorAction backup_error_action(BackupBlockJob *job,
                                             bool read, int error)
 {
@@ -526,6 +516,17 @@ static void coroutine_fn backup_run(void *opaque)
     block_job_defer_to_main_loop(&job->common, backup_complete, data);
 }
 
+static const BlockJobDriver backup_job_driver = {
+    .instance_size          = sizeof(BackupBlockJob),
+    .job_type               = BLOCK_JOB_TYPE_BACKUP,
+    .start                  = backup_run,
+    .set_speed              = backup_set_speed,
+    .commit                 = backup_commit,
+    .abort                  = backup_abort,
+    .clean                  = backup_clean,
+    .attached_aio_context   = backup_attached_aio_context,
+};
+
 void backup_start(const char *job_id, BlockDriverState *bs,
                   BlockDriverState *target, int64_t speed,
                   MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
@@ -636,7 +637,7 @@ void backup_start(const char *job_id, BlockDriverState *bs,
 
     bdrv_op_block_all(target, job->common.blocker);
     job->common.len = len;
-    job->common.co = qemu_coroutine_create(backup_run, job);
+    job->common.co = qemu_coroutine_create(job->common.driver->start, job);
     block_job_txn_add_job(txn, &job->common);
     qemu_coroutine_enter(job->common.co);
     return;
diff --git a/block/commit.c b/block/commit.c
index fb5bede..dbaf39e 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -205,6 +205,7 @@ static const BlockJobDriver commit_job_driver = {
     .instance_size = sizeof(CommitBlockJob),
     .job_type      = BLOCK_JOB_TYPE_COMMIT,
     .set_speed     = commit_set_speed,
+    .start         = commit_run,
 };
 
 void commit_start(const char *job_id, BlockDriverState *bs,
@@ -274,7 +275,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
     s->backing_file_str = g_strdup(backing_file_str);
 
     s->on_error = on_error;
-    s->common.co = qemu_coroutine_create(commit_run, s);
+    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
 
     trace_commit_start(bs, base, top, s, s->common.co, opaque);
     qemu_coroutine_enter(s->common.co);
diff --git a/block/mirror.c b/block/mirror.c
index cc62fb0..ef54e5b 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -891,6 +891,7 @@ static const BlockJobDriver mirror_job_driver = {
     .instance_size          = sizeof(MirrorBlockJob),
     .job_type               = BLOCK_JOB_TYPE_MIRROR,
     .set_speed              = mirror_set_speed,
+    .start                  = mirror_run,
     .complete               = mirror_complete,
     .pause                  = mirror_pause,
     .attached_aio_context   = mirror_attached_aio_context,
@@ -900,6 +901,7 @@ static const BlockJobDriver commit_active_job_driver = {
     .instance_size          = sizeof(MirrorBlockJob),
     .job_type               = BLOCK_JOB_TYPE_COMMIT,
     .set_speed              = mirror_set_speed,
+    .start                  = mirror_run,
     .complete               = mirror_complete,
     .pause                  = mirror_pause,
     .attached_aio_context   = mirror_attached_aio_context,
@@ -967,7 +969,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
 
     bdrv_op_block_all(target, s->common.blocker);
 
-    s->common.co = qemu_coroutine_create(mirror_run, s);
+    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
     trace_mirror_start(bs, s, s->common.co, opaque);
     qemu_coroutine_enter(s->common.co);
 }
diff --git a/block/stream.c b/block/stream.c
index 71d0e7a..2a1c814 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -212,6 +212,7 @@ static const BlockJobDriver stream_job_driver = {
     .instance_size = sizeof(StreamBlockJob),
     .job_type      = BLOCK_JOB_TYPE_STREAM,
     .set_speed     = stream_set_speed,
+    .start         = stream_run,
 };
 
 void stream_start(const char *job_id, BlockDriverState *bs,
@@ -231,7 +232,7 @@ void stream_start(const char *job_id, BlockDriverState *bs,
     s->backing_file_str = g_strdup(backing_file_str);
 
     s->on_error = on_error;
-    s->common.co = qemu_coroutine_create(stream_run, s);
+    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
     trace_stream_start(bs, base, s, s->common.co, opaque);
     qemu_coroutine_enter(s->common.co);
 }
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index b7aeaef..f538801 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -47,6 +47,9 @@ struct BlockJobDriver {
     /** Optional callback for job types that need to forward I/O status reset */
     void (*iostatus_reset)(BlockJob *job);
 
+    /** Mandatory: Entrypoint for the Coroutine. */
+    CoroutineEntry *start;
+
     /**
      * Optional callback for job types whose completion must be triggered
      * manually.
-- 
2.7.4


* [Qemu-devel] [PATCH v2 09/11] blockjob: add block_job_start
  2016-09-30 22:00 [Qemu-devel] [PATCH v2 00/11] blockjobs: Fix transactional race condition John Snow
                   ` (7 preceding siblings ...)
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 08/11] blockjob: add .start field John Snow
@ 2016-09-30 22:00 ` John Snow
  2016-10-05 15:17   ` Kevin Wolf
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 10/11] blockjob: refactor backup_start as backup_job_create John Snow
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 51+ messages in thread
From: John Snow @ 2016-09-30 22:00 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, vsementsov, famz, stefanha, jcody, eblake, qemu-devel, John Snow

Instead of automatically starting jobs at creation time via backup_start
et al, we'd like to return a job object pointer that can be started
manually at a later point in time.

For now, add the block_job_start mechanism and start the jobs
automatically as we have been doing, with conversions job-by-job coming
in later patches.

Of note: cancellation of unstarted jobs will perform all the normal
cleanup as if the job had started, particularly abort and clean. The
only difference is that we will not emit any events, because the job
never actually started.
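The intended start/cancel semantics can be sketched with a toy model
(illustrative names only, and a deliberate simplification of the real
event logic): creation no longer enters the coroutine, block_job_start()
does, and cancelling an unstarted job still runs cleanup but emits no
completion event.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the start/cancel split described above. */
typedef struct {
    bool started;
    bool cancelled;
    bool cleaned;
    bool event_emitted;
} ToyJob;

static void toy_job_start(ToyJob *job)
{
    if (!job->cancelled) {
        job->started = true; /* qemu_coroutine_enter() would go here */
    }
}

static void toy_job_cancel(ToyJob *job)
{
    job->cancelled = true;
    job->cleaned = true;               /* abort/clean run regardless */
    job->event_emitted = job->started; /* no event if never started */
}
```

Cancelling before and after start exercises both halves of the
invariant: cleanup always happens, events only for started jobs.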

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/backup.c            |  3 +--
 block/commit.c            |  3 +--
 block/mirror.c            |  3 +--
 block/stream.c            |  3 +--
 blockjob.c                | 48 +++++++++++++++++++++++++++++++++++------------
 include/block/block_int.h |  8 ++++++++
 tests/test-blockjob-txn.c | 12 ++++++------
 7 files changed, 54 insertions(+), 26 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 58dfc58..7294169 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -637,9 +637,8 @@ void backup_start(const char *job_id, BlockDriverState *bs,
 
     bdrv_op_block_all(target, job->common.blocker);
     job->common.len = len;
-    job->common.co = qemu_coroutine_create(job->common.driver->start, job);
     block_job_txn_add_job(txn, &job->common);
-    qemu_coroutine_enter(job->common.co);
+    block_job_start(&job->common);
     return;
 
  error:
diff --git a/block/commit.c b/block/commit.c
index dbaf39e..e64fdf3 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -275,10 +275,9 @@ void commit_start(const char *job_id, BlockDriverState *bs,
     s->backing_file_str = g_strdup(backing_file_str);
 
     s->on_error = on_error;
-    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
 
     trace_commit_start(bs, base, top, s, s->common.co, opaque);
-    qemu_coroutine_enter(s->common.co);
+    block_job_start(&s->common);
 }
 
 
diff --git a/block/mirror.c b/block/mirror.c
index ef54e5b..2a6a662 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -969,9 +969,8 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
 
     bdrv_op_block_all(target, s->common.blocker);
 
-    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
     trace_mirror_start(bs, s, s->common.co, opaque);
-    qemu_coroutine_enter(s->common.co);
+    block_job_start(&s->common);
 }
 
 void mirror_start(const char *job_id, BlockDriverState *bs,
diff --git a/block/stream.c b/block/stream.c
index 2a1c814..55cbbe7 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -232,7 +232,6 @@ void stream_start(const char *job_id, BlockDriverState *bs,
     s->backing_file_str = g_strdup(backing_file_str);
 
     s->on_error = on_error;
-    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
     trace_stream_start(bs, base, s, s->common.co, opaque);
-    qemu_coroutine_enter(s->common.co);
+    block_job_start(&s->common);
 }
diff --git a/blockjob.c b/blockjob.c
index 44cbf6c..ac9a68d 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -161,7 +161,8 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     job->blk           = blk;
     job->cb            = cb;
     job->opaque        = opaque;
-    job->busy          = true;
+    job->busy          = false;
+    job->paused        = true;
     job->refcnt        = 1;
     bs->job = job;
 
@@ -184,6 +185,15 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     return job;
 }
 
+void block_job_start(BlockJob *job)
+{
+    assert(job && !job->co && job->paused && !job->busy && job->driver->start);
+    job->paused = false;
+    job->busy = true;
+    job->co = qemu_coroutine_create(job->driver->start, job);
+    qemu_coroutine_enter(job->co);
+}
+
 void block_job_ref(BlockJob *job)
 {
     ++job->refcnt;
@@ -220,18 +230,24 @@ static void block_job_completed_single(BlockJob *job)
     if (job->driver->clean) {
         job->driver->clean(job);
     }
+
     if (job->cb) {
         job->cb(job->opaque, job->ret);
     }
-    if (block_job_is_cancelled(job)) {
-        block_job_event_cancelled(job);
-    } else {
-        const char *msg = NULL;
-        if (job->ret < 0) {
-            msg = strerror(-job->ret);
+
+    /* Emit events only if we actually started */
+    if (job->co) {
+        if (block_job_is_cancelled(job)) {
+            block_job_event_cancelled(job);
+        } else {
+            const char *msg = NULL;
+            if (job->ret < 0) {
+                msg = strerror(-job->ret);
+            }
+            block_job_event_completed(job, msg);
         }
-        block_job_event_completed(job, msg);
     }
+
     if (job->txn) {
         QLIST_REMOVE(job, txn_list);
         block_job_txn_unref(job->txn);
@@ -335,7 +351,8 @@ void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
 
 void block_job_complete(BlockJob *job, Error **errp)
 {
-    if (job->pause_count || job->cancelled || !job->driver->complete) {
+    if (job->pause_count || job->cancelled ||
+        !job->co || !job->driver->complete) {
         error_setg(errp, "The active block job '%s' cannot be completed",
                    job->id);
         return;
@@ -364,6 +381,8 @@ bool block_job_paused(BlockJob *job)
 
 void coroutine_fn block_job_pause_point(BlockJob *job)
 {
+    assert(job && job->co);
+
     if (!block_job_should_pause(job)) {
         return;
     }
@@ -408,9 +427,14 @@ void block_job_enter(BlockJob *job)
 
 void block_job_cancel(BlockJob *job)
 {
-    job->cancelled = true;
-    block_job_iostatus_reset(job);
-    block_job_enter(job);
+    if (job->co) {
+        job->cancelled = true;
+        block_job_iostatus_reset(job);
+        block_job_enter(job);
+    } else {
+        /* Job was queued but unstarted. Perform cleanup. */
+        block_job_completed(job, -ECANCELED);
+    }
 }
 
 bool block_job_is_cancelled(BlockJob *job)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 3e79228..686f6a8 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -763,6 +763,14 @@ void backup_start(const char *job_id, BlockDriverState *bs,
                   BlockCompletionFunc *cb, void *opaque,
                   BlockJobTxn *txn, Error **errp);
 
+/**
+ * block_job_start:
+ * @job: The job object as returned by @block_job_create.
+ *
+ * Begins execution of a block job.
+ */
+void block_job_start(BlockJob *job);
+
 void hmp_drive_add_node(Monitor *mon, const char *optstr);
 
 BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
diff --git a/tests/test-blockjob-txn.c b/tests/test-blockjob-txn.c
index 736e18f..d694b52 100644
--- a/tests/test-blockjob-txn.c
+++ b/tests/test-blockjob-txn.c
@@ -24,10 +24,6 @@ typedef struct {
     int *result;
 } TestBlockJob;
 
-static const BlockJobDriver test_block_job_driver = {
-    .instance_size = sizeof(TestBlockJob),
-};
-
 static void test_block_job_complete(BlockJob *job, void *opaque)
 {
     BlockDriverState *bs = blk_bs(job->blk);
@@ -77,6 +73,11 @@ static void test_block_job_cb(void *opaque, int ret)
     g_free(data);
 }
 
+static const BlockJobDriver test_block_job_driver = {
+    .instance_size = sizeof(TestBlockJob),
+    .start = test_block_job_run,
+};
+
 /* Create a block job that completes with a given return code after a given
  * number of event loop iterations.  The return code is stored in the given
  * result pointer.
@@ -103,10 +104,9 @@ static BlockJob *test_block_job_start(unsigned int iterations,
     s->use_timer = use_timer;
     s->rc = rc;
     s->result = result;
-    s->common.co = qemu_coroutine_create(test_block_job_run, s);
     data->job = s;
     data->result = result;
-    qemu_coroutine_enter(s->common.co);
+    block_job_start(&s->common);
     return &s->common;
 }
 
-- 
2.7.4


* [Qemu-devel] [PATCH v2 10/11] blockjob: refactor backup_start as backup_job_create
  2016-09-30 22:00 [Qemu-devel] [PATCH v2 00/11] blockjobs: Fix transactional race condition John Snow
                   ` (8 preceding siblings ...)
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 09/11] blockjob: add block_job_start John Snow
@ 2016-09-30 22:00 ` John Snow
  2016-10-07 18:39   ` John Snow
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 11/11] iotests: add transactional failure race test John Snow
  2016-09-30 22:22 ` [Qemu-devel] [PATCH v2 00/11] blockjobs: Fix transactional race condition no-reply
  11 siblings, 1 reply; 51+ messages in thread
From: John Snow @ 2016-09-30 22:00 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, vsementsov, famz, stefanha, jcody, eblake, qemu-devel, John Snow

Refactor backup_start as backup_job_create, which only creates the job,
but does not automatically start it. The old interface, 'backup_start',
is not kept; this limits the number of nearly-identical interfaces that
would have to be edited to keep up with QAPI changes in the future.

Callers that wish to synchronously start the backup block job can
instead just call block_job_start immediately after calling
backup_job_create.

Transactions are updated to use the new interface, calling block_job_start
only during the .commit phase. This helps prevent race conditions in which
jobs complete before the transaction has even finished being built, as can
happen, for instance, with empty block backup jobs.
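
The prepare/commit split can be sketched as follows (an illustrative
model, not QEMU code: Txn, txn_prepare and txn_commit are invented names;
txn_commit stands in for calling block_job_start on each job):

```c
#include <assert.h>

/* Two-phase transaction model: jobs are created during .prepare and only
 * started during .commit, so no job can complete before every sibling
 * has joined the transaction. */
#define MAX_JOBS 8

typedef struct Txn {
    int n_jobs;
    int started[MAX_JOBS];
} Txn;

/* .prepare phase: create a job but do not run it yet */
static int txn_prepare(Txn *txn)
{
    int id = txn->n_jobs++;
    txn->started[id] = 0;
    return id;
}

/* .commit phase: start every queued job at once */
static void txn_commit(Txn *txn)
{
    for (int i = 0; i < txn->n_jobs; i++) {
        txn->started[i] = 1;    /* stands in for block_job_start() */
    }
}
```

With the old interface, a job with no data to copy could finish inside
the equivalent of txn_prepare, before later actions had joined the group;
deferring the start to txn_commit closes that window.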

Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/backup.c            | 26 ++++++++-------
 block/replication.c       | 11 ++++---
 blockdev.c                | 81 +++++++++++++++++++++++++++++++----------------
 include/block/block_int.h | 21 ++++++------
 4 files changed, 86 insertions(+), 53 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 7294169..aad69eb 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -527,7 +527,7 @@ static const BlockJobDriver backup_job_driver = {
     .attached_aio_context   = backup_attached_aio_context,
 };
 
-void backup_start(const char *job_id, BlockDriverState *bs,
+BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                   BlockDriverState *target, int64_t speed,
                   MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
                   bool compress,
@@ -546,52 +546,52 @@ void backup_start(const char *job_id, BlockDriverState *bs,
 
     if (bs == target) {
         error_setg(errp, "Source and target cannot be the same");
-        return;
+        return NULL;
     }
 
     if (!bdrv_is_inserted(bs)) {
         error_setg(errp, "Device is not inserted: %s",
                    bdrv_get_device_name(bs));
-        return;
+        return NULL;
     }
 
     if (!bdrv_is_inserted(target)) {
         error_setg(errp, "Device is not inserted: %s",
                    bdrv_get_device_name(target));
-        return;
+        return NULL;
     }
 
     if (compress && target->drv->bdrv_co_pwritev_compressed == NULL) {
         error_setg(errp, "Compression is not supported for this drive %s",
                    bdrv_get_device_name(target));
-        return;
+        return NULL;
     }
 
     if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP_SOURCE, errp)) {
-        return;
+        return NULL;
     }
 
     if (bdrv_op_is_blocked(target, BLOCK_OP_TYPE_BACKUP_TARGET, errp)) {
-        return;
+        return NULL;
     }
 
     if (sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
         if (!sync_bitmap) {
             error_setg(errp, "must provide a valid bitmap name for "
                              "\"incremental\" sync mode");
-            return;
+            return NULL;
         }
 
         /* Create a new bitmap, and freeze/disable this one. */
         if (bdrv_dirty_bitmap_create_successor(bs, sync_bitmap, errp) < 0) {
-            return;
+            return NULL;
         }
     } else if (sync_bitmap) {
         error_setg(errp,
                    "a sync_bitmap was provided to backup_run, "
                    "but received an incompatible sync_mode (%s)",
                    MirrorSyncMode_lookup[sync_mode]);
-        return;
+        return NULL;
     }
 
     len = bdrv_getlength(bs);
@@ -638,8 +638,8 @@ void backup_start(const char *job_id, BlockDriverState *bs,
     bdrv_op_block_all(target, job->common.blocker);
     job->common.len = len;
     block_job_txn_add_job(txn, &job->common);
-    block_job_start(&job->common);
-    return;
+
+    return &job->common;
 
  error:
     if (sync_bitmap) {
@@ -649,4 +649,6 @@ void backup_start(const char *job_id, BlockDriverState *bs,
         blk_unref(job->target);
         block_job_unref(&job->common);
     }
+
+    return NULL;
 }
diff --git a/block/replication.c b/block/replication.c
index b604b93..d9cdc36 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -409,6 +409,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
     int64_t active_length, hidden_length, disk_length;
     AioContext *aio_context;
     Error *local_err = NULL;
+    BlockJob *job;
 
     aio_context = bdrv_get_aio_context(bs);
     aio_context_acquire(aio_context);
@@ -496,16 +497,18 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
         bdrv_op_block_all(top_bs, s->blocker);
         bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
 
-        backup_start("replication-backup", s->secondary_disk->bs,
-                     s->hidden_disk->bs, 0, MIRROR_SYNC_MODE_NONE, NULL, false,
-                     BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
-                     backup_job_completed, s, NULL, &local_err);
+        job = backup_job_create("replication-backup", s->secondary_disk->bs,
+                                s->hidden_disk->bs, 0, MIRROR_SYNC_MODE_NONE,
+                                NULL, false, BLOCKDEV_ON_ERROR_REPORT,
+                                BLOCKDEV_ON_ERROR_REPORT, backup_job_completed,
+                                s, NULL, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             backup_job_cleanup(s);
             aio_context_release(aio_context);
             return;
         }
+        block_job_start(job);
         break;
     default:
         aio_context_release(aio_context);
diff --git a/blockdev.c b/blockdev.c
index 0ac507f..37d78d3 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1866,7 +1866,7 @@ typedef struct DriveBackupState {
     BlockJob *job;
 } DriveBackupState;
 
-static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
+static BlockJob *do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
                             Error **errp);
 
 static void drive_backup_prepare(BlkActionState *common, Error **errp)
@@ -1890,23 +1890,27 @@ static void drive_backup_prepare(BlkActionState *common, Error **errp)
     bdrv_drained_begin(bs);
     state->bs = bs;
 
-    do_drive_backup(backup, common->block_job_txn, &local_err);
+    state->job = do_drive_backup(backup, common->block_job_txn, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
     }
+}
 
-    state->job = state->bs->job;
+static void drive_backup_commit(BlkActionState *common)
+{
+    DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
+    if (state->job) {
+        block_job_start(state->job);
+    }
 }
 
 static void drive_backup_abort(BlkActionState *common)
 {
     DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
-    BlockDriverState *bs = state->bs;
 
-    /* Only cancel if it's the job we started */
-    if (bs && bs->job && bs->job == state->job) {
-        block_job_cancel_sync(bs->job);
+    if (state->job) {
+        block_job_cancel_sync(state->job);
     }
 }
 
@@ -1927,8 +1931,8 @@ typedef struct BlockdevBackupState {
     AioContext *aio_context;
 } BlockdevBackupState;
 
-static void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
-                               Error **errp);
+static BlockJob *do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
+                                    Error **errp);
 
 static void blockdev_backup_prepare(BlkActionState *common, Error **errp)
 {
@@ -1961,23 +1965,27 @@ static void blockdev_backup_prepare(BlkActionState *common, Error **errp)
     state->bs = bs;
     bdrv_drained_begin(state->bs);
 
-    do_blockdev_backup(backup, common->block_job_txn, &local_err);
+    state->job = do_blockdev_backup(backup, common->block_job_txn, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
     }
+}
 
-    state->job = state->bs->job;
+static void blockdev_backup_commit(BlkActionState *common)
+{
+    BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, common);
+    if (state->job) {
+        block_job_start(state->job);
+    }
 }
 
 static void blockdev_backup_abort(BlkActionState *common)
 {
     BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, common);
-    BlockDriverState *bs = state->bs;
 
-    /* Only cancel if it's the job we started */
-    if (bs && bs->job && bs->job == state->job) {
-        block_job_cancel_sync(bs->job);
+    if (state->job) {
+        block_job_cancel_sync(state->job);
     }
 }
 
@@ -2127,12 +2135,14 @@ static const BlkActionOps actions[] = {
     [TRANSACTION_ACTION_KIND_DRIVE_BACKUP] = {
         .instance_size = sizeof(DriveBackupState),
         .prepare = drive_backup_prepare,
+        .commit = drive_backup_commit,
         .abort = drive_backup_abort,
         .clean = drive_backup_clean,
     },
     [TRANSACTION_ACTION_KIND_BLOCKDEV_BACKUP] = {
         .instance_size = sizeof(BlockdevBackupState),
         .prepare = blockdev_backup_prepare,
+        .commit = blockdev_backup_commit,
         .abort = blockdev_backup_abort,
         .clean = blockdev_backup_clean,
     },
@@ -3126,11 +3136,13 @@ out:
     aio_context_release(aio_context);
 }
 
-static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
+static BlockJob *do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
+                                 Error **errp)
 {
     BlockDriverState *bs;
     BlockDriverState *target_bs;
     BlockDriverState *source = NULL;
+    BlockJob *job = NULL;
     BdrvDirtyBitmap *bmap = NULL;
     AioContext *aio_context;
     QDict *options = NULL;
@@ -3159,7 +3171,7 @@ static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
 
     bs = qmp_get_root_bs(backup->device, errp);
     if (!bs) {
-        return;
+        return NULL;
     }
 
     aio_context = bdrv_get_aio_context(bs);
@@ -3233,9 +3245,10 @@ static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
         }
     }
 
-    backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
-                 bmap, backup->compress, backup->on_source_error,
-                 backup->on_target_error, NULL, bs, txn, &local_err);
+    job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
+                            backup->sync, bmap, backup->compress,
+                            backup->on_source_error, backup->on_target_error,
+                            NULL, bs, txn, &local_err);
     bdrv_unref(target_bs);
     if (local_err != NULL) {
         error_propagate(errp, local_err);
@@ -3244,11 +3257,17 @@ static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
 
 out:
     aio_context_release(aio_context);
+    return job;
 }
 
 void qmp_drive_backup(DriveBackup *arg, Error **errp)
 {
-    return do_drive_backup(arg, NULL, errp);
+
+    BlockJob *job;
+    job = do_drive_backup(arg, NULL, errp);
+    if (job) {
+        block_job_start(job);
+    }
 }
 
 BlockDeviceInfoList *qmp_query_named_block_nodes(Error **errp)
@@ -3256,12 +3275,14 @@ BlockDeviceInfoList *qmp_query_named_block_nodes(Error **errp)
     return bdrv_named_nodes_list(errp);
 }
 
-void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
+BlockJob *do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
+                             Error **errp)
 {
     BlockDriverState *bs;
     BlockDriverState *target_bs;
     Error *local_err = NULL;
     AioContext *aio_context;
+    BlockJob *job = NULL;
 
     if (!backup->has_speed) {
         backup->speed = 0;
@@ -3281,7 +3302,7 @@ void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
 
     bs = qmp_get_root_bs(backup->device, errp);
     if (!bs) {
-        return;
+        return NULL;
     }
 
     aio_context = bdrv_get_aio_context(bs);
@@ -3303,19 +3324,25 @@ void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
             goto out;
         }
     }
-    backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
-                 NULL, backup->compress, backup->on_source_error,
-                 backup->on_target_error, NULL, bs, txn, &local_err);
+    job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
+                            backup->sync, NULL, backup->compress,
+                            backup->on_source_error, backup->on_target_error,
+                            NULL, bs, txn, &local_err);
     if (local_err != NULL) {
         error_propagate(errp, local_err);
     }
 out:
     aio_context_release(aio_context);
+    return job;
 }
 
 void qmp_blockdev_backup(BlockdevBackup *arg, Error **errp)
 {
-    do_blockdev_backup(arg, NULL, errp);
+    BlockJob *job;
+    job = do_blockdev_backup(arg, NULL, errp);
+    if (job) {
+        block_job_start(job);
+    }
 }
 
 /* Parameter check and block job starting for drive mirroring.
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 686f6a8..738e4b4 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -737,7 +737,7 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
                   void *opaque, Error **errp);
 
 /*
- * backup_start:
+ * backup_job_create:
  * @job_id: The id of the newly-created job, or %NULL to use the
  * device name of @bs.
  * @bs: Block device to operate on.
@@ -751,17 +751,18 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
  * @opaque: Opaque pointer value passed to @cb.
  * @txn: Transaction that this job is part of (may be NULL).
  *
- * Start a backup operation on @bs.  Clusters in @bs are written to @target
+ * Create a backup operation on @bs.  Clusters in @bs are written to @target
  * until the job is cancelled or manually completed.
  */
-void backup_start(const char *job_id, BlockDriverState *bs,
-                  BlockDriverState *target, int64_t speed,
-                  MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
-                  bool compress,
-                  BlockdevOnError on_source_error,
-                  BlockdevOnError on_target_error,
-                  BlockCompletionFunc *cb, void *opaque,
-                  BlockJobTxn *txn, Error **errp);
+BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
+                            BlockDriverState *target, int64_t speed,
+                            MirrorSyncMode sync_mode,
+                            BdrvDirtyBitmap *sync_bitmap,
+                            bool compress,
+                            BlockdevOnError on_source_error,
+                            BlockdevOnError on_target_error,
+                            BlockCompletionFunc *cb, void *opaque,
+                            BlockJobTxn *txn, Error **errp);
 
 /**
  * block_job_start:
-- 
2.7.4


* [Qemu-devel] [PATCH v2 11/11] iotests: add transactional failure race test
  2016-09-30 22:00 [Qemu-devel] [PATCH v2 00/11] blockjobs: Fix transactional race condition John Snow
                   ` (9 preceding siblings ...)
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 10/11] blockjob: refactor backup_start as backup_job_create John Snow
@ 2016-09-30 22:00 ` John Snow
  2016-10-12 11:26   ` Vladimir Sementsov-Ogievskiy
  2016-09-30 22:22 ` [Qemu-devel] [PATCH v2 00/11] blockjobs: Fix transactional race condition no-reply
  11 siblings, 1 reply; 51+ messages in thread
From: John Snow @ 2016-09-30 22:00 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, vsementsov, famz, stefanha, jcody, eblake, qemu-devel, John Snow

Add a regression test for the case found by Vladimir.

Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
---
 tests/qemu-iotests/124     | 91 ++++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/124.out |  4 +-
 2 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/124 b/tests/qemu-iotests/124
index 2f0bc24..b8f7dad 100644
--- a/tests/qemu-iotests/124
+++ b/tests/qemu-iotests/124
@@ -512,6 +512,97 @@ class TestIncrementalBackup(TestIncrementalBackupBase):
         self.check_backups()
 
 
+    def test_transaction_failure_race(self):
+        '''Test: Verify that transactions with jobs that have no data to
+        transfer do not cause race conditions in the cancellation of the entire
+        transaction job group.
+        '''
+
+        # Create a second drive, with pattern:
+        drive1 = self.add_node('drive1')
+        self.img_create(drive1['file'], drive1['fmt'])
+        io_write_patterns(drive1['file'], (('0x14', 0, 512),
+                                           ('0x5d', '1M', '32k'),
+                                           ('0xcd', '32M', '124k')))
+
+        # Create a blkdebug interface to this img as 'drive1'
+        result = self.vm.qmp('blockdev-add', options={
+            'node-name': drive1['id'],
+            'driver': drive1['fmt'],
+            'file': {
+                'driver': 'blkdebug',
+                'image': {
+                    'driver': 'file',
+                    'filename': drive1['file']
+                },
+                'set-state': [{
+                    'event': 'flush_to_disk',
+                    'state': 1,
+                    'new_state': 2
+                }],
+                'inject-error': [{
+                    'event': 'read_aio',
+                    'errno': 5,
+                    'state': 2,
+                    'immediately': False,
+                    'once': True
+                }],
+            }
+        })
+        self.assert_qmp(result, 'return', {})
+
+        # Create bitmaps and full backups for both drives
+        drive0 = self.drives[0]
+        dr0bm0 = self.add_bitmap('bitmap0', drive0)
+        dr1bm0 = self.add_bitmap('bitmap0', drive1)
+        self.create_anchor_backup(drive0)
+        self.create_anchor_backup(drive1)
+        self.assert_no_active_block_jobs()
+        self.assertFalse(self.vm.get_qmp_events(wait=False))
+
+        # Emulate some writes
+        self.hmp_io_writes(drive1['id'], (('0xba', 0, 512),
+                                          ('0xef', '16M', '256k'),
+                                          ('0x46', '32736k', '64k')))
+
+        # Create incremental backup targets
+        target0 = self.prepare_backup(dr0bm0)
+        target1 = self.prepare_backup(dr1bm0)
+
+        # Ask for a new incremental backup per-each drive, expecting drive1's
+        # backup to fail and attempt to cancel the empty drive0 job.
+        transaction = [
+            transaction_drive_backup(drive0['id'], target0, sync='incremental',
+                                     format=drive0['fmt'], mode='existing',
+                                     bitmap=dr0bm0.name),
+            transaction_drive_backup(drive1['id'], target1, sync='incremental',
+                                     format=drive1['fmt'], mode='existing',
+                                     bitmap=dr1bm0.name)
+        ]
+        result = self.vm.qmp('transaction', actions=transaction,
+                             properties={'completion-mode': 'grouped'} )
+        self.assert_qmp(result, 'return', {})
+
+        # Observe that drive0's backup is cancelled and drive1 completes with
+        # an error.
+        self.wait_qmp_backup_cancelled(drive0['id'])
+        self.assertFalse(self.wait_qmp_backup(drive1['id']))
+        error = self.vm.event_wait('BLOCK_JOB_ERROR')
+        self.assert_qmp(error, 'data', {'device': drive1['id'],
+                                        'action': 'report',
+                                        'operation': 'read'})
+        self.assertFalse(self.vm.get_qmp_events(wait=False))
+        self.assert_no_active_block_jobs()
+
+        # Delete drive0's successful target and eliminate our record of the
+        # unsuccessful drive1 target.
+        dr0bm0.del_target()
+        dr1bm0.del_target()
+
+        self.vm.shutdown()
+
+
+
     def test_sync_dirty_bitmap_missing(self):
         self.assert_no_active_block_jobs()
         self.files.append(self.err_img)
diff --git a/tests/qemu-iotests/124.out b/tests/qemu-iotests/124.out
index 36376be..e56cae0 100644
--- a/tests/qemu-iotests/124.out
+++ b/tests/qemu-iotests/124.out
@@ -1,5 +1,5 @@
-..........
+...........
 ----------------------------------------------------------------------
-Ran 10 tests
+Ran 11 tests
 
 OK
-- 
2.7.4


* Re: [Qemu-devel] [PATCH v2 00/11] blockjobs: Fix transactional race condition
  2016-09-30 22:00 [Qemu-devel] [PATCH v2 00/11] blockjobs: Fix transactional race condition John Snow
                   ` (10 preceding siblings ...)
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 11/11] iotests: add transactional failure race test John Snow
@ 2016-09-30 22:22 ` no-reply
  11 siblings, 0 replies; 51+ messages in thread
From: no-reply @ 2016-09-30 22:22 UTC (permalink / raw)
  To: jsnow; +Cc: famz, qemu-block, kwolf, vsementsov

Hi,

Your series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 1475272849-19990-1-git-send-email-jsnow@redhat.com
Subject: [Qemu-devel] [PATCH v2 00/11] blockjobs: Fix transactional race condition

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git show --no-patch --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/1475272849-19990-1-git-send-email-jsnow@redhat.com -> patchew/1475272849-19990-1-git-send-email-jsnow@redhat.com
Switched to a new branch 'test'
d3bf4cf iotests: add transactional failure race test
ea3adad blockjob: refactor backup_start as backup_job_create
6b5e7c6 blockjob: add block_job_start
fc02e4d blockjob: add .start field
f34ba3c blockjob: add .clean property
6dc649d blockjobs: fix documentation
3e472b6 blockjobs: split interface into public/private
6cbfc4c blockjobs: Always use block_job_get_aio_context
b2ec7f6 Blockjobs: Internalize user_pause logic
e76a966 blockjob: centralize QMP event emissions
cb59e5e blockjob: fix dead pointer in txn list

=== OUTPUT BEGIN ===
Checking PATCH 1/11: blockjob: fix dead pointer in txn list...
Checking PATCH 2/11: blockjob: centralize QMP event emissions...
Checking PATCH 3/11: Blockjobs: Internalize user_pause logic...
Checking PATCH 4/11: blockjobs: Always use block_job_get_aio_context...
Checking PATCH 5/11: blockjobs: split interface into public/private...
ERROR: struct BlockJobDriver should normally be const
#301: FILE: include/block/blockjob.h:31:
+typedef struct BlockJobDriver BlockJobDriver;

ERROR: struct BlockJobDriver should normally be const
#535: FILE: include/block/blockjob_int.h:37:
+struct BlockJobDriver {

ERROR: space prohibited between function name and open parenthesis '('
#579: FILE: include/block/blockjob_int.h:81:
+    void coroutine_fn (*pause)(BlockJob *job);

ERROR: space prohibited between function name and open parenthesis '('
#586: FILE: include/block/blockjob_int.h:88:
+    void coroutine_fn (*resume)(BlockJob *job);

total: 4 errors, 0 warnings, 805 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 6/11: blockjobs: fix documentation...
Checking PATCH 7/11: blockjob: add .clean property...
Checking PATCH 8/11: blockjob: add .start field...
Checking PATCH 9/11: blockjob: add block_job_start...
Checking PATCH 10/11: blockjob: refactor backup_start as backup_job_create...
Checking PATCH 11/11: iotests: add transactional failure race test...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/11] Blockjobs: Internalize user_pause logic
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 03/11] Blockjobs: Internalize user_pause logic John Snow
@ 2016-10-04  0:57   ` Jeff Cody
  2016-10-04  2:46     ` John Snow
  2016-10-04 18:35     ` John Snow
  0 siblings, 2 replies; 51+ messages in thread
From: Jeff Cody @ 2016-10-04  0:57 UTC (permalink / raw)
  To: John Snow
  Cc: qemu-block, kwolf, vsementsov, famz, stefanha, eblake, qemu-devel

On Fri, Sep 30, 2016 at 06:00:41PM -0400, John Snow wrote:
> BlockJobs will begin hiding their state in preparation for some
> refactorings anyway, so let's internalize the user_pause mechanism
> instead of leaving it to callers to correctly manage.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  block/io.c               |  2 +-
>  blockdev.c               | 10 ++++------
>  blockjob.c               | 16 ++++++++++++----
>  include/block/blockjob.h | 11 ++++++++++-
>  4 files changed, 27 insertions(+), 12 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index fdf7080..868b065 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -291,7 +291,7 @@ void bdrv_drain_all(void)
>          AioContext *aio_context = blk_get_aio_context(job->blk);
>  
>          aio_context_acquire(aio_context);
> -        block_job_pause(job);
> +        block_job_pause(job, false);

Hmm.  Maybe semantically, an enum might work better here than a bool.  When
I see "block_job_pause(job, false)", it reads like you just un-paused the
job, rather than paused it with some contextual info.  Perhaps
JOB_USER_PAUSE and JOB_SYS_PAUSE (or just JOB_USER and JOB_SYS)?

>          aio_context_release(aio_context);
>      }
>  
> diff --git a/blockdev.c b/blockdev.c
> index 03200e7..268452f 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3629,7 +3629,7 @@ void qmp_block_job_cancel(const char *device,
>          force = false;
>      }
>  
> -    if (job->user_paused && !force) {
> +    if (block_job_paused(job) && !force) {
>          error_setg(errp, "The block job for device '%s' is currently paused",
>                     device);
>          goto out;
> @@ -3646,13 +3646,12 @@ void qmp_block_job_pause(const char *device, Error **errp)
>      AioContext *aio_context;
>      BlockJob *job = find_block_job(device, &aio_context, errp);
>  
> -    if (!job || job->user_paused) {
> +    if (!job || block_job_paused(job)) {
>          return;
>      }
>  
> -    job->user_paused = true;
>      trace_qmp_block_job_pause(job);
> -    block_job_pause(job);
> +    block_job_pause(job, true);
>      aio_context_release(aio_context);
>  }
>  
> @@ -3661,11 +3660,10 @@ void qmp_block_job_resume(const char *device, Error **errp)
>      AioContext *aio_context;
>      BlockJob *job = find_block_job(device, &aio_context, errp);
>  
> -    if (!job || !job->user_paused) {
> +    if (!job || !block_job_paused(job)) {
>          return;
>      }
>  
> -    job->user_paused = false;
>      trace_qmp_block_job_resume(job);
>      block_job_iostatus_reset(job);
>      block_job_resume(job);
> diff --git a/blockjob.c b/blockjob.c
> index 6a300ba..2a35f50 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -104,7 +104,7 @@ static void block_job_detach_aio_context(void *opaque)
>      /* In case the job terminates during aio_poll()... */
>      block_job_ref(job);
>  
> -    block_job_pause(job);
> +    block_job_pause(job, false);
>  
>      if (!job->paused) {
>          /* If job is !job->busy this kicks it into the next pause point. */
> @@ -343,9 +343,12 @@ void block_job_complete(BlockJob *job, Error **errp)
>      job->driver->complete(job, errp);
>  }
>  
> -void block_job_pause(BlockJob *job)
> +void block_job_pause(BlockJob *job, bool user)
>  {
>      job->pause_count++;
> +    if (user) {
> +        job->user_paused = true;
> +    }
>  }
>  
>  static bool block_job_should_pause(BlockJob *job)
> @@ -353,6 +356,11 @@ static bool block_job_should_pause(BlockJob *job)
>      return job->pause_count > 0;
>  }
>  
> +bool block_job_paused(BlockJob *job)

I think a more apt name for this would be "block_job_user_paused()", because
it doesn't tell if the job is paused per se, but only if it is paused by a
user.

> +{
> +    return job ? job->user_paused : 0;
> +}
> +
>  void coroutine_fn block_job_pause_point(BlockJob *job)
>  {
>      if (!block_job_should_pause(job)) {
> @@ -386,6 +394,7 @@ void block_job_resume(BlockJob *job)
>      if (job->pause_count) {
>          return;
>      }
> +    job->user_paused = false;

At first I was thinking block_job_resume() would need a way of knowing if it
was a user resume, similar to block_job_pause().  But I guess not... if the
user paused it, then job->pause_count should always be > 0, I think.

But wait...could we ever get into a state where something like this happens:

block_job_pause(job, true);  /* user paused */

[...]

block_job_pause(job, false); /* system paused */

[...]

block_job_resume(job);       /* user resumes */

[...]   /* job->user_paused is still true, although it shouldn't be */

block_job_resume(job);       /* system resumes */

                             /* now job->user_paused is false */


>      block_job_enter(job);
>  }
>  
> @@ -592,8 +601,7 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
>                                      action, &error_abort);
>      if (action == BLOCK_ERROR_ACTION_STOP) {
>          /* make the pause user visible, which will be resumed from QMP. */
> -        job->user_paused = true;
> -        block_job_pause(job);
> +        block_job_pause(job, true);
>          block_job_iostatus_set_err(job, error);
>      }
>      return action;
> diff --git a/include/block/blockjob.h b/include/block/blockjob.h
> index 4ddb4ae..081f6c2 100644
> --- a/include/block/blockjob.h
> +++ b/include/block/blockjob.h
> @@ -347,10 +347,19 @@ void coroutine_fn block_job_pause_point(BlockJob *job);
>  /**
>   * block_job_pause:
>   * @job: The job to be paused.
> + * @user: Requested explicitly via user?
>   *
>   * Asynchronously pause the specified job.
>   */
> -void block_job_pause(BlockJob *job);
> +void block_job_pause(BlockJob *job, bool user);
> +
> +/**
> + * block_job_paused:
> + * @job: The job to query.
> + *
> + * Returns true if the job is user-paused.
> + */
> +bool block_job_paused(BlockJob *job);
>  
>  /**
>   * block_job_resume:
> -- 
> 2.7.4
> 


* Re: [Qemu-devel] [PATCH v2 03/11] Blockjobs: Internalize user_pause logic
  2016-10-04  0:57   ` Jeff Cody
@ 2016-10-04  2:46     ` John Snow
  2016-10-04 18:35     ` John Snow
  1 sibling, 0 replies; 51+ messages in thread
From: John Snow @ 2016-10-04  2:46 UTC (permalink / raw)
  To: Jeff Cody
  Cc: qemu-block, kwolf, vsementsov, famz, stefanha, eblake, qemu-devel



On 10/03/2016 08:57 PM, Jeff Cody wrote:
> On Fri, Sep 30, 2016 at 06:00:41PM -0400, John Snow wrote:
>> BlockJobs will begin hiding their state in preparation for some
>> refactorings anyway, so let's internalize the user_pause mechanism
>> instead of leaving it to callers to correctly manage.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>  block/io.c               |  2 +-
>>  blockdev.c               | 10 ++++------
>>  blockjob.c               | 16 ++++++++++++----
>>  include/block/blockjob.h | 11 ++++++++++-
>>  4 files changed, 27 insertions(+), 12 deletions(-)
>>
>> diff --git a/block/io.c b/block/io.c
>> index fdf7080..868b065 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -291,7 +291,7 @@ void bdrv_drain_all(void)
>>          AioContext *aio_context = blk_get_aio_context(job->blk);
>>
>>          aio_context_acquire(aio_context);
>> -        block_job_pause(job);
>> +        block_job_pause(job, false);
>
> Hmm.  Maybe semantically, an enum might work better here than a bool.  When
> I see "block_job_pause(job, false)", it reads like you just un-paused the
> job, rather than paused it with some contextual info.  Perhaps
> JOB_USER_PAUSE and JOB_SYS_PAUSE (or just JOB_USER and JOB_SYS)?
>

Thanks for taking a look. Mostly I was trying to squish users of jobs 
internals into API accesses instead, in semi-prep for shifting things 
around a good bit later.

The real inspiration was trying to figure out if adding a new state was 
safe, and figuring out who-touched-what-and-when was easier if I split 
things into "internal" and "external" functions, then I could audit the 
external functions more carefully.

(This is a tip for auditing the safety of what I do later in this 
series...!)

To the point, you're probably right, this was a bit of a quick hack. 
I'll make this nicer tomorrow.

>>          aio_context_release(aio_context);
>>      }
>>
>> diff --git a/blockdev.c b/blockdev.c
>> index 03200e7..268452f 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -3629,7 +3629,7 @@ void qmp_block_job_cancel(const char *device,
>>          force = false;
>>      }
>>
>> -    if (job->user_paused && !force) {
>> +    if (block_job_paused(job) && !force) {
>>          error_setg(errp, "The block job for device '%s' is currently paused",
>>                     device);
>>          goto out;
>> @@ -3646,13 +3646,12 @@ void qmp_block_job_pause(const char *device, Error **errp)
>>      AioContext *aio_context;
>>      BlockJob *job = find_block_job(device, &aio_context, errp);
>>
>> -    if (!job || job->user_paused) {
>> +    if (!job || block_job_paused(job)) {
>>          return;
>>      }
>>
>> -    job->user_paused = true;
>>      trace_qmp_block_job_pause(job);
>> -    block_job_pause(job);
>> +    block_job_pause(job, true);
>>      aio_context_release(aio_context);
>>  }
>>
>> @@ -3661,11 +3660,10 @@ void qmp_block_job_resume(const char *device, Error **errp)
>>      AioContext *aio_context;
>>      BlockJob *job = find_block_job(device, &aio_context, errp);
>>
>> -    if (!job || !job->user_paused) {
>> +    if (!job || !block_job_paused(job)) {
>>          return;
>>      }
>>
>> -    job->user_paused = false;
>>      trace_qmp_block_job_resume(job);
>>      block_job_iostatus_reset(job);
>>      block_job_resume(job);
>> diff --git a/blockjob.c b/blockjob.c
>> index 6a300ba..2a35f50 100644
>> --- a/blockjob.c
>> +++ b/blockjob.c
>> @@ -104,7 +104,7 @@ static void block_job_detach_aio_context(void *opaque)
>>      /* In case the job terminates during aio_poll()... */
>>      block_job_ref(job);
>>
>> -    block_job_pause(job);
>> +    block_job_pause(job, false);
>>
>>      if (!job->paused) {
>>          /* If job is !job->busy this kicks it into the next pause point. */
>> @@ -343,9 +343,12 @@ void block_job_complete(BlockJob *job, Error **errp)
>>      job->driver->complete(job, errp);
>>  }
>>
>> -void block_job_pause(BlockJob *job)
>> +void block_job_pause(BlockJob *job, bool user)
>>  {
>>      job->pause_count++;
>> +    if (user) {
>> +        job->user_paused = true;
>> +    }
>>  }
>>
>>  static bool block_job_should_pause(BlockJob *job)
>> @@ -353,6 +356,11 @@ static bool block_job_should_pause(BlockJob *job)
>>      return job->pause_count > 0;
>>  }
>>
>> +bool block_job_paused(BlockJob *job)
>
> I think a more apt name for this would be "block_job_user_paused()", because
> it doesn't tell if the job is paused per se, but only if it is paused by a
> user.
>

Sure.

>> +{
>> +    return job ? job->user_paused : 0;
>> +}
>> +
>>  void coroutine_fn block_job_pause_point(BlockJob *job)
>>  {
>>      if (!block_job_should_pause(job)) {
>> @@ -386,6 +394,7 @@ void block_job_resume(BlockJob *job)
>>      if (job->pause_count) {
>>          return;
>>      }
>> +    job->user_paused = false;
>
> At first I was thinking block_job_resume() would need a way of knowing if it
> was a user resume, similar to block_job_pause().  But I guess not... if the
> user paused it, then job->pause_count should always be > 0, I think.
>
> But wait...could we ever get into a state where something like this happens:
>
> block_job_pause(job, true);  /* user paused */
>
> [...]
>
> block_job_pause(job, false); /* system paused */
>
> [...]
>
> block_job_resume(job);       /* user resumes */
>
> [...]   /* job->user_paused is still true, although it shouldn't be */
>
> block_job_resume(job);       /* system resumes */
>
>                              /* now job->user_paused is false */
>
>

Good question. And if it does happen, what's the impact?

(1) The user could not cancel the job until whatever was pausing it lets 
go, unless they used the force option.

Maybe this leads to new cancel failures that didn't exist previously.

(2) The user could not re-pause the job until the job resumed again.

This doesn't seem critical, though it is annoying to deny a request we 
could have easily serviced.

(3) The user may be able to re-resume the job.

This will ruin the pause counter.

That's all kind of annoying, so perhaps a counterpart change to the 
resume API is warranted.

>>      block_job_enter(job);
>>  }
>>
>> @@ -592,8 +601,7 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
>>                                      action, &error_abort);
>>      if (action == BLOCK_ERROR_ACTION_STOP) {
>>          /* make the pause user visible, which will be resumed from QMP. */
>> -        job->user_paused = true;
>> -        block_job_pause(job);
>> +        block_job_pause(job, true);
>>          block_job_iostatus_set_err(job, error);
>>      }
>>      return action;
>> diff --git a/include/block/blockjob.h b/include/block/blockjob.h
>> index 4ddb4ae..081f6c2 100644
>> --- a/include/block/blockjob.h
>> +++ b/include/block/blockjob.h
>> @@ -347,10 +347,19 @@ void coroutine_fn block_job_pause_point(BlockJob *job);
>>  /**
>>   * block_job_pause:
>>   * @job: The job to be paused.
>> + * @user: Requested explicitly via user?
>>   *
>>   * Asynchronously pause the specified job.
>>   */
>> -void block_job_pause(BlockJob *job);
>> +void block_job_pause(BlockJob *job, bool user);
>> +
>> +/**
>> + * block_job_paused:
>> + * @job: The job to query.
>> + *
>> + * Returns true if the job is user-paused.
>> + */
>> +bool block_job_paused(BlockJob *job);
>>
>>  /**
>>   * block_job_resume:
>> --
>> 2.7.4
>>


* Re: [Qemu-devel] [PATCH v2 03/11] Blockjobs: Internalize user_pause logic
  2016-10-04  0:57   ` Jeff Cody
  2016-10-04  2:46     ` John Snow
@ 2016-10-04 18:35     ` John Snow
  1 sibling, 0 replies; 51+ messages in thread
From: John Snow @ 2016-10-04 18:35 UTC (permalink / raw)
  To: Jeff Cody; +Cc: kwolf, vsementsov, famz, qemu-block, qemu-devel, stefanha



On 10/03/2016 08:57 PM, Jeff Cody wrote:
> On Fri, Sep 30, 2016 at 06:00:41PM -0400, John Snow wrote:
>> BlockJobs will begin hiding their state in preparation for some
>> refactorings anyway, so let's internalize the user_pause mechanism
>> instead of leaving it to callers to correctly manage.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>  block/io.c               |  2 +-
>>  blockdev.c               | 10 ++++------
>>  blockjob.c               | 16 ++++++++++++----
>>  include/block/blockjob.h | 11 ++++++++++-
>>  4 files changed, 27 insertions(+), 12 deletions(-)
>>
>> diff --git a/block/io.c b/block/io.c
>> index fdf7080..868b065 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -291,7 +291,7 @@ void bdrv_drain_all(void)
>>          AioContext *aio_context = blk_get_aio_context(job->blk);
>>
>>          aio_context_acquire(aio_context);
>> -        block_job_pause(job);
>> +        block_job_pause(job, false);
>
> Hmm.  Maybe semantically, an enum might work better here than a bool.  When
> I see "block_job_pause(job, false)", it reads like you just un-paused the
> job, rather than paused it with some contextual info.  Perhaps
> JOB_USER_PAUSE and JOB_SYS_PAUSE (or just JOB_USER and JOB_SYS)?
>
>>          aio_context_release(aio_context);
>>      }
>>
>> diff --git a/blockdev.c b/blockdev.c
>> index 03200e7..268452f 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -3629,7 +3629,7 @@ void qmp_block_job_cancel(const char *device,
>>          force = false;
>>      }
>>
>> -    if (job->user_paused && !force) {
>> +    if (block_job_paused(job) && !force) {
>>          error_setg(errp, "The block job for device '%s' is currently paused",
>>                     device);
>>          goto out;
>> @@ -3646,13 +3646,12 @@ void qmp_block_job_pause(const char *device, Error **errp)
>>      AioContext *aio_context;
>>      BlockJob *job = find_block_job(device, &aio_context, errp);
>>
>> -    if (!job || job->user_paused) {
>> +    if (!job || block_job_paused(job)) {
>>          return;
>>      }
>>
>> -    job->user_paused = true;
>>      trace_qmp_block_job_pause(job);
>> -    block_job_pause(job);
>> +    block_job_pause(job, true);
>>      aio_context_release(aio_context);
>>  }
>>
>> @@ -3661,11 +3660,10 @@ void qmp_block_job_resume(const char *device, Error **errp)
>>      AioContext *aio_context;
>>      BlockJob *job = find_block_job(device, &aio_context, errp);
>>
>> -    if (!job || !job->user_paused) {
>> +    if (!job || !block_job_paused(job)) {
>>          return;
>>      }
>>
>> -    job->user_paused = false;
>>      trace_qmp_block_job_resume(job);
>>      block_job_iostatus_reset(job);
>>      block_job_resume(job);
>> diff --git a/blockjob.c b/blockjob.c
>> index 6a300ba..2a35f50 100644
>> --- a/blockjob.c
>> +++ b/blockjob.c
>> @@ -104,7 +104,7 @@ static void block_job_detach_aio_context(void *opaque)
>>      /* In case the job terminates during aio_poll()... */
>>      block_job_ref(job);
>>
>> -    block_job_pause(job);
>> +    block_job_pause(job, false);
>>
>>      if (!job->paused) {
>>          /* If job is !job->busy this kicks it into the next pause point. */
>> @@ -343,9 +343,12 @@ void block_job_complete(BlockJob *job, Error **errp)
>>      job->driver->complete(job, errp);
>>  }
>>
>> -void block_job_pause(BlockJob *job)
>> +void block_job_pause(BlockJob *job, bool user)
>>  {
>>      job->pause_count++;
>> +    if (user) {
>> +        job->user_paused = true;
>> +    }
>>  }
>>
>>  static bool block_job_should_pause(BlockJob *job)
>> @@ -353,6 +356,11 @@ static bool block_job_should_pause(BlockJob *job)
>>      return job->pause_count > 0;
>>  }
>>
>> +bool block_job_paused(BlockJob *job)
>
> I think a more apt name for this would be "block_job_user_paused()", because
> it doesn't tell if the job is paused per se, but only if it is paused by a
> user.
>

Done.

>> +{
>> +    return job ? job->user_paused : 0;
>> +}
>> +
>>  void coroutine_fn block_job_pause_point(BlockJob *job)
>>  {
>>      if (!block_job_should_pause(job)) {
>> @@ -386,6 +394,7 @@ void block_job_resume(BlockJob *job)
>>      if (job->pause_count) {
>>          return;
>>      }
>> +    job->user_paused = false;
>
> At first I was thinking block_job_resume() would need a way of knowing if it
> was a user resume, similar to block_job_pause().  But I guess not... if the
> user paused it, then job->pause_count should always be > 0, I think.
>
> But wait...could we ever get into a state where something like this happens:
>
> block_job_pause(job, true);  /* user paused */
>
> [...]
>
> block_job_pause(job, false); /* system paused */
>
> [...]
>
> block_job_resume(job);       /* user resumes */
>
> [...]   /* job->user_paused is still true, although it shouldn't be */
>
> block_job_resume(job);       /* system resumes */
>
>                              /* now job->user_paused is false */
>
>

I decided to fix both problems by just having a block_job_user_pause() 
and a block_job_user_resume().

I'll wait for more reviews before I send out the v2.

>>      block_job_enter(job);
>>  }
>>
>> @@ -592,8 +601,7 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
>>                                      action, &error_abort);
>>      if (action == BLOCK_ERROR_ACTION_STOP) {
>>          /* make the pause user visible, which will be resumed from QMP. */
>> -        job->user_paused = true;
>> -        block_job_pause(job);
>> +        block_job_pause(job, true);
>>          block_job_iostatus_set_err(job, error);
>>      }
>>      return action;
>> diff --git a/include/block/blockjob.h b/include/block/blockjob.h
>> index 4ddb4ae..081f6c2 100644
>> --- a/include/block/blockjob.h
>> +++ b/include/block/blockjob.h
>> @@ -347,10 +347,19 @@ void coroutine_fn block_job_pause_point(BlockJob *job);
>>  /**
>>   * block_job_pause:
>>   * @job: The job to be paused.
>> + * @user: Requested explicitly via user?
>>   *
>>   * Asynchronously pause the specified job.
>>   */
>> -void block_job_pause(BlockJob *job);
>> +void block_job_pause(BlockJob *job, bool user);
>> +
>> +/**
>> + * block_job_paused:
>> + * @job: The job to query.
>> + *
>> + * Returns true if the job is user-paused.
>> + */
>> +bool block_job_paused(BlockJob *job);
>>
>>  /**
>>   * block_job_resume:
>> --
>> 2.7.4
>>
>

--js


* Re: [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions John Snow
@ 2016-10-05 13:43   ` Kevin Wolf
  2016-10-05 18:49     ` John Snow
  0 siblings, 1 reply; 51+ messages in thread
From: Kevin Wolf @ 2016-10-05 13:43 UTC (permalink / raw)
  To: John Snow
  Cc: qemu-block, vsementsov, famz, stefanha, jcody, eblake, qemu-devel

On 01.10.2016 at 00:00, John Snow wrote:
> There's no reason to leave this to blockdev; we can do it in blockjobs
> directly and get rid of an extra callback for most users.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  blockdev.c | 37 ++++++-------------------------------
>  blockjob.c | 16 ++++++++++++++--
>  2 files changed, 20 insertions(+), 33 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 29c6561..03200e7 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -2957,31 +2957,6 @@ out:
>      aio_context_release(aio_context);
>  }
>  
> -static void block_job_cb(void *opaque, int ret)
> -{
> -    /* Note that this function may be executed from another AioContext besides
> -     * the QEMU main loop.  If you need to access anything that assumes the
> -     * QEMU global mutex, use a BH or introduce a mutex.
> -     */
> -
> -    BlockDriverState *bs = opaque;
> -    const char *msg = NULL;
> -
> -    trace_block_job_cb(bs, bs->job, ret);

This trace event is removed from the code, but not from trace-events.

> -
> -    assert(bs->job);
> -
> -    if (ret < 0) {
> -        msg = strerror(-ret);
> -    }
> -
> -    if (block_job_is_cancelled(bs->job)) {
> -        block_job_event_cancelled(bs->job);
> -    } else {
> -        block_job_event_completed(bs->job, msg);

block_job_event_cancelled/completed can become static now.

> -    }
> -}
> -
>  void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
>                        bool has_base, const char *base,
>                        bool has_backing_file, const char *backing_file,
> @@ -3033,7 +3008,7 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
>      base_name = has_backing_file ? backing_file : base_name;
>  
>      stream_start(has_job_id ? job_id : NULL, bs, base_bs, base_name,
> -                 has_speed ? speed : 0, on_error, block_job_cb, bs, &local_err);
> +                 has_speed ? speed : 0, on_error, NULL, bs, &local_err);

Passing cb == NULL, but opaque != NULL is harmless, but feels odd.

And actually this is the only caller of stream_start, so the parameters
could just be dropped.

>      if (local_err) {
>          error_propagate(errp, local_err);
>          goto out;
> @@ -3136,10 +3111,10 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
>              goto out;
>          }
>          commit_active_start(has_job_id ? job_id : NULL, bs, base_bs, speed,
> -                            on_error, block_job_cb, bs, &local_err, false);
> +                            on_error, NULL, bs, &local_err, false);

Here we have an additional caller in block/replication.c and qemu-img,
so the parameters must stay. For qemu-img, nothing changes. For
replication, the block job events are added as a side effect.

Not sure if we want to emit such events for an internal block job, but
if we do want the change, it should be explicit.

>      } else {
>          commit_start(has_job_id ? job_id : NULL, bs, base_bs, top_bs, speed,
> -                     on_error, block_job_cb, bs,
> +                     on_error, NULL, bs,
>                       has_backing_file ? backing_file : NULL, &local_err);

Like stream_start, drop the parameters.

>      }
>      if (local_err != NULL) {
> @@ -3260,7 +3235,7 @@ static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
>  
>      backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
>                   bmap, backup->compress, backup->on_source_error,
> -                 backup->on_target_error, block_job_cb, bs, txn, &local_err);
> +                 backup->on_target_error, NULL, bs, txn, &local_err);
>      bdrv_unref(target_bs);
>      if (local_err != NULL) {
>          error_propagate(errp, local_err);
> @@ -3330,7 +3305,7 @@ void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
>      }
>      backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
>                   NULL, backup->compress, backup->on_source_error,
> -                 backup->on_target_error, block_job_cb, bs, txn, &local_err);
> +                 backup->on_target_error, NULL, bs, txn, &local_err);

Backup is another job used by replication, too. Same question as above.

>      if (local_err != NULL) {
>          error_propagate(errp, local_err);
>      }
> @@ -3410,7 +3385,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>                   has_replaces ? replaces : NULL,
>                   speed, granularity, buf_size, sync, backing_mode,
>                   on_source_error, on_target_error, unmap,
> -                 block_job_cb, bs, errp);
> +                 NULL, bs, errp);

And again, the parameters can be dropped.

Kevin


* Re: [Qemu-devel] [PATCH v2 01/11] blockjob: fix dead pointer in txn list
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 01/11] blockjob: fix dead pointer in txn list John Snow
@ 2016-10-05 13:43   ` Kevin Wolf
  0 siblings, 0 replies; 51+ messages in thread
From: Kevin Wolf @ 2016-10-05 13:43 UTC (permalink / raw)
  To: John Snow
  Cc: qemu-block, vsementsov, famz, stefanha, jcody, eblake, qemu-devel

On 01.10.2016 at 00:00, John Snow wrote:
> From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 
> Though it is not intended to be reached through normal circumstances,
> if we do not gracefully deconstruct the transaction QLIST, we may wind
> up with stale pointers in the list.
> 
> The rest of this series attempts to address the underlying issues,
> but this should fix list inconsistencies.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Tested-by: John Snow <jsnow@redhat.com>
> Reviewed-by: John Snow <jsnow@redhat.com>
> [Rewrote commit message. --js]
> Signed-off-by: John Snow <jsnow@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 
> Signed-off-by: John Snow <jsnow@redhat.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>


* Re: [Qemu-devel] [PATCH v2 04/11] blockjobs: Always use block_job_get_aio_context
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 04/11] blockjobs: Always use block_job_get_aio_context John Snow
@ 2016-10-05 14:02   ` Kevin Wolf
  2016-10-06 20:22     ` John Snow
  0 siblings, 1 reply; 51+ messages in thread
From: Kevin Wolf @ 2016-10-05 14:02 UTC (permalink / raw)
  To: John Snow
  Cc: qemu-block, vsementsov, famz, stefanha, jcody, eblake, qemu-devel

On 01.10.2016 at 00:00, John Snow wrote:
> There are a few places where we're fishing it out for ourselves.
> Let's not do that and instead use the helper.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>

That change makes a difference when the block job is running its
completion part after block_job_defer_to_main_loop(). The commit message
could be more explicit about whether this is intentional or whether this
case is expected to happen at all.

I suspect that if it can happen, this is a bug fix. Please check and
update the commit message accordingly.

Kevin


* Re: [Qemu-devel] [PATCH v2 05/11] blockjobs: split interface into public/private
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 05/11] blockjobs: split interface into public/private John Snow
@ 2016-10-05 14:17   ` Kevin Wolf
  2016-10-05 16:20     ` John Snow
  0 siblings, 1 reply; 51+ messages in thread
From: Kevin Wolf @ 2016-10-05 14:17 UTC (permalink / raw)
  To: John Snow
  Cc: qemu-block, vsementsov, famz, stefanha, jcody, eblake, qemu-devel

On 01.10.2016 at 00:00, John Snow wrote:
> To make it a little more obvious which functions are intended to be
> public interface and which are intended to be for use only by jobs
> themselves, split the interface into "public" and "private" files.
> 
> Convert blockjobs (e.g. block/backup) to using the private interface.
> Leave blockdev and others on the public interface.
> 
> Give up and let qemu-img use the internal interface, though it doesn't
> strictly need to be using it.
> 
> As a side-effect, hide the BlockJob and BlockJobDriver implementation
> from most of the QEMU codebase.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>

> --- a/block/replication.c
> +++ b/block/replication.c
> @@ -15,7 +15,7 @@
>  #include "qemu/osdep.h"
>  #include "qemu-common.h"
>  #include "block/nbd.h"
> -#include "block/blockjob.h"
> +#include "block/blockjob_int.h"
>  #include "block/block_int.h"
>  #include "block/block_backup.h"
>  #include "sysemu/block-backend.h"

This one is wrong. replication.c is a block job user, not part of the
implementation. After removing this hunk, the result still compiles.

> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -7,16 +7,15 @@
>  #include "qemu/coroutine.h"
>  #include "block/accounting.h"
>  #include "block/dirty-bitmap.h"
> +#include "block/blockjob.h"
>  #include "qapi/qmp/qobject.h"
>  #include "qapi-types.h"
>  #include "qemu/hbitmap.h"

Hm... This header file doesn't need any of the definitions from
blockjob.h, so I guess .c files should include it explicitly rather than
getting it from here. That would be a separate patch, though.

>  /* block.c */
>  typedef struct BlockDriver BlockDriver;
> -typedef struct BlockJob BlockJob;
>  typedef struct BdrvChild BdrvChild;
>  typedef struct BdrvChildRole BdrvChildRole;
> -typedef struct BlockJobTxn BlockJobTxn;
>  
>  typedef struct BlockDriverInfo {
>      /* in bytes, 0 if irrelevant */

> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -37,7 +37,7 @@
>  #include "sysemu/sysemu.h"
>  #include "sysemu/block-backend.h"
>  #include "block/block_int.h"
> -#include "block/blockjob.h"
> +#include "block/blockjob_int.h"
>  #include "block/qapi.h"
>  #include "crypto/init.h"
>  #include "trace/control.h"

qemu-img doesn't compile without including blockjob_int.h, but that's
just a sign that the split isn't completely right yet. Maybe it needs to
use the public function block_job_query() to get the information it takes
directly from the BlockJob struct.

Kevin

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/11] blockjobs: fix documentation
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 06/11] blockjobs: fix documentation John Snow
@ 2016-10-05 15:03   ` Kevin Wolf
  2016-10-05 16:22     ` John Snow
  0 siblings, 1 reply; 51+ messages in thread
From: Kevin Wolf @ 2016-10-05 15:03 UTC (permalink / raw)
  To: John Snow
  Cc: qemu-block, vsementsov, famz, stefanha, jcody, eblake, qemu-devel

On 01.10.2016 at 00:00, John Snow wrote:
> Wrong function names in documentation.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  include/block/blockjob_int.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
> index 0a2d41e..c6da7e4 100644
> --- a/include/block/blockjob_int.h
> +++ b/include/block/blockjob_int.h
> @@ -305,7 +305,7 @@ void block_job_enter(BlockJob *job);
>  void block_job_event_cancelled(BlockJob *job);
>  
>  /**
> - * block_job_ready:
> + * block_job_event_completed:

Yes...

>   * @job: The job which is now ready to complete.

...that was half of the fix.

>   * @msg: Error message. Only present on failure.
>   *

Kevin


* Re: [Qemu-devel] [PATCH v2 09/11] blockjob: add block_job_start
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 09/11] blockjob: add block_job_start John Snow
@ 2016-10-05 15:17   ` Kevin Wolf
  2016-10-06 22:44     ` John Snow
  0 siblings, 1 reply; 51+ messages in thread
From: Kevin Wolf @ 2016-10-05 15:17 UTC (permalink / raw)
  To: John Snow
  Cc: qemu-block, vsementsov, famz, stefanha, jcody, eblake, qemu-devel

On 01.10.2016 at 00:00, John Snow wrote:
> Instead of automatically starting jobs at creation time via backup_start
> et al, we'd like to return a job object pointer that can be started
> manually at a later point in time.
> 
> For now, add the block_job_start mechanism and start the jobs
> automatically as we have been doing, with conversions job-by-job coming
> in later patches.
> 
> Of note: cancellation of unstarted jobs will perform all the normal
> cleanup as if the job had started, particularly abort and clean. The
> only difference is that we will not emit any events, because the job
> never actually started.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>

Should we make sure that jobs are only added to the block_jobs list once
they are started? It doesn't sound like a good idea to make a job
without a coroutine user-accessible.

Kevin


* Re: [Qemu-devel] [PATCH v2 05/11] blockjobs: split interface into public/private
  2016-10-05 14:17   ` Kevin Wolf
@ 2016-10-05 16:20     ` John Snow
  0 siblings, 0 replies; 51+ messages in thread
From: John Snow @ 2016-10-05 16:20 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-block, vsementsov, famz, stefanha, jcody, eblake, qemu-devel



On 10/05/2016 10:17 AM, Kevin Wolf wrote:
> On 01.10.2016 at 00:00, John Snow wrote:
>> To make it a little more obvious which functions are intended to be
>> public interface and which are intended to be for use only by jobs
>> themselves, split the interface into "public" and "private" files.
>>
>> Convert blockjobs (e.g. block/backup) to using the private interface.
>> Leave blockdev and others on the public interface.
>>
>> Give up and let qemu-img use the internal interface, though it doesn't
>> strictly need to be using it.
>>
>> As a side-effect, hide the BlockJob and BlockJobDriver implementation
>> from most of the QEMU codebase.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>
>> --- a/block/replication.c
>> +++ b/block/replication.c
>> @@ -15,7 +15,7 @@
>>  #include "qemu/osdep.h"
>>  #include "qemu-common.h"
>>  #include "block/nbd.h"
>> -#include "block/blockjob.h"
>> +#include "block/blockjob_int.h"
>>  #include "block/block_int.h"
>>  #include "block/block_backup.h"
>>  #include "sysemu/block-backend.h"
>
> This one is wrong. replication.c is a block job user, not part of the
> implementation. After removing this hunk, the result still compiles.
>

Sorry, this is a mistake from an intermediate form of the patch.

>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -7,16 +7,15 @@
>>  #include "qemu/coroutine.h"
>>  #include "block/accounting.h"
>>  #include "block/dirty-bitmap.h"
>> +#include "block/blockjob.h"
>>  #include "qapi/qmp/qobject.h"
>>  #include "qapi-types.h"
>>  #include "qemu/hbitmap.h"
>
> Hm... This header file doesn't need any of the definitions from
> blockjob.h, so I guess .c files should include it explicitly rather than
> getting it from here. That would be a separate patch, though.
>

OK. The reason this happened was because we used to have the struct 
definitions in block_int.h; and by moving those to a Blockjob-specific 
header, I needed to reintroduce those definitions somehow. I reasoned 
that BlockJobs were part of the public interface for the Block layer, so 
I added it here.

I can remove it and include the public blockjob interface only where 
specifically required if desired, though.

>>  /* block.c */
>>  typedef struct BlockDriver BlockDriver;
>> -typedef struct BlockJob BlockJob;
>>  typedef struct BdrvChild BdrvChild;
>>  typedef struct BdrvChildRole BdrvChildRole;
>> -typedef struct BlockJobTxn BlockJobTxn;
>>
>>  typedef struct BlockDriverInfo {
>>      /* in bytes, 0 if irrelevant */
>
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>> @@ -37,7 +37,7 @@
>>  #include "sysemu/sysemu.h"
>>  #include "sysemu/block-backend.h"
>>  #include "block/block_int.h"
>> -#include "block/blockjob.h"
>> +#include "block/blockjob_int.h"
>>  #include "block/qapi.h"
>>  #include "crypto/init.h"
>>  #include "trace/control.h"
>
> qemu-img doesn't compile without including blockjob_int.h, but that's
> just a sign that the split isn't completely right yet. Maybe it needs to
> use the public function block_job_query() to get the information it takes
> directly from the BlockJob struct.
>
> Kevin
>

Yes, you caught me. I'd need to spend a little more time shining up 
qemu-img, which just takes time away from that FDC rev-- I'm kidding. 
I'll fix this up at your request.

--js


* Re: [Qemu-devel] [PATCH v2 06/11] blockjobs: fix documentation
  2016-10-05 15:03   ` Kevin Wolf
@ 2016-10-05 16:22     ` John Snow
  0 siblings, 0 replies; 51+ messages in thread
From: John Snow @ 2016-10-05 16:22 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-block, vsementsov, famz, stefanha, jcody, eblake, qemu-devel



On 10/05/2016 11:03 AM, Kevin Wolf wrote:
> On 01.10.2016 at 00:00, John Snow wrote:
>> Wrong function names in documentation.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>  include/block/blockjob_int.h | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
>> index 0a2d41e..c6da7e4 100644
>> --- a/include/block/blockjob_int.h
>> +++ b/include/block/blockjob_int.h
>> @@ -305,7 +305,7 @@ void block_job_enter(BlockJob *job);
>>  void block_job_event_cancelled(BlockJob *job);
>>
>>  /**
>> - * block_job_ready:
>> + * block_job_event_completed:
>
> Yes...
>
>>   * @job: The job which is now ready to complete.
>
> ...that was half of the fix.
>

:(

>>   * @msg: Error message. Only present on failure.
>>   *
>
> Kevin
>

I'm sorry for not double-checking this one.


* Re: [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions
  2016-10-05 13:43   ` Kevin Wolf
@ 2016-10-05 18:49     ` John Snow
  2016-10-05 19:24       ` Eric Blake
                         ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: John Snow @ 2016-10-05 18:49 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: vsementsov, famz, qemu-block, jcody, qemu-devel, stefanha,
	Eric Blake, Wen Congyang



On 10/05/2016 09:43 AM, Kevin Wolf wrote:
> On 01.10.2016 at 00:00, John Snow wrote:
>> There's no reason to leave this to blockdev; we can do it in blockjobs
>> directly and get rid of an extra callback for most users.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>  blockdev.c | 37 ++++++-------------------------------
>>  blockjob.c | 16 ++++++++++++++--
>>  2 files changed, 20 insertions(+), 33 deletions(-)
>>
>> diff --git a/blockdev.c b/blockdev.c
>> index 29c6561..03200e7 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -2957,31 +2957,6 @@ out:
>>      aio_context_release(aio_context);
>>  }
>>
>> -static void block_job_cb(void *opaque, int ret)
>> -{
>> -    /* Note that this function may be executed from another AioContext besides
>> -     * the QEMU main loop.  If you need to access anything that assumes the
>> -     * QEMU global mutex, use a BH or introduce a mutex.
>> -     */
>> -
>> -    BlockDriverState *bs = opaque;
>> -    const char *msg = NULL;
>> -
>> -    trace_block_job_cb(bs, bs->job, ret);
>
> This trace event is removed from the code, but not from trace-events.
>
>> -
>> -    assert(bs->job);
>> -
>> -    if (ret < 0) {
>> -        msg = strerror(-ret);
>> -    }
>> -
>> -    if (block_job_is_cancelled(bs->job)) {
>> -        block_job_event_cancelled(bs->job);
>> -    } else {
>> -        block_job_event_completed(bs->job, msg);
>
> block_job_event_cancelled/completed can become static now.
>
>> -    }
>> -}
>> -
>>  void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
>>                        bool has_base, const char *base,
>>                        bool has_backing_file, const char *backing_file,
>> @@ -3033,7 +3008,7 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
>>      base_name = has_backing_file ? backing_file : base_name;
>>
>>      stream_start(has_job_id ? job_id : NULL, bs, base_bs, base_name,
>> -                 has_speed ? speed : 0, on_error, block_job_cb, bs, &local_err);
>> +                 has_speed ? speed : 0, on_error, NULL, bs, &local_err);
>
> Passing cb == NULL, but opaque != NULL is harmless, but feels odd.
>

Yes.

> And actually this is the only caller of stream_start, so the parameters
> could just be dropped.
>

OK. I left the parameters in on purpose, but they can be re-added in the 
future if desired.

(Hm, maybe as part of a CommonJobOpts parameter someday?)

>>      if (local_err) {
>>          error_propagate(errp, local_err);
>>          goto out;
>> @@ -3136,10 +3111,10 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
>>              goto out;
>>          }
>>          commit_active_start(has_job_id ? job_id : NULL, bs, base_bs, speed,
>> -                            on_error, block_job_cb, bs, &local_err, false);
>> +                            on_error, NULL, bs, &local_err, false);
>
> Here we have an additional caller in block/replication.c and qemu-img,
> so the parameters must stay. For qemu-img, nothing changes. For
> replication, the block job events are added as a side effect.
>
> Not sure if we want to emit such events for an internal block job, but
> if we do want the change, it should be explicit.
>

Hmm, do we want to make it so some jobs are invisible and others are 
not? Because as it stands right now, neither case is strictly true. We 
only emit cancelled/completed events if it was started via QMP, however 
we do emit events for error and ready regardless of who started the job.

That didn't seem particularly consistent to me; either all events should 
be controlled by the job layer itself or none of them should be.

I opted for "all."

For "internal" jobs that did not previously emit any events, is it not 
true that these jobs still appear in the block job list and are 
effectively public regardless? I'd argue that these messages may be of 
value for management utilities who are still blocked by these jobs 
whether they are 'internal' or not.

I'll push for keeping it mandatory and explicit. If it becomes a 
problem, we can always add a 'silent' job property that silences ALL qmp 
events, including all completion, error, and ready notices.

I've CC'd Wen Congyang and Eric Blake to talk me down if they wish.

>>      } else {
>>          commit_start(has_job_id ? job_id : NULL, bs, base_bs, top_bs, speed,
>> -                     on_error, block_job_cb, bs,
>> +                     on_error, NULL, bs,
>>                       has_backing_file ? backing_file : NULL, &local_err);
>
> Like stream_start, drop the parameters.
>
>>      }
>>      if (local_err != NULL) {
>> @@ -3260,7 +3235,7 @@ static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
>>
>>      backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
>>                   bmap, backup->compress, backup->on_source_error,
>> -                 backup->on_target_error, block_job_cb, bs, txn, &local_err);
>> +                 backup->on_target_error, NULL, bs, txn, &local_err);
>>      bdrv_unref(target_bs);
>>      if (local_err != NULL) {
>>          error_propagate(errp, local_err);
>> @@ -3330,7 +3305,7 @@ void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
>>      }
>>      backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
>>                   NULL, backup->compress, backup->on_source_error,
>> -                 backup->on_target_error, block_job_cb, bs, txn, &local_err);
>> +                 backup->on_target_error, NULL, bs, txn, &local_err);
>
> Backup is another job used by replication, too. Same question as above.
>
>>      if (local_err != NULL) {
>>          error_propagate(errp, local_err);
>>      }
>> @@ -3410,7 +3385,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>>                   has_replaces ? replaces : NULL,
>>                   speed, granularity, buf_size, sync, backing_mode,
>>                   on_source_error, on_target_error, unmap,
>> -                 block_job_cb, bs, errp);
>> +                 NULL, bs, errp);
>
> And again, the parameters can be dropped.
>
> Kevin
>


* Re: [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions
  2016-10-05 18:49     ` John Snow
@ 2016-10-05 19:24       ` Eric Blake
  2016-10-05 21:00         ` John Snow
  2016-10-06  7:44       ` Kevin Wolf
  2016-10-11  9:50       ` Markus Armbruster
  2 siblings, 1 reply; 51+ messages in thread
From: Eric Blake @ 2016-10-05 19:24 UTC (permalink / raw)
  To: John Snow, Kevin Wolf
  Cc: vsementsov, famz, qemu-block, jcody, qemu-devel, stefanha, Wen Congyang


On 10/05/2016 01:49 PM, John Snow wrote:

>> Here we have an additional caller in block/replication.c and qemu-img,
>> so the parameters must stay. For qemu-img, nothing changes. For
>> replication, the block job events are added as a side effect.
>>
>> Not sure if we want to emit such events for an internal block job, but
>> if we do want the change, it should be explicit.
>>
> 
> Hmm, do we want to make it so some jobs are invisible and others are
> not? Because as it stands right now, neither case is strictly true. We
> only emit cancelled/completed events if it was started via QMP, however
> we do emit events for error and ready regardless of who started the job.

Libvirt tries to mirror any block job event it receives to upper layers.
 But if it cannot figure out which upper-layer disk the event is
associated with, it just drops the event.  So I think that from the
libvirt perspective, you are okay if you always report job events,
even for internal jobs.  (Do we have a quick and easy way to set up an
internal job event, to double-check if I can produce some sort of
libvirt scenario to trigger the event and see that it gets safely ignored?)

> 
> That didn't seem particularly consistent to me; either all events should
> be controlled by the job layer itself or none of them should be.
> 
> I opted for "all."
> 
> For "internal" jobs that did not previously emit any events, is it not
> true that these jobs still appear in the block job list and are
> effectively public regardless? I'd argue that these messages may be of
> value for management utilities who are still blocked by these jobs
> whether or not they are 'internal' or not.
> 
> I'll push for keeping it mandatory and explicit. If it becomes a
> problem, we can always add a 'silent' job property that silences ALL qmp
> events, including all completion, error, and ready notices.

Completely silencing an internal job seems a little cleaner than having
events for the job but not being able to query it.  But if nothing
breaks by exposing the internal jobs, that seems even easier than trying
to decide which jobs are internal and hidden vs. user-requested and public.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions
  2016-10-05 19:24       ` Eric Blake
@ 2016-10-05 21:00         ` John Snow
  2016-10-10 16:45           ` Kashyap Chamarthy
  0 siblings, 1 reply; 51+ messages in thread
From: John Snow @ 2016-10-05 21:00 UTC (permalink / raw)
  To: Eric Blake, Kevin Wolf
  Cc: vsementsov, famz, qemu-block, jcody, qemu-devel, stefanha, Wen Congyang



On 10/05/2016 03:24 PM, Eric Blake wrote:
> On 10/05/2016 01:49 PM, John Snow wrote:
>
>>> Here we have an additional caller in block/replication.c and qemu-img,
>>> so the parameters must stay. For qemu-img, nothing changes. For
>>> replication, the block job events are added as a side effect.
>>>
>>> Not sure if we want to emit such events for an internal block job, but
>>> if we do want the change, it should be explicit.
>>>
>>
>> Hmm, do we want to make it so some jobs are invisible and others are
>> not? Because as it stands right now, neither case is strictly true. We
>> only emit cancelled/completed events if it was started via QMP, however
>> we do emit events for error and ready regardless of who started the job.
>
> Libvirt tries to mirror any block job event it receives to upper layers.
>  But if it cannot figure out which upper-layer disk the event is
> associated with, it just drops the event.  So I think that from the
> libvirt perspective, you are okay if you always report job events,
> even for internal jobs.  (Do we have a quick and easy way to set up an
> internal job event, to double-check if I can produce some sort of
> libvirt scenario to trigger the event and see that it gets safely ignored?)
>

Not in a QEMU release yet, I think.

>>
>> That didn't seem particularly consistent to me; either all events should
>> be controlled by the job layer itself or none of them should be.
>>
>> I opted for "all."
>>
>> For "internal" jobs that did not previously emit any events, is it not
>> true that these jobs still appear in the block job list and are
>> effectively public regardless? I'd argue that these messages may be of
>> value for management utilities who are still blocked by these jobs
>> whether or not they are 'internal' or not.
>>
>> I'll push for keeping it mandatory and explicit. If it becomes a
>> problem, we can always add a 'silent' job property that silences ALL qmp
>> events, including all completion, error, and ready notices.
>
> Completely silencing an internal job seems a little cleaner than having
> events for the job but not being able to query it.  But if nothing
> breaks by exposing the internal jobs, that seems even easier than trying
> to decide which jobs are internal and hidden vs. user-requested and public.
>

Well, at the moment anything requested directly via blockdev.c is 
"public." Before 2.8, all jobs were public ones, with the exception of 
those in qemu-img which is a bit of a different/special case.

We have this block/replication.c beast now, though, and it uses 
backup_start and commit_active_start as it sees fit without direct user 
intervention.

As it stands, I believe the jobs that replication creates are 
user-visible via query, will not issue cancellation or completion 
events, but WILL emit error events. It may emit ready events for the 
mirror job it uses, but I haven't traced that. (It could, at least.)


* Re: [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions
  2016-10-05 18:49     ` John Snow
  2016-10-05 19:24       ` Eric Blake
@ 2016-10-06  7:44       ` Kevin Wolf
  2016-10-06 16:57         ` John Snow
  2016-10-11  9:50       ` Markus Armbruster
  2 siblings, 1 reply; 51+ messages in thread
From: Kevin Wolf @ 2016-10-06  7:44 UTC (permalink / raw)
  To: John Snow
  Cc: vsementsov, famz, qemu-block, jcody, qemu-devel, stefanha,
	Eric Blake, Wen Congyang

On 05.10.2016 at 20:49, John Snow wrote:
> On 10/05/2016 09:43 AM, Kevin Wolf wrote:
> >On 01.10.2016 at 00:00, John Snow wrote:
> >>@@ -3136,10 +3111,10 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
> >>             goto out;
> >>         }
> >>         commit_active_start(has_job_id ? job_id : NULL, bs, base_bs, speed,
> >>-                            on_error, block_job_cb, bs, &local_err, false);
> >>+                            on_error, NULL, bs, &local_err, false);
> >
> >Here we have an additional caller in block/replication.c and qemu-img,
> >so the parameters must stay. For qemu-img, nothing changes. For
> >replication, the block job events are added as a side effect.
> >
> >Not sure if we want to emit such events for an internal block job, but
> >if we do want the change, it should be explicit.
> >
> 
> Hmm, do we want to make it so some jobs are invisible and others are
> not? Because as it stands right now, neither case is strictly true.
> We only emit cancelled/completed events if it was started via QMP,
> however we do emit events for error and ready regardless of who
> started the job.
> 
> That didn't seem particularly consistent to me; either all events
> should be controlled by the job layer itself or none of them should
> be.

Yes, I agree. The use of block jobs in replication is rather broken and
we should change it one way or another. But I'd prefer to do so
explicitly instead of doing it as a side-effect of a patch like this
one.

> I opted for "all."
> 
> For "internal" jobs that did not previously emit any events, is it
> not true that these jobs still appear in the block job list and are
> effectively public regardless? I'd argue that these messages may be
> of value for management utilities who are still blocked by these
> jobs whether or not they are 'internal' or not.
> 
> I'll push for keeping it mandatory and explicit. If it becomes a
> problem, we can always add a 'silent' job property that silences ALL
> qmp events, including all completion, error, and ready notices.

Actually, there is at least one other reason why the block jobs in
replication are a bad idea as they are today: Job naming. Currently
they use a fixed string, conflicting with the user-controlled job
namespace and with itself (i.e. restricting replication to a single
disk).

And are we really prepared to handle cases where the user decides to
pause, complete or cancel an internal job?

I think we should really hide them from the user. And maybe the way to
do so isn't a bool job->user flag, but actually job->id = NULL. Then it
would work the same way as named/internal BlockBackends do and we would
get rid of the naming problem, too.

Kevin


* Re: [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions
  2016-10-06  7:44       ` Kevin Wolf
@ 2016-10-06 16:57         ` John Snow
  2016-10-06 18:16           ` Eric Blake
  0 siblings, 1 reply; 51+ messages in thread
From: John Snow @ 2016-10-06 16:57 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: vsementsov, famz, qemu-block, jcody, qemu-devel, stefanha,
	Wen Congyang, Eric Blake



On 10/06/2016 03:44 AM, Kevin Wolf wrote:
> On 05.10.2016 at 20:49, John Snow wrote:
>> On 10/05/2016 09:43 AM, Kevin Wolf wrote:
>>> On 01.10.2016 at 00:00, John Snow wrote:
>>>> @@ -3136,10 +3111,10 @@ void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
>>>>             goto out;
>>>>         }
>>>>         commit_active_start(has_job_id ? job_id : NULL, bs, base_bs, speed,
>>>> -                            on_error, block_job_cb, bs, &local_err, false);
>>>> +                            on_error, NULL, bs, &local_err, false);
>>>
>>> Here we have an additional caller in block/replication.c and qemu-img,
>>> so the parameters must stay. For qemu-img, nothing changes. For
>>> replication, the block job events are added as a side effect.
>>>
>>> Not sure if we want to emit such events for an internal block job, but
>>> if we do want the change, it should be explicit.
>>>
>>
>> Hmm, do we want to make it so some jobs are invisible and others are
>> not? Because as it stands right now, neither case is strictly true.
>> We only emit cancelled/completed events if it was started via QMP,
>> however we do emit events for error and ready regardless of who
>> started the job.
>>
>> That didn't seem particularly consistent to me; either all events
>> should be controlled by the job layer itself or none of them should
>> be.
>
> Yes, I agree. The use of block jobs in replication is rather broken and
> we should change it one way or another. But I'd prefer to do so
> explicitly instead of doing it as a side-effect of a patch like this
> one.
>

I can always split this patch out and CC Wen, Eric, Markus et al and 
adjust the commit message to be explicit.

>> I opted for "all."
>>
>> For "internal" jobs that did not previously emit any events, is it
>> not true that these jobs still appear in the block job list and are
>> effectively public regardless? I'd argue that these messages may be
>> of value for management utilities who are still blocked by these
>> jobs whether or not they are 'internal' or not.
>>
>> I'll push for keeping it mandatory and explicit. If it becomes a
>> problem, we can always add a 'silent' job property that silences ALL
>> qmp events, including all completion, error, and ready notices.
>
> Actually, there is at least one other reason why the block jobs in
> replication are a bad idea as they are today: Job naming. Currently
> they use a fixed string, conflicting with the user-controlled job
> namespace and with itself (i.e. restricting replication to a single
> disk).
>
> And are we really prepared to handle cases where the user decides to
> pause, complete or cancel an internal job?
>
> I think we should really hide them from the user. And maybe the way to
> do so isn't a bool job->user flag, but actually job->id = NULL. Then it
> would work the same way as named/internal BlockBackends do and we would
> get rid of the naming problem, too.
>
> Kevin
>

Mirrors "internal bitmaps," too.

I can rig it such that if a job has no ID, it will cease to show up via 
query and no longer emit events.

Downside: Whether or not a device is busy or can accept another job 
becomes opaque to the management layer.

--js


* Re: [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions
  2016-10-06 16:57         ` John Snow
@ 2016-10-06 18:16           ` Eric Blake
  2016-10-06 18:19             ` John Snow
  0 siblings, 1 reply; 51+ messages in thread
From: Eric Blake @ 2016-10-06 18:16 UTC (permalink / raw)
  To: John Snow, Kevin Wolf
  Cc: vsementsov, famz, qemu-block, jcody, qemu-devel, stefanha, Wen Congyang


On 10/06/2016 11:57 AM, John Snow wrote:
> Mirrors "internal bitmaps," too.
> 
> I can rig it such that if a job has no ID, it will cease to show up via
> query and no longer emit events.
> 
> Downside: Whether or not a device is busy or can accept another job
> becomes opaque to the management layer.

A bit, but isn't the point of this exercise to ultimately reach the
point where we can support multiple simultaneous jobs, so long as
op-blockers prove the jobs don't collide?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions
  2016-10-06 18:16           ` Eric Blake
@ 2016-10-06 18:19             ` John Snow
  0 siblings, 0 replies; 51+ messages in thread
From: John Snow @ 2016-10-06 18:19 UTC (permalink / raw)
  To: Eric Blake, Kevin Wolf
  Cc: vsementsov, famz, qemu-block, jcody, qemu-devel, stefanha, Wen Congyang



On 10/06/2016 02:16 PM, Eric Blake wrote:
> On 10/06/2016 11:57 AM, John Snow wrote:
>> Mirrors "internal bitmaps," too.
>>
>> I can rig it such that if a job has no ID, it will cease to show up via
>> query and no longer emit events.
>>
>> Downside: Whether or not a device is busy or can accept another job
>> becomes opaque to the management layer.
>
> A bit, but isn't the point of this exercise to ultimately reach the
> point where we can support multiple simultaneous jobs, so long as
> op-blockers prove the jobs don't collide?
>

Yes, exactly. It's only a temporary drawback. If libvirt can cope for 
now, I am happy.


* Re: [Qemu-devel] [PATCH v2 04/11] blockjobs: Always use block_job_get_aio_context
  2016-10-05 14:02   ` Kevin Wolf
@ 2016-10-06 20:22     ` John Snow
  2016-10-07  7:49       ` Paolo Bonzini
  0 siblings, 1 reply; 51+ messages in thread
From: John Snow @ 2016-10-06 20:22 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: vsementsov, famz, qemu-block, jcody, qemu-devel, stefanha, Paolo Bonzini



On 10/05/2016 10:02 AM, Kevin Wolf wrote:
> On 01.10.2016 at 00:00, John Snow wrote:
>> There are a few places where we're fishing it out for ourselves.
>> Let's not do that and instead use the helper.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>
> That change makes a difference when the block job is running its
> completion part after block_job_defer_to_main_loop(). The commit message
> could be more explicit about whether this is intentional or whether this
> case is expected to happen at all.
>
> I suspect that if it can happen, this is a bug fix. Please check and
> update the commit message accordingly.
>

Because I'm bad with being concise, I wrote a TLDR at the bottom. 
Otherwise, enjoy this wall of text.

> Kevin
>

Intentional under the premise of:

(1) Acquiring a context under which a job is not actually running is 
likely incorrect (or at the very least misleading), and

(2) If using the main thread context for any would-be callers is 
incorrect, this is a problem with the job lifetime that needs to be 
corrected anyway.


In general, if we are acquiring the context to secure exclusive access 
to the BlockJob state itself, using the getter here is perfectly safe. 
If we are acquiring context for other reasons, we need to consider more 
carefully.


The callers are:

(A) bdrv_drain_all (block/io)

	Obtains context for the sake of pause/resume. Pauses all jobs before 
draining all BDSes. For starters, pausing a job that has deferred to 
main has no effect (and neither does resuming). This usage appears 
slightly erroneous, though, in that if we are not running from the main 
thread, we are definitely not securing exclusive rights to the block 
job. We could, in theory, race on reads/writes to the pause count field, 
so using the getter here would be a bugfix.

(B) find_block_job (all monitor context)

	Acquires context as a courtesy for its callers:
	- qmp_block_job_set_speed
	- qmp_block_job_cancel
	- qmp_block_job_pause
	- qmp_block_job_resume
	- qmp_block_job_complete

In an "already deferred to main" sense...  in general, if the job has 
already deferred to main we don't need to acquire the block's context to 
get safe access to the job, because we're already running in the main 
context. Further, none of these functions actually have any meaning for 
a job in such a state.

	- set_speed: Sets speed parameters, harmless either way.
	- cancel: Will set the cancelled boolean, reset iostatus, then attempt 
to enter the job. Since a job that enters the main loop remains busy, 
the enter is a NOP. The BlockBackend AIO context here is therefore 
extraneous, and the getter is safe.
	- pause: Only increments a counter, and will have no effect.
	- resume: Decrements a counter. Attempts to enter(), but as stated 
above this is a NOP.
	- complete: Calls .complete(), for which the only implementation is 
mirror_complete. Uh, this actually seems messy. Looks like there's 
nothing to prevent us from calling this after we've already told it to 
complete once. This could be a legitimate bug that this patch does 
nothing in particular to address. If complete() is shored up such that 
it can be called precisely once, this becomes safe.

(C) qmp_query_block_jobs (monitor context)

	Just a getter. Using get_context is safe in either state.

(D) run_block_job (qemu-img)

	Never called when the context is in the main loop anyway. Effectively 
no change here.



So, with the exception of .complete, I think this is a safe change as it 
stands... However... Paolo wants to complicate my life and get rid of 
this getter for his own fiendish purposes. He suggests pushing down 
context acquisition into blockjob.c directly for any QMP callers:

- qmp_block_job_set_speed -> block_job_set_speed
- qmp_block_job_cancel -> block_job_cancel
- qmp_block_job_pause -> block_job_user_pause
- qmp_block_job_resume -> block_job_user_resume
- qmp_block_job_complete -> block_job_complete
- qmp_query_block_jobs -> block_job_query

Most of these have only one caller in the QMP layer:

block_job_set_speed
block_job_user_pause
block_job_user_resume
block_job_query

These can easily just take the context they need, removing external uses 
of job->blk for purposes of acquiring the context.

block_job_cancel and block_job_complete are different.

block_job_cancel is called in many places, but we can just add a 
block_job_user_cancel if we want a version which takes care to acquire 
context alongside one that does not. (Or we could just acquire the 
context regardless, but Paolo warned me ominously that recursive locks 
are EVIL. He sounded serious.)
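For illustration, a user-facing wrapper of that sort might look like the 
sketch below. block_job_user_cancel is hypothetical (it does not exist 
yet), and the AioContext here is mocked as a plain lock-depth counter so 
the shape of the locking is visible without pulling in QEMU internals:

```c
#include <assert.h>
#include <stdbool.h>

/* Mock AioContext: just tracks acquire/release balance. */
typedef struct AioContext {
    int lock_depth;
} AioContext;

static void aio_context_acquire(AioContext *ctx) { ctx->lock_depth++; }
static void aio_context_release(AioContext *ctx) { ctx->lock_depth--; }

typedef struct BlockJob {
    AioContext *ctx;
    bool cancelled;
} BlockJob;

static AioContext *block_job_get_aio_context(BlockJob *job)
{
    return job->ctx;
}

/* Core cancel: assumes the caller already holds the job's context. */
static void block_job_cancel(BlockJob *job)
{
    job->cancelled = true;
}

/* Hypothetical user-facing wrapper that takes the lock itself, so QMP
 * callers never need to fish the context out of job->blk themselves. */
static void block_job_user_cancel(BlockJob *job)
{
    AioContext *ctx = block_job_get_aio_context(job);
    aio_context_acquire(ctx);
    block_job_cancel(job);
    aio_context_release(ctx);
}
```

Keeping the lock strictly inside the wrapper is also what avoids the 
recursive-lock problem: internal callers that already hold the context 
keep calling block_job_cancel directly.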

block_job_complete has no direct callers outside of QMP, but it is also 
used as a callback by block_job_complete_sync, used in qemu-img for 
run_block_job. I can probably rewrite qemu-img to avoid this usage.



TLDR:
- This change should be perfectly safe, but Paolo wants to get rid of 
this usage anyway.
- At least 5/6 uses of external context grabbing can be internalized easily.
- qemu-img's run_block_job needs to be refactored a bit; I don't have an 
idea for that yet, but as you pointed out it needs to be done for the 
public/private split anyway.
- block_job_complete needs to be touched up no matter what we do.
- The aio_context getter can probably be removed entirely as per Paolo's 
wishes, but I'll have to change bdrv_drain_all a bit. A 
block_job_pause_all and block_job_resume_all would work, though that's a 
bit special-purpose. I could craft up a block_job_apply_all for the 
purpose instead. (e.g. block_job_apply_all(block_job_pause))
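A rough sketch of that idea follows. block_job_apply_all is hypothetical, 
and the singly-linked job list stands in for the QLIST of block jobs that 
blockjob.c actually maintains; the real version would also acquire each 
job's AioContext around the callback, as noted in the comments:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for QEMU's BlockJob and its global list. */
typedef struct BlockJob {
    int pause_count;
    struct BlockJob *next;
} BlockJob;

static BlockJob *block_jobs; /* head of the (mock) global job list */

static void block_job_pause(BlockJob *job)
{
    job->pause_count++;
}

static void block_job_resume(BlockJob *job)
{
    assert(job->pause_count > 0);
    job->pause_count--;
}

/* Hypothetical helper: apply @fn to every job on the list, so callers
 * like bdrv_drain_all never touch job->blk's AioContext directly. */
static void block_job_apply_all(void (*fn)(BlockJob *))
{
    for (BlockJob *job = block_jobs; job; job = job->next) {
        /* real code: aio_context_acquire(block_job_get_aio_context(job)); */
        fn(job);
        /* real code: aio_context_release(...); */
    }
}
```

bdrv_drain_all would then call block_job_apply_all(block_job_pause) 
before draining and block_job_apply_all(block_job_resume) afterwards.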

I think that answers everyone's questions...
--js

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/11] blockjob: add block_job_start
  2016-10-05 15:17   ` Kevin Wolf
@ 2016-10-06 22:44     ` John Snow
  2016-10-17 18:00       ` John Snow
  0 siblings, 1 reply; 51+ messages in thread
From: John Snow @ 2016-10-06 22:44 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-block, vsementsov, famz, stefanha, jcody, eblake, qemu-devel



On 10/05/2016 11:17 AM, Kevin Wolf wrote:
> Am 01.10.2016 um 00:00 hat John Snow geschrieben:
>> Instead of automatically starting jobs at creation time via backup_start
>> et al, we'd like to return a job object pointer that can be started
>> manually at a later point in time.
>>
>> For now, add the block_job_start mechanism and start the jobs
>> automatically as we have been doing, with conversions job-by-job coming
>> in later patches.
>>
>> Of note: cancellation of unstarted jobs will perform all the normal
>> cleanup as if the job had started, particularly abort and clean. The
>> only difference is that we will not emit any events, because the job
>> never actually started.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>
> Should we make sure that jobs are only added to the block_jobs list once
> they are started? It doesn't sound like a good idea to make a job
> without a coroutine user-accessible.
>
> Kevin
>

That would certainly help limit exposure to a potentially dangerous 
object, but some operations on these unstarted jobs are still perfectly 
reasonable, like cancellation. Even things like "set speed" make sense 
on an unstarted job.

I'd rather just verify that having an accessible job cannot be operated 
on unsafely via the public interface, even though that's more work.

So here's the list:

-block_job_next: Harmless.

-block_job_get: Harmless.

-block_job_set_speed: Depends on the set_speed implementation, but 
backup_set_speed makes no assumptions and that's the only job I am 
attempting to convert in this series.

-block_job_cancel: Edited to specifically support pre-started jobs in 
this patch.

-block_job_complete: Edited to check for a coroutine specifically, but 
even without, a job will not be able to become ready without running first.

-block_job_query: Harmless* (*I could probably expose a 'started' 
variable so that libvirt doesn't get confused as to why there are jobs 
that exist but are neither busy nor paused.)

-block_job_pause: Harmless**

-block_job_user_pause: Harmless**

-block_job_user_paused: Harmless, if meaningless.

-block_job_resume: **We will attempt to call block_job_enter, but since 
that is conditional on job->co we do nothing, so it's harmless (if 
misleading) that you can pause/resume to your heart's content.

-block_job_user_resume: ^ http://i.imgur.com/2zYxrIe.png ^

-block_job_cancel_sync: Safe, due to edits to block_job_cancel. Polling 
loop WILL complete as normal, because all jobs will finish through 
block_job_completed one way or another.

-block_job_cancel_sync_all: As safe as the above.

-block_job_complete_sync: Safe, complete will return error for unstarted 
jobs.

-block_job_iostatus_reset: Harmless, I think -- backup does not 
implement this method. (Huh, *no* jobs implement iostatus_reset at the 
moment...)

-block_job_txn_new: Doesn't operate on jobs.

-block_job_txn_unref: Also doesn't operate on jobs.

-block_job_get_aio_context: Harmless.

-block_job_txn_add_job: Safe and intended! There is trickery here, 
though, as introducing a job into a transaction opens it up to the 
private interface. This adds the following functions to the considerations:

-block_job_completed: Safe, does not assume a coroutine anywhere.

-block_job_completed_single: Specifically updated to be context-aware of 
if we are pre-started or not. This is the "real" completion mechanism 
for BlockJobs that gets run for transactional OR individual jobs.

-block_job_completed_txn_abort: ***BUG***! The problem with the patch as 
I've sent it is that cancel calls completed (under the premise that 
nobody else would ever get the chance to), but we call both cancel AND 
completed from this function, which will cause us to call completed 
twice on pre-started jobs. I will fix this for the next version.

-block_job_completed_txn_success: Should never be called; if it IS, the 
presence of an unstarted job in the transaction will cause an early 
return. And even if I am utterly wrong and every job in the transaction 
completes successfully (somehow?) calling block_job_completed_single is 
perfectly safe by design.
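The pause/resume and enter reasoning above can be modeled in a few 
lines: as long as block_job_enter() does nothing when job->co is unset, 
pausing and resuming an unstarted job only moves a counter. This is a 
toy model rather than QEMU code; Coroutine here is just an opaque 
marker that stays NULL until block_job_start() would run:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Coroutine Coroutine; /* opaque; NULL until started */

typedef struct BlockJob {
    Coroutine *co;
    int pause_count;
    bool entered;  /* records whether the coroutine was ever (re)entered */
} BlockJob;

static void block_job_enter(BlockJob *job)
{
    /* The crucial guard: an unstarted job has no coroutine to enter. */
    if (!job->co) {
        return;
    }
    job->entered = true;
}

static void block_job_pause(BlockJob *job)
{
    job->pause_count++;
}

static void block_job_resume(BlockJob *job)
{
    assert(job->pause_count > 0);
    job->pause_count--;
    block_job_enter(job); /* NOP for an unstarted job */
}
```

With that guard in place, pause/resume on an unstarted job is a pure 
counter dance and never touches a coroutine that doesn't exist.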


Everything else is either internal to block job instances or the 
blockjob core.


There may be:
(A) Bugs in my code/thinking, or
(B) Improvements we can make to the readability,

but I believe that this is (Apart from the mentioned bug) not dangerous.

--js

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v2 04/11] blockjobs: Always use block_job_get_aio_context
  2016-10-06 20:22     ` John Snow
@ 2016-10-07  7:49       ` Paolo Bonzini
  2016-10-13  0:49         ` John Snow
  0 siblings, 1 reply; 51+ messages in thread
From: Paolo Bonzini @ 2016-10-07  7:49 UTC (permalink / raw)
  To: John Snow, Kevin Wolf; +Cc: vsementsov, famz, qemu-block, qemu-devel, stefanha



On 06/10/2016 22:22, John Snow wrote:
> Calls .complete(), for which the only implementation is
> mirror_complete. Uh, this actually seems messy. Looks like there's
> nothing to prevent us from calling this after we've already told it to
> complete once.

Yeah, it should have an

    if (s->should_complete) {
        return;
    }

at the beginning.  I have other mirror.c patches in my queue so I can
take care of that.
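The guard being suggested is easy to model: once should_complete is set, 
a second complete() request should be rejected (or ignored). This sketch 
simplifies error reporting to a return code; the real mirror_complete 
reports through an Error **errp and also validates job state:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of mirror.c's completion flag. */
typedef struct MirrorBlockJob {
    bool should_complete;
} MirrorBlockJob;

/* Returns 0 on success, -1 if completion was already requested. */
static int mirror_complete(MirrorBlockJob *s)
{
    if (s->should_complete) {
        return -1; /* already asked to complete; reject the repeat */
    }
    s->should_complete = true;
    return 0;
}
```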

> block_job_cancel and block_job_complete are different.
> 
> block_job_cancel is called in many places, but we can just add a similar
> block_job_user_cancel if we wanted a version which takes care to acquire
> context and one that does not. (Or we could just acquire the context
> regardless, but Paolo warned me ominously that recursive locks are EVIL.
> He sounded serious.)

Not that many places:

- block_job_finish_sync calls it, and you can just release/reacquire
around the call to "finish(job, &local_err)".

- there are two callers in blockdev.c, and you can just remove the
acquire/release from blockdev.c if you push it in block_job_cancel.

As to block_job_cancel_sync:

- replication_stop is not acquiring s->secondary_disk->bs's AioContext.

- there is no need to hold the AioContext between ->prepare and ->clean.
 My suggestion is to ref the blockjob after storing it in state->job
(you probably should do that anyway) and to unref it in ->clean.  Then
you can again let block_job_cancel_sync(bs->job) take the AioContext,
which it will do in block_job_finish_sync.
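That suggestion amounts to tying the job pointer's validity to the 
transaction state via refcounting. A minimal sketch follows; 
block_job_ref/block_job_unref do exist in blockjob.c, but the 
DriveBackupState wiring and the flat refcount struct here are purely 
illustrative:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct BlockJob {
    int refcnt;
} BlockJob;

static void block_job_ref(BlockJob *job)
{
    job->refcnt++;
}

static void block_job_unref(BlockJob *job)
{
    if (--job->refcnt == 0) {
        free(job);
    }
}

typedef struct DriveBackupState {
    BlockJob *job;
} DriveBackupState;

/* .prepare: store the job and take a reference, so state->job stays
 * valid even if the job completes (dropping its own ref) before .clean. */
static void drive_backup_prepare(DriveBackupState *state, BlockJob *job)
{
    state->job = job;
    block_job_ref(job);
}

/* .clean: release the transaction's reference. */
static void drive_backup_clean(DriveBackupState *state)
{
    if (state->job) {
        block_job_unref(state->job);
        state->job = NULL;
    }
}
```

With this, .abort can call block_job_cancel_sync(state->job) without any 
caller-held AioContext, since the pointer can no longer dangle.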

> block_job_complete has no direct callers outside of QMP, but it is also
> used as a callback by block_job_complete_sync, used in qemu-img for
> run_block_job. I can probably rewrite qemu_img to avoid this usage.

No need to: qemu-img is not acquiring the AioContext, so it's okay to
let block_job_complete do that (like block_job_cancel,
block_job_complete will be called by block_job_finish_sync without the
AioContext acquired).

Paolo

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/11] blockjob: refactor backup_start as backup_job_create
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 10/11] blockjob: refactor backup_start as backup_job_create John Snow
@ 2016-10-07 18:39   ` John Snow
  2016-10-10  8:57     ` Kevin Wolf
  0 siblings, 1 reply; 51+ messages in thread
From: John Snow @ 2016-10-07 18:39 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, famz, jcody, qemu-devel, stefanha



On 09/30/2016 06:00 PM, John Snow wrote:
> Refactor backup_start as backup_job_create, which only creates the job,
> but does not automatically start it. The old interface, 'backup_start',
> is not kept in favor of limiting the number of nearly-identical interfaces
> that would have to be edited to keep up with QAPI changes in the future.
>
> Callers that wish to synchronously start the backup_block_job can
> instead just call block_job_start immediately after calling
> backup_job_create.
>
> Transactions are updated to use the new interface, calling block_job_start
> only during the .commit phase, which helps prevent race conditions where
> jobs may finish before we even finish building the transaction. This may
> happen, for instance, during empty block backup jobs.
>

Sadly for me, I realized this patch has a potential problem. When we 
were adding the bitmap operations, it became clear that the atomicity 
point was during .prepare, not .commit.

e.g. the bitmap is cleared or created during prepare, and backup_run 
installs its Write Notifier at that point in time, too.

By changing BlockJobs to only run on commit, we've severed the atomicity 
point such that some actions will take effect during prepare, and others 
at commit.

I still think it's the correct thing to do to delay the BlockJobs until 
the commit phase, so I will start auditing the code to see how hard it 
is to shift the atomicity point to commit instead. If it's possible to 
do that, I think from the POV of the managing application, having the 
atomicity point be at commit is the cleaner contract.

Feel free to chime in with suggestions and counterpoints until then.

--js

> Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  block/backup.c            | 26 ++++++++-------
>  block/replication.c       | 11 ++++---
>  blockdev.c                | 81 +++++++++++++++++++++++++++++++----------------
>  include/block/block_int.h | 21 ++++++------
>  4 files changed, 86 insertions(+), 53 deletions(-)
>
> diff --git a/block/backup.c b/block/backup.c
> index 7294169..aad69eb 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -527,7 +527,7 @@ static const BlockJobDriver backup_job_driver = {
>      .attached_aio_context   = backup_attached_aio_context,
>  };
>
> -void backup_start(const char *job_id, BlockDriverState *bs,
> +BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>                    BlockDriverState *target, int64_t speed,
>                    MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
>                    bool compress,
> @@ -546,52 +546,52 @@ void backup_start(const char *job_id, BlockDriverState *bs,
>
>      if (bs == target) {
>          error_setg(errp, "Source and target cannot be the same");
> -        return;
> +        return NULL;
>      }
>
>      if (!bdrv_is_inserted(bs)) {
>          error_setg(errp, "Device is not inserted: %s",
>                     bdrv_get_device_name(bs));
> -        return;
> +        return NULL;
>      }
>
>      if (!bdrv_is_inserted(target)) {
>          error_setg(errp, "Device is not inserted: %s",
>                     bdrv_get_device_name(target));
> -        return;
> +        return NULL;
>      }
>
>      if (compress && target->drv->bdrv_co_pwritev_compressed == NULL) {
>          error_setg(errp, "Compression is not supported for this drive %s",
>                     bdrv_get_device_name(target));
> -        return;
> +        return NULL;
>      }
>
>      if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP_SOURCE, errp)) {
> -        return;
> +        return NULL;
>      }
>
>      if (bdrv_op_is_blocked(target, BLOCK_OP_TYPE_BACKUP_TARGET, errp)) {
> -        return;
> +        return NULL;
>      }
>
>      if (sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
>          if (!sync_bitmap) {
>              error_setg(errp, "must provide a valid bitmap name for "
>                               "\"incremental\" sync mode");
> -            return;
> +            return NULL;
>          }
>
>          /* Create a new bitmap, and freeze/disable this one. */
>          if (bdrv_dirty_bitmap_create_successor(bs, sync_bitmap, errp) < 0) {
> -            return;
> +            return NULL;
>          }
>      } else if (sync_bitmap) {
>          error_setg(errp,
>                     "a sync_bitmap was provided to backup_run, "
>                     "but received an incompatible sync_mode (%s)",
>                     MirrorSyncMode_lookup[sync_mode]);
> -        return;
> +        return NULL;
>      }
>
>      len = bdrv_getlength(bs);
> @@ -638,8 +638,8 @@ void backup_start(const char *job_id, BlockDriverState *bs,
>      bdrv_op_block_all(target, job->common.blocker);
>      job->common.len = len;
>      block_job_txn_add_job(txn, &job->common);
> -    block_job_start(&job->common);
> -    return;
> +
> +    return &job->common;
>
>   error:
>      if (sync_bitmap) {
> @@ -649,4 +649,6 @@ void backup_start(const char *job_id, BlockDriverState *bs,
>          blk_unref(job->target);
>          block_job_unref(&job->common);
>      }
> +
> +    return NULL;
>  }
> diff --git a/block/replication.c b/block/replication.c
> index b604b93..d9cdc36 100644
> --- a/block/replication.c
> +++ b/block/replication.c
> @@ -409,6 +409,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
>      int64_t active_length, hidden_length, disk_length;
>      AioContext *aio_context;
>      Error *local_err = NULL;
> +    BlockJob *job;
>
>      aio_context = bdrv_get_aio_context(bs);
>      aio_context_acquire(aio_context);
> @@ -496,16 +497,18 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
>          bdrv_op_block_all(top_bs, s->blocker);
>          bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
>
> -        backup_start("replication-backup", s->secondary_disk->bs,
> -                     s->hidden_disk->bs, 0, MIRROR_SYNC_MODE_NONE, NULL, false,
> -                     BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
> -                     backup_job_completed, s, NULL, &local_err);
> +        job = backup_job_create("replication-backup", s->secondary_disk->bs,
> +                                s->hidden_disk->bs, 0, MIRROR_SYNC_MODE_NONE,
> +                                NULL, false, BLOCKDEV_ON_ERROR_REPORT,
> +                                BLOCKDEV_ON_ERROR_REPORT, backup_job_completed,
> +                                s, NULL, &local_err);
>          if (local_err) {
>              error_propagate(errp, local_err);
>              backup_job_cleanup(s);
>              aio_context_release(aio_context);
>              return;
>          }
> +        block_job_start(job);
>          break;
>      default:
>          aio_context_release(aio_context);
> diff --git a/blockdev.c b/blockdev.c
> index 0ac507f..37d78d3 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1866,7 +1866,7 @@ typedef struct DriveBackupState {
>      BlockJob *job;
>  } DriveBackupState;
>
> -static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
> +static BlockJob *do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
>                              Error **errp);
>
>  static void drive_backup_prepare(BlkActionState *common, Error **errp)
> @@ -1890,23 +1890,27 @@ static void drive_backup_prepare(BlkActionState *common, Error **errp)
>      bdrv_drained_begin(bs);
>      state->bs = bs;
>
> -    do_drive_backup(backup, common->block_job_txn, &local_err);
> +    state->job = do_drive_backup(backup, common->block_job_txn, &local_err);
>      if (local_err) {
>          error_propagate(errp, local_err);
>          return;
>      }
> +}
>
> -    state->job = state->bs->job;
> +static void drive_backup_commit(BlkActionState *common)
> +{
> +    DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
> +    if (state->job) {
> +        block_job_start(state->job);
> +    }
>  }
>
>  static void drive_backup_abort(BlkActionState *common)
>  {
>      DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
> -    BlockDriverState *bs = state->bs;
>
> -    /* Only cancel if it's the job we started */
> -    if (bs && bs->job && bs->job == state->job) {
> -        block_job_cancel_sync(bs->job);
> +    if (state->job) {
> +        block_job_cancel_sync(state->job);
>      }
>  }
>
> @@ -1927,8 +1931,8 @@ typedef struct BlockdevBackupState {
>      AioContext *aio_context;
>  } BlockdevBackupState;
>
> -static void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
> -                               Error **errp);
> +static BlockJob *do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
> +                                    Error **errp);
>
>  static void blockdev_backup_prepare(BlkActionState *common, Error **errp)
>  {
> @@ -1961,23 +1965,27 @@ static void blockdev_backup_prepare(BlkActionState *common, Error **errp)
>      state->bs = bs;
>      bdrv_drained_begin(state->bs);
>
> -    do_blockdev_backup(backup, common->block_job_txn, &local_err);
> +    state->job = do_blockdev_backup(backup, common->block_job_txn, &local_err);
>      if (local_err) {
>          error_propagate(errp, local_err);
>          return;
>      }
> +}
>
> -    state->job = state->bs->job;
> +static void blockdev_backup_commit(BlkActionState *common)
> +{
> +    BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, common);
> +    if (state->job) {
> +        block_job_start(state->job);
> +    }
>  }
>
>  static void blockdev_backup_abort(BlkActionState *common)
>  {
>      BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, common);
> -    BlockDriverState *bs = state->bs;
>
> -    /* Only cancel if it's the job we started */
> -    if (bs && bs->job && bs->job == state->job) {
> -        block_job_cancel_sync(bs->job);
> +    if (state->job) {
> +        block_job_cancel_sync(state->job);
>      }
>  }
>
> @@ -2127,12 +2135,14 @@ static const BlkActionOps actions[] = {
>      [TRANSACTION_ACTION_KIND_DRIVE_BACKUP] = {
>          .instance_size = sizeof(DriveBackupState),
>          .prepare = drive_backup_prepare,
> +        .commit = drive_backup_commit,
>          .abort = drive_backup_abort,
>          .clean = drive_backup_clean,
>      },
>      [TRANSACTION_ACTION_KIND_BLOCKDEV_BACKUP] = {
>          .instance_size = sizeof(BlockdevBackupState),
>          .prepare = blockdev_backup_prepare,
> +        .commit = blockdev_backup_commit,
>          .abort = blockdev_backup_abort,
>          .clean = blockdev_backup_clean,
>      },
> @@ -3126,11 +3136,13 @@ out:
>      aio_context_release(aio_context);
>  }
>
> -static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
> +static BlockJob *do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
> +                                 Error **errp)
>  {
>      BlockDriverState *bs;
>      BlockDriverState *target_bs;
>      BlockDriverState *source = NULL;
> +    BlockJob *job = NULL;
>      BdrvDirtyBitmap *bmap = NULL;
>      AioContext *aio_context;
>      QDict *options = NULL;
> @@ -3159,7 +3171,7 @@ static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
>
>      bs = qmp_get_root_bs(backup->device, errp);
>      if (!bs) {
> -        return;
> +        return NULL;
>      }
>
>      aio_context = bdrv_get_aio_context(bs);
> @@ -3233,9 +3245,10 @@ static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
>          }
>      }
>
> -    backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
> -                 bmap, backup->compress, backup->on_source_error,
> -                 backup->on_target_error, NULL, bs, txn, &local_err);
> +    job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
> +                            backup->sync, bmap, backup->compress,
> +                            backup->on_source_error, backup->on_target_error,
> +                            NULL, bs, txn, &local_err);
>      bdrv_unref(target_bs);
>      if (local_err != NULL) {
>          error_propagate(errp, local_err);
> @@ -3244,11 +3257,17 @@ static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
>
>  out:
>      aio_context_release(aio_context);
> +    return job;
>  }
>
>  void qmp_drive_backup(DriveBackup *arg, Error **errp)
>  {
> -    return do_drive_backup(arg, NULL, errp);
> +
> +    BlockJob *job;
> +    job = do_drive_backup(arg, NULL, errp);
> +    if (job) {
> +        block_job_start(job);
> +    }
>  }
>
>  BlockDeviceInfoList *qmp_query_named_block_nodes(Error **errp)
> @@ -3256,12 +3275,14 @@ BlockDeviceInfoList *qmp_query_named_block_nodes(Error **errp)
>      return bdrv_named_nodes_list(errp);
>  }
>
> -void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
> +BlockJob *do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
> +                             Error **errp)
>  {
>      BlockDriverState *bs;
>      BlockDriverState *target_bs;
>      Error *local_err = NULL;
>      AioContext *aio_context;
> +    BlockJob *job = NULL;
>
>      if (!backup->has_speed) {
>          backup->speed = 0;
> @@ -3281,7 +3302,7 @@ void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
>
>      bs = qmp_get_root_bs(backup->device, errp);
>      if (!bs) {
> -        return;
> +        return NULL;
>      }
>
>      aio_context = bdrv_get_aio_context(bs);
> @@ -3303,19 +3324,25 @@ void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
>              goto out;
>          }
>      }
> -    backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
> -                 NULL, backup->compress, backup->on_source_error,
> -                 backup->on_target_error, NULL, bs, txn, &local_err);
> +    job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
> +                            backup->sync, NULL, backup->compress,
> +                            backup->on_source_error, backup->on_target_error,
> +                            NULL, bs, txn, &local_err);
>      if (local_err != NULL) {
>          error_propagate(errp, local_err);
>      }
>  out:
>      aio_context_release(aio_context);
> +    return job;
>  }
>
>  void qmp_blockdev_backup(BlockdevBackup *arg, Error **errp)
>  {
> -    do_blockdev_backup(arg, NULL, errp);
> +    BlockJob *job;
> +    job = do_blockdev_backup(arg, NULL, errp);
> +    if (job) {
> +        block_job_start(job);
> +    }
>  }
>
>  /* Parameter check and block job starting for drive mirroring.
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 686f6a8..738e4b4 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -737,7 +737,7 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
>                    void *opaque, Error **errp);
>
>  /*
> - * backup_start:
> + * backup_job_create:
>   * @job_id: The id of the newly-created job, or %NULL to use the
>   * device name of @bs.
>   * @bs: Block device to operate on.
> @@ -751,17 +751,18 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
>   * @opaque: Opaque pointer value passed to @cb.
>   * @txn: Transaction that this job is part of (may be NULL).
>   *
> - * Start a backup operation on @bs.  Clusters in @bs are written to @target
> + * Create a backup operation on @bs.  Clusters in @bs are written to @target
>   * until the job is cancelled or manually completed.
>   */
> -void backup_start(const char *job_id, BlockDriverState *bs,
> -                  BlockDriverState *target, int64_t speed,
> -                  MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
> -                  bool compress,
> -                  BlockdevOnError on_source_error,
> -                  BlockdevOnError on_target_error,
> -                  BlockCompletionFunc *cb, void *opaque,
> -                  BlockJobTxn *txn, Error **errp);
> +BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
> +                            BlockDriverState *target, int64_t speed,
> +                            MirrorSyncMode sync_mode,
> +                            BdrvDirtyBitmap *sync_bitmap,
> +                            bool compress,
> +                            BlockdevOnError on_source_error,
> +                            BlockdevOnError on_target_error,
> +                            BlockCompletionFunc *cb, void *opaque,
> +                            BlockJobTxn *txn, Error **errp);
>
>  /**
>   * block_job_start:
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/11] blockjob: refactor backup_start as backup_job_create
  2016-10-07 18:39   ` John Snow
@ 2016-10-10  8:57     ` Kevin Wolf
  2016-10-10 22:51       ` John Snow
  0 siblings, 1 reply; 51+ messages in thread
From: Kevin Wolf @ 2016-10-10  8:57 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, vsementsov, famz, jcody, qemu-devel, stefanha

Am 07.10.2016 um 20:39 hat John Snow geschrieben:
> On 09/30/2016 06:00 PM, John Snow wrote:
> >Refactor backup_start as backup_job_create, which only creates the job,
> >but does not automatically start it. The old interface, 'backup_start',
> >is not kept in favor of limiting the number of nearly-identical interfaces
> >that would have to be edited to keep up with QAPI changes in the future.
> >
> >Callers that wish to synchronously start the backup_block_job can
> >instead just call block_job_start immediately after calling
> >backup_job_create.
> >
> >Transactions are updated to use the new interface, calling block_job_start
> >only during the .commit phase, which helps prevent race conditions where
> >jobs may finish before we even finish building the transaction. This may
> >happen, for instance, during empty block backup jobs.
> >
> 
> Sadly for me, I realized this patch has a potential problem. When we
> were adding the bitmap operations, it became clear that the
> atomicity point was during .prepare, not .commit.
> 
> e.g. the bitmap is cleared or created during prepare, and backup_run
> installs its Write Notifier at that point in time, too.

Strictly speaking that's wrong then.

The write notifier doesn't really hurt because it is never triggered
between prepare and commit (we're holding the lock) and it can just be
removed again.

Clearing the bitmap is a bug because the caller could expect that the
bitmap is in its original state if the transaction fails. I doubt this
is a problem in practice, but we should fix it anyway.

By the way, why did we allow to add a 'bitmap' option for DriveBackup
without adding it to BlockdevBackup at the same time?

> By changing BlockJobs to only run on commit, we've severed the
> atomicity point such that some actions will take effect during
> prepare, and others at commit.
> 
> I still think it's the correct thing to do to delay the BlockJobs
> until the commit phase, so I will start auditing the code to see how
> hard it is to shift the atomicity point to commit instead. If it's
> possible to do that, I think from the POV of the managing
> application, having the atomicity point be at commit is the cleaner contract.
> 
> Feel free to chime in with suggestions and counterpoints until then.

I agree that jobs have to be started only at commit. There may be other
things that are currently happening in prepare that really should be
moved as well, but as long as moving one thing and not the other doesn't
break anything that was working, we can fix one thing at a time.

Kevin

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions
  2016-10-05 21:00         ` John Snow
@ 2016-10-10 16:45           ` Kashyap Chamarthy
  2016-10-10 18:36             ` John Snow
  0 siblings, 1 reply; 51+ messages in thread
From: Kashyap Chamarthy @ 2016-10-10 16:45 UTC (permalink / raw)
  To: John Snow
  Cc: Eric Blake, Kevin Wolf, vsementsov, famz, qemu-block, jcody,
	qemu-devel, stefanha

On Wed, Oct 05, 2016 at 05:00:29PM -0400, John Snow wrote:

[Arbitrarily chiming in here, and still catching up with the details of the
thread.]
 
> On 10/05/2016 03:24 PM, Eric Blake wrote:
> > On 10/05/2016 01:49 PM, John Snow wrote:

[...]

> > > Hmm, do we want to make it so some jobs are invisible and others are
> > > not? Because as it stands right now, neither case is strictly true. We
> > > only emit cancelled/completed events if it was started via QMP, however
> > > we do emit events for error and ready regardless of who started the job.
> > 
> > Libvirt tries to mirror any block job event it receives to upper layers.
> >  But if it cannot figure out which upper-layer disk the event is
> > associated with, it just drops the event.  So I think that from the
> > libvirt perspective, you are okay if you always report job events,
> > even for internal jobs.  (Do we have a quick and easy way to set up an
> > internal job event, to double-check if I can produce some sort of
> > libvirt scenario to trigger the event and see that it gets safely ignored?)
> > 
> 
> Not in a QEMU release yet, I think.

If not from an official QEMU release, it'd still be useful to work out
a way to reproduce what Eric asked even from upstream Git master.

> > > That didn't seem particularly consistent to me; either all events should
> > > be controlled by the job layer itself or none of them should be.
> > > 
> > > I opted for "all."
> > > 
> > > For "internal" jobs that did not previously emit any events, is it not
> > > true that these jobs still appear in the block job list and are
> > > effectively public regardless? I'd argue that these messages may be of
> > > value for management utilities who are still blocked by these jobs
> > > whether they are 'internal' or not.

It'd certainly be useful during debugging (for the said management
utilities), if it's possible to distinguish an event that was triggered
by an internal block job vs. an event emitted by a job explicitly
triggered by a user action.

For example, OpenStack Nova calls libvirt API virDomainBlockRebase(),
which internally calls QMP `drive-mirror` that emits events.  An "event
origin classification" system, if were to exist, allows one to pay
attention to only those events that are emitted due to an explicit
action and ignore all the rest ('internal').

But I'm not quite sure if it's desirable to have this event
classification for cleanliness reasons as Eric points out below.

> > > I'll push for keeping it mandatory and explicit. If it becomes a
> > > problem, we can always add a 'silent' job property that silences ALL qmp
> > > events, including all completion, error, and ready notices.
> > 
> > Completely silencing an internal job seems a little cleaner than having
> > events for the job but not being able to query it.  But if nothing
> > breaks by exposing the internal jobs, that seems even easier than trying
> > to decide which jobs are internal and hidden vs. user-requested and public.
> >
> 
> Well, at the moment anything requested directly via blockdev.c is "public."
> Before 2.8, all jobs were public ones, with the exception of those in
> qemu-img which is a bit of a different/special case.
> 
> We have this block/replication.c beast now, though, and it uses backup_start
> and commit_active_start as it sees fit without direct user intervention.
> 
> As it stands, I believe the jobs that replication creates are user-visible
> via query, will not issue cancellation or completion events, but WILL emit
> error events. It may emit ready events for the mirror job it uses, but I
> haven't traced that. (It could, at least.)

Thanks, the above is useful to know. 

-- 
/kashyap


* Re: [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions
  2016-10-10 16:45           ` Kashyap Chamarthy
@ 2016-10-10 18:36             ` John Snow
  2016-10-10 19:28               ` Eric Blake
  0 siblings, 1 reply; 51+ messages in thread
From: John Snow @ 2016-10-10 18:36 UTC (permalink / raw)
  To: Kashyap Chamarthy
  Cc: Eric Blake, Kevin Wolf, vsementsov, famz, qemu-block, jcody,
	qemu-devel, stefanha



On 10/10/2016 12:45 PM, Kashyap Chamarthy wrote:
> On Wed, Oct 05, 2016 at 05:00:29PM -0400, John Snow wrote:
>
> [Arbitrarily chiming in here, and still catching up with the details of the
> thread.]
>
>> On 10/05/2016 03:24 PM, Eric Blake wrote:
>>> On 10/05/2016 01:49 PM, John Snow wrote:
>
> [...]
>
>>>> Hmm, do we want to make it so some jobs are invisible and others are
>>>> not? Because as it stands right now, neither case is strictly true. We
>>>> only emit cancelled/completed events if it was started via QMP, however
>>>> we do emit events for error and ready regardless of who started the job.
>>>
>>> Libvirt tries to mirror any block job event it receives to upper layers.
>>>  But if it cannot figure out which upper-layer disk the event is
>>> associated with, it just drops the event.  So I think that from the
>>> libvirt perspective, you are okay if you always report job events,
>>> even for internal jobs.  (Do we have a quick and easy way to set up an
>>> internal job event, to double-check if I can produce some sort of
>>> libvirt scenario to trigger the event and see that it gets safely ignored?)
>>>
>>
>> Not in a QEMU release yet, I think.
>
> If not from an official QEMU release, it'd still be useful to work out
> a way to reproduce what Eric asked even from upstream Git master.
>

I'll be honest that I don't know; this is related to Replication which I 
know reasonably little about overall. It got added in the 2.8 timeframe, 
so the behavior it currently exhibits is not a good or meaningful 
reference for maintaining compatibility.

We can /change/ the behavior before releases with no love lost.

>>>> That didn't seem particularly consistent to me; either all events should
>>>> be controlled by the job layer itself or none of them should be.
>>>>
>>>> I opted for "all."
>>>>
>>>> For "internal" jobs that did not previously emit any events, is it not
>>>> true that these jobs still appear in the block job list and are
>>>> effectively public regardless? I'd argue that these messages may be of
>>>> value for management utilities who are still blocked by these jobs
>>>> whether they are 'internal' or not.
>
> It'd certainly be useful during debugging (for the said management
> utilities), if it's possible to distinguish an event that was triggered
> by an internal block job vs. an event emitted by a job explicitly
> triggered by a user action.
>

Or, what if you just didn't get events for internal jobs? Are events for 
un-managed jobs useful in any sense beyond understanding the stateful 
availability of the drive to participate in a new job?

> For example, OpenStack Nova calls libvirt API virDomainBlockRebase(),
> which internally calls QMP `drive-mirror` that emits events.  An "event
> origin classification" system, if were to exist, allows one to pay
> attention to only those events that are emitted due to an explicit
> action and ignore all the rest ('internal').
>
> But I'm not quite sure if it's desirable to have this event
> classification for cleanliness reasons as Eric points out below.
>
>>>> I'll push for keeping it mandatory and explicit. If it becomes a
>>>> problem, we can always add a 'silent' job property that silences ALL qmp
>>>> events, including all completion, error, and ready notices.
>>>
>>> Completely silencing an internal job seems a little cleaner than having
>>> events for the job but not being able to query it.  But if nothing
>>> breaks by exposing the internal jobs, that seems even easier than trying
>>> to decide which jobs are internal and hidden vs. user-requested and public.
>>>
>>
>> Well, at the moment anything requested directly via blockdev.c is "public."
>> Before 2.8, all jobs were public ones, with the exception of those in
>> qemu-img which is a bit of a different/special case.
>>
>> We have this block/replication.c beast now, though, and it uses backup_start
>> and commit_active_start as it sees fit without direct user intervention.
>>
>> As it stands, I believe the jobs that replication creates are user-visible
>> via query, will not issue cancellation or completion events, but WILL emit
>> error events. It may emit ready events for the mirror job it uses, but I
>> haven't traced that. (It could, at least.)
>
> Thanks, the above is useful to know.
>


* Re: [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions
  2016-10-10 18:36             ` John Snow
@ 2016-10-10 19:28               ` Eric Blake
  2016-10-11 13:32                 ` Kashyap Chamarthy
  0 siblings, 1 reply; 51+ messages in thread
From: Eric Blake @ 2016-10-10 19:28 UTC (permalink / raw)
  To: John Snow, Kashyap Chamarthy
  Cc: Kevin Wolf, vsementsov, famz, qemu-block, jcody, qemu-devel, stefanha


On 10/10/2016 01:36 PM, John Snow wrote:
>>>>  But if it cannot figure out which upper-layer disk the event is
>>>> associated with, it just drops the event.  So I think that from the
>>>> libvirt perspective, you are okay if you always report job events,
>>>> even for internal jobs.  (Do we have a quick and easy way to set up an
>>>> internal job event, to double-check if I can produce some sort of
>>>> libvirt scenario to trigger the event and see that it gets safely
>>>> ignored?)
>>>>
>>>
>>> Not in a QEMU release yet, I think.
>>
>> If not from an official QEMU release, it'd still be useful to work out
>> a way to reproduce what Eric asked even from upstream Git master.
>>
> 
> I'll be honest that I don't know; this is related to Replication which I
> know reasonably little about overall. It got added in the 2.8 timeframe,
> so the behavior it currently exhibits is not a good or meaningful
> reference for maintaining compatibility.
> 
> We can /change/ the behavior before releases with no love lost.

And if Replication is the only way to trigger internal use of jobs, then
we aren't breaking libvirt (which doesn't know how to drive replication
yet) by changing anything on that front.

> 
> Or, what if you just didn't get events for internal jobs? Are events for
> un-managed jobs useful in any sense beyond understanding the stateful
> availability of the drive to participate in a new job?

If libvirt isn't driving replication, then it's a moot point. And even
though replication in libvirt is not supported yet, I suspect that down
the road when support is added, the easiest thing for libvirt will be to
state that replication and libvirt-controlled block jobs are mutually
exclusive; there are probably enough lurking dragons that if your system
MUST be highly reliable via replication, you probably don't want to be
confusing things by changing the backing file depth manually with
streams, pulls, or other manual actions at the same time as replication
is managing the system, because how can you guarantee that both primary
and secondary see the same manual actions at all the right times?

At any rate, not seeing internal-only jobs is probably the most
reasonable, even if it means an internal-only job can block the attempt
to do a manual job.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH v2 10/11] blockjob: refactor backup_start as backup_job_create
  2016-10-10  8:57     ` Kevin Wolf
@ 2016-10-10 22:51       ` John Snow
  2016-10-11  8:56         ` Paolo Bonzini
  2016-10-11  9:35         ` Kevin Wolf
  0 siblings, 2 replies; 51+ messages in thread
From: John Snow @ 2016-10-10 22:51 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-block, vsementsov, famz, jcody, qemu-devel, stefanha, Paolo Bonzini



On 10/10/2016 04:57 AM, Kevin Wolf wrote:
> Am 07.10.2016 um 20:39 hat John Snow geschrieben:
>> On 09/30/2016 06:00 PM, John Snow wrote:
>>> Refactor backup_start as backup_job_create, which only creates the job,
>>> but does not automatically start it. The old interface, 'backup_start',
>>> is not kept in favor of limiting the number of nearly-identical iterfaces

(Ah yes, 'iterfaces.')

>>> that would have to be edited to keep up with QAPI changes in the future.
>>>
>>> Callers that wish to synchronously start the backup_block_job can
>>> instead just call block_job_start immediately after calling
>>> backup_job_create.
>>>
>>> Transactions are updated to use the new interface, calling block_job_start
>>> only during the .commit phase, which helps prevent race conditions where
>>> jobs may finish before we even finish building the transaction. This may
>>> happen, for instance, during empty block backup jobs.
>>>
>>
>> Sadly for me, I realized this patch has a potential problem. When we
>> were adding the bitmap operations, it became clear that the
>> atomicity point was during .prepare, not .commit.
>>
>> e.g. the bitmap is cleared or created during prepare, and backup_run
>> installs its Write Notifier at that point in time, too.
>
> Strictly speaking that's wrong then.
>

I agree, though I do remember it coming up during the bitmap review
process that the point-in-time spot is currently during prepare.

I do think that while it's at least a consistent model (one where we do
in fact commit during .prepare(), simply undo or revert during .abort(),
and only clean up or remove the undo-cache in .commit()), it certainly
violates the principle of least surprise and is a little rude...

> The write notifier doesn't really hurt because it is never triggered
> between prepare and commit (we're holding the lock) and it can just be
> removed again.
>
> Clearing the bitmap is a bug because the caller could expect that the
> bitmap is in its original state if the transaction fails. I doubt this
> is a problem in practice, but we should fix it anyway.
>

We make a backup to undo the process if it fails. I only mention it to 
emphasize that the atomic point appears to be during prepare. In 
practice we hold the locks for the whole process, but... I think Paolo 
may be actively trying to change that.

> By the way, why did we allow adding a 'bitmap' option for DriveBackup
> without adding it to BlockdevBackup at the same time?
>

I don't remember. I'm not sure anyone ever audited it to convince 
themselves it was a useful or safe thing to do. I believe at the time I 
was pushing for bitmaps in DriveBackup, Fam was still authoring the 
BlockdevBackup interface.

>> By changing BlockJobs to only run on commit, we've severed the
>> atomicity point such that some actions will take effect during
>> prepare, and others at commit.
>>
>> I still think it's the correct thing to do to delay the BlockJobs
>> until the commit phase, so I will start auditing the code to see how
>> hard it is to shift the atomicity point to commit instead. If it's
>> possible to do that, I think from the POV of the managing
>> application, having the atomicity point be
>>
>> Feel free to chime in with suggestions and counterpoints until then.
>
> I agree that jobs have to be started only at commit. There may be other
> things that are currently happening in prepare that really should be
> moved as well, but as long as moving one thing and not the other doesn't
> break anything that was working, we can fix one thing at a time.
>
> Kevin
>

Alright, let's give this a whirl.

We have 8 transaction actions:

drive_backup
blockdev_backup
block_dirty_bitmap_add
block_dirty_bitmap_clear
abort
blockdev_snapshot
blockdev_snapshot_sync
blockdev_snapshot_internal_sync

Drive and Blockdev backup are already modified to behave point-in-time 
at time of .commit() by changing them to only begin running once the 
commit phase occurs.

Bitmap add and clear are trivial to rework; clear just moves the call to
clear into commit, with possibly some action taken to prevent the bitmap
from becoming used by some other process in the meantime. Add is easy to
rework too: we can create it during prepare but reset it back to zero
during commit if necessary.
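The rework described above can be sketched with a toy model
(illustrative names, not the real QEMU bitmap code): creation, which can
fail, stays in .prepare; .commit resets the bitmap so the add takes
effect at the commit point in time; .abort deletes it again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy bitmap with a fixed-size payload and an "exists" flag. */
typedef struct Bitmap { uint8_t bits[8]; bool exists; } Bitmap;

typedef struct AddState { Bitmap *bm; } AddState;

/* .prepare: creation can fail (allocation, duplicate name), so it has
 * to happen here where failure can still abort the transaction. */
static bool bitmap_add_prepare(AddState *s, Bitmap *bm)
{
    if (bm->exists) {
        return false;            /* e.g. name already taken */
    }
    memset(bm->bits, 0, sizeof(bm->bits));
    bm->exists = true;
    s->bm = bm;
    return true;
}

/* .commit: writes may have dirtied the bitmap between prepare and
 * commit; reset it so the add is point-in-time at commit. */
static void bitmap_add_commit(AddState *s)
{
    memset(s->bm->bits, 0, sizeof(s->bm->bits));
}

/* .abort: undo the creation entirely. */
static void bitmap_add_abort(AddState *s)
{
    s->bm->exists = false;
}
```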

Abort needs no changes.

blockdev_snapshot[_sync] actually appears to already be doing the right 
thing, by only installing the new top layer during commit, which makes 
this action inconsistent by current semantics, but requires no changes 
to move to the desired new semantics.

That leaves only the internal snapshot to worry about, which does 
admittedly look like quite the yak to shave. It's a bit out of scope for 
me, but Kevin, do you think this is possible?

Looks like implementations are qcow2, rbd, and sheepdog. I imagine this 
would need to be split into prepare and commit semantics to accommodate 
this change... though we don't have any meaningful control over the rbd 
implementation.

Any thoughts? I could conceivably just change everything over to working 
primarily during .commit(), and just argue that the locks held for the 
transaction are sufficient to leave the internal snapshot alone "for 
now," ...

--js


* Re: [Qemu-devel] [PATCH v2 10/11] blockjob: refactor backup_start as backup_job_create
  2016-10-10 22:51       ` John Snow
@ 2016-10-11  8:56         ` Paolo Bonzini
  2016-10-11  9:35         ` Kevin Wolf
  1 sibling, 0 replies; 51+ messages in thread
From: Paolo Bonzini @ 2016-10-11  8:56 UTC (permalink / raw)
  To: John Snow, Kevin Wolf
  Cc: qemu-block, vsementsov, famz, jcody, qemu-devel, stefanha



On 11/10/2016 00:51, John Snow wrote:
>> Clearing the bitmap is a bug because the caller could expect that the
>> bitmap is in its original state if the transaction fails. I doubt this
>> is a problem in practice, but we should fix it anyway.
> 
> We make a backup to undo the process if it fails. I only mention it to
> emphasize that the atomic point appears to be during prepare. In
> practice we hold the locks for the whole process, but... I think Paolo
> may be actively trying to change that.

Even now, atomicity must be ensured with bdrv_drained_begin/end.  The
AioContext lock does not promise atomicity.
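Paolo's point can be illustrated with a toy model of a drained section
(a counter-based sketch, not QEMU's actual bdrv_drained_begin/end
implementation): entering the section waits out in-flight requests, and
while the counter is raised no new request may be dispatched.

```c
#include <assert.h>

/* Toy block device state: a quiesce counter plus request accounting. */
typedef struct BDS { int quiesce_counter; int in_flight; int completed; } BDS;

/* Stand-in for waiting until all in-flight requests finish. */
static void complete_all(BDS *bs)
{
    bs->completed += bs->in_flight;
    bs->in_flight = 0;
}

static void drained_begin(BDS *bs)
{
    bs->quiesce_counter++;
    complete_all(bs);            /* drain in-flight requests */
}

static void drained_end(BDS *bs)
{
    bs->quiesce_counter--;
}

/* Returns 1 if the request was dispatched, 0 if the device is quiesced
 * (no I/O may run inside the drained section). */
static int submit_request(BDS *bs)
{
    if (bs->quiesce_counter > 0) {
        return 0;
    }
    bs->in_flight++;
    return 1;
}
```

Holding a lock only serializes the code paths that take it; the drained
section is what guarantees no request slips in between prepare and
commit.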

Paolo


* Re: [Qemu-devel] [PATCH v2 10/11] blockjob: refactor backup_start as backup_job_create
  2016-10-10 22:51       ` John Snow
  2016-10-11  8:56         ` Paolo Bonzini
@ 2016-10-11  9:35         ` Kevin Wolf
  2016-10-17  8:59           ` Fam Zheng
  1 sibling, 1 reply; 51+ messages in thread
From: Kevin Wolf @ 2016-10-11  9:35 UTC (permalink / raw)
  To: John Snow
  Cc: qemu-block, vsementsov, famz, jcody, qemu-devel, stefanha, Paolo Bonzini

Am 11.10.2016 um 00:51 hat John Snow geschrieben:
> >>Sadly for me, I realized this patch has a potential problem. When we
> >>were adding the bitmap operations, it became clear that the
> >>atomicity point was during .prepare, not .commit.
> >>
> >>e.g. the bitmap is cleared or created during prepare, and backup_run
> >>installs its Write Notifier at that point in time, too.
> >
> >Strictly speaking that's wrong then.
> >
> 
> I agree, though I do remember it coming up during the bitmap review
> process that the point-in-time spot is currently during prepare.
> 
> I do think that while it's at least a consistent model (one where we
> do in fact commit during .prepare(), simply undo or revert during
> .abort(), and only clean up or remove the undo-cache in .commit()),
> it certainly violates the principle of least surprise and is a little
> rude...

As long as we can reliably undo things in .abort (i.e. use operations
that can't fail) and keep the locks and the device drained, we should be
okay in terms of atomicity.

I think it's still nicer if we can enable things only in .commit, but
sometimes we have to use operations that could fail, so we have to do
them in .prepare.

The exact split between .prepare/.commit/.abort isn't visible on the
external interfaces as long as it's done correctly, so it doesn't
necessarily have to be the same for all commands.
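A toy illustration of this point (hypothetical actions, not QEMU code):
two actions with different prepare/commit splits are indistinguishable
to the caller, who only ever observes state after commit or after abort.

```c
#include <assert.h>
#include <stdbool.h>

/* Variant A applies the change in .prepare and undoes it in .abort;
 * variant B applies the change only in .commit.  Externally, both
 * behave as one atomic update. */
typedef struct { int *slot; int old, new_val; } SetAction;

static void setA_prepare(SetAction *a) { a->old = *a->slot; *a->slot = a->new_val; }
static void setA_commit(SetAction *a)  { (void)a; /* nothing left to do */ }
static void setA_abort(SetAction *a)   { *a->slot = a->old; }

static void setB_prepare(SetAction *a) { (void)a; /* validate only */ }
static void setB_commit(SetAction *a)  { *a->slot = a->new_val; }
static void setB_abort(SetAction *a)   { (void)a; /* nothing to undo */ }

/* Drive one action through a full transaction. */
static void run_setA(SetAction *a, bool ok)
{
    setA_prepare(a);
    if (ok) { setA_commit(a); } else { setA_abort(a); }
}

static void run_setB(SetAction *a, bool ok)
{
    setB_prepare(a);
    if (ok) { setB_commit(a); } else { setB_abort(a); }
}
```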

> >The write notifier doesn't really hurt because it is never triggered
> >between prepare and commit (we're holding the lock) and it can just be
> >removed again.
> >
> >Clearing the bitmap is a bug because the caller could expect that the
> >bitmap is in its original state if the transaction fails. I doubt this
> >is a problem in practice, but we should fix it anyway.
> 
> We make a backup to undo the process if it fails. I only mention it
> to emphasize that the atomic point appears to be during prepare. In
> practice we hold the locks for the whole process, but... I think
> Paolo may be actively trying to change that.

Well, the whole .prepare/.commit or .prepare/.abort sequence is supposed
to be atomic, so it's really the same thing. Changing this would break
the transactional behaviour, so that's not possible anyway.

> >By the way, why did we allow adding a 'bitmap' option for DriveBackup
> >without adding it to BlockdevBackup at the same time?
> 
> I don't remember. I'm not sure anyone ever audited it to convince
> themselves it was a useful or safe thing to do. I believe at the
> time I was pushing for bitmaps in DriveBackup, Fam was still
> authoring the BlockdevBackup interface.

Hm, maybe that's why. I checked the commit dates of both (and
BlockdevBackup's was earlier), but I didn't check the development history.

Should we add it now or is it a bad idea?

> >>By changing BlockJobs to only run on commit, we've severed the
> >>atomicity point such that some actions will take effect during
> >>prepare, and others at commit.
> >>
> >>I still think it's the correct thing to do to delay the BlockJobs
> >>until the commit phase, so I will start auditing the code to see how
> >>hard it is to shift the atomicity point to commit instead. If it's
> >>possible to do that, I think from the POV of the managing
> >>application, having the atomicity point be
> >>
> >>Feel free to chime in with suggestions and counterpoints until then.
> >
> >I agree that jobs have to be started only at commit. There may be other
> >things that are currently happening in prepare that really should be
> >moved as well, but as long as moving one thing and not the other doesn't
> >break anything that was working, we can fix one thing at a time.
> >
> >Kevin
> >
> 
> Alright, let's give this a whirl.
> 
> We have 8 transaction actions:
> 
> drive_backup
> blockdev_backup
> block_dirty_bitmap_add
> block_dirty_bitmap_clear
> abort
> blockdev_snapshot
> blockdev_snapshot_sync
> blockdev_snapshot_internal_sync
> 
> Drive and Blockdev backup are already modified to behave
> point-in-time at time of .commit() by changing them to only begin
> running once the commit phase occurs.
> 
> Bitmap add and clear are trivial to rework; clear just moves the
> call to clear in commit, with possibly some action taken to prevent
> the bitmap from becoming used by some other process in the meantime.
> Add is easy to rework too, we can create it during prepare but reset
> it back to zero during commit if necessary.
> 
> Abort needs no changes.
> 
> blockdev_snapshot[_sync] actually appears to already be doing the
> right thing, by only installing the new top layer during commit,
> which makes this action inconsistent by current semantics, but
> requires no changes to move to the desired new semantics.

This doesn't sound too bad.

> That leaves only the internal snapshot to worry about, which does
> admittedly look like quite the yak to shave. It's a bit out of scope
> for me, but Kevin, do you think this is possible?
> 
> Looks like implementations are qcow2, rbd, and sheepdog. I imagine
> this would need to be split into prepare and commit semantics to
> accommodate this change... though we don't have any meaningful
> control over the rbd implementation.
> 
> Any thoughts? I could conceivably just change everything over to
> working primarily during .commit(), and just argue that the locks
> held for the transaction are sufficient to leave the internal
> snapshot alone "for now," ...

Leave them alone. We don't really support atomic internal snapshots. We
could make some heavy refactoring in order to split the BlockDriver
callbacks into prepare/commit/abort, but that's probably not worth the
effort and would make some code that already isn't tested much a lot
more complex.

If we ever decided to get serious about internal snapshots, we could
still do this. I kind of like internal snapshots, but I doubt it will
happen.

Kevin


* Re: [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions
  2016-10-05 18:49     ` John Snow
  2016-10-05 19:24       ` Eric Blake
  2016-10-06  7:44       ` Kevin Wolf
@ 2016-10-11  9:50       ` Markus Armbruster
  2 siblings, 0 replies; 51+ messages in thread
From: Markus Armbruster @ 2016-10-11  9:50 UTC (permalink / raw)
  To: John Snow
  Cc: Kevin Wolf, vsementsov, famz, qemu-block, jcody, qemu-devel, stefanha

John Snow <jsnow@redhat.com> writes:

> On 10/05/2016 09:43 AM, Kevin Wolf wrote:
[...]
>> Here we have an additional caller in block/replication.c and qemu-img,
>> so the parameters must stay. For qemu-img, nothing changes. For
>> replication, the block job events are added as a side effect.
>>
>> Not sure if we want to emit such events for an internal block job, but
>> if we do want the change, it should be explicit.
>>
>
> Hmm, do we want to make it so some jobs are invisible and others are
> not? Because as it stands right now, neither case is strictly true. We
> only emit cancelled/completed events if it was started via QMP,
> however we do emit events for error and ready regardless of who
> started the job.
>
> That didn't seem particularly consistent to me; either all events
> should be controlled by the job layer itself or none of them should
> be.
>
> I opted for "all."
>
> For "internal" jobs that did not previously emit any events, is it not
> true that these jobs still appear in the block job list and are
> effectively public regardless? I'd argue that these messages may be of
> value for management utilities who are still blocked by these jobs
> whether they are 'internal' or not.
>
> I'll push for keeping it mandatory and explicit. If it becomes a
> problem, we can always add a 'silent' job property that silences ALL
> qmp events, including all completion, error, and ready notices.
>
> I've CC'd Wen Congyang and Eric Blake to talk me down if they wish.

Having read the thread so far, I have two high-level remarks:

1. We should expose a job externally either completely (all queries show
it, all events get sent, any non-query command works normally as far as
it makes sense) or not at all.

2. Exposing internal jobs risks making them ABI.  Implementation details
need to be kept out of ABI.  So the question is whether the job is truly
internal detail, or a bona fide part of the external interface.


* Re: [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions
  2016-10-10 19:28               ` Eric Blake
@ 2016-10-11 13:32                 ` Kashyap Chamarthy
  0 siblings, 0 replies; 51+ messages in thread
From: Kashyap Chamarthy @ 2016-10-11 13:32 UTC (permalink / raw)
  To: Eric Blake
  Cc: John Snow, Kevin Wolf, vsementsov, famz, qemu-block, jcody,
	qemu-devel, stefanha

On Mon, Oct 10, 2016 at 02:28:52PM -0500, Eric Blake wrote:
> On 10/10/2016 01:36 PM, John Snow wrote:

[...]

> > I'll be honest that I don't know; this is related to Replication which I
> > know reasonably little about overall. It got added in the 2.8 timeframe,
> > so the behavior it currently exhibits is not a good or meaningful
> > reference for maintaining compatibility.
> > 
> > We can /change/ the behavior before releases with no love lost.
> 
> And if Replication is the only way to trigger internal use of jobs, then
> we aren't breaking libvirt (which doesn't know how to drive replication
> yet) by changing anything on that front.

Exactly.

> > Or, what if you just didn't get events for internal jobs? Are events for
> > un-managed jobs useful in any sense beyond understanding the stateful
> > availability of the drive to participate in a new job?
> 
> If libvirt isn't driving replication, then it's a moot point. And even
> though replication in libvirt is not supported yet, I suspect that down
> the road when support is added, the easiest thing for libvirt will be to
> state that replication and libvirt-controlled block jobs are mutually
> exclusive; there are probably enough lurking dragons that if your system
> MUST be highly reliable via replication, you probably don't want to be
> confusing things by changing the backing file depth manually with
> streams, pulls, or other manual actions at the same time as replication
> is managing the system, because how can you guarantee that both primary
> and secondary see the same manual actions at all the right times?

Very nice argument for making them mutually exclusive, from a libvirt
POV.

> At any rate, not seeing internal-only jobs is probably the most
> reasonable, even if it means an internal-only job can block the attempt
> to do a manual job.

FWIW, I agree, if only as a user / observer of these events during
debugging.

[...]

-- 
/kashyap


* Re: [Qemu-devel] [PATCH v2 07/11] blockjob: add .clean property
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 07/11] blockjob: add .clean property John Snow
@ 2016-10-12 11:11   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2016-10-12 11:11 UTC (permalink / raw)
  To: John Snow, qemu-block; +Cc: kwolf, famz, stefanha, jcody, eblake, qemu-devel

On 01.10.2016 01:00, John Snow wrote:
> Cleaning up after we have deferred to the main thread but before the
> transaction has converged can be dangerous and result in deadlocks
> if the job cleanup invokes any BH polling loops.
>
> A job may attempt to begin cleaning up, but may induce another job to
> enter its cleanup routine. The second job, part of our same transaction,
> will block waiting for the first job to finish, so neither job may now
> make progress.
>
> To rectify this, allow jobs to register a cleanup operation that will
> always run regardless of whether the job was in a transaction or not,
> and whether the transaction job group completed successfully or not.
>
> Move sensitive cleanup to this callback instead which is guaranteed to
> be run only after the transaction has converged, which removes sensitive
> timing constraints from said cleanup.
>
> Furthermore, in future patches these cleanup operations will be performed
> regardless of whether or not we actually started the job. Therefore,
> cleanup callbacks should essentially confine themselves to undoing create
> operations, e.g. setup actions taken in what is now backup_run.
>
> Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>   block/backup.c               | 11 ++++++++---
>   blockjob.c                   |  5 +++--
>   include/block/blockjob_int.h |  8 ++++++++
>   3 files changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/block/backup.c b/block/backup.c
> index d667482..42ff4c0 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -242,6 +242,13 @@ static void backup_abort(BlockJob *job)
>       }
>   }
>   
> +static void backup_clean(BlockJob *job)
> +{
> +    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
> +    assert(s->target);
> +    blk_unref(s->target);
> +}
> +
>   static void backup_attached_aio_context(BlockJob *job, AioContext *aio_context)
>   {
>       BackupBlockJob *s = container_of(job, BackupBlockJob, common);
> @@ -306,6 +313,7 @@ static const BlockJobDriver backup_job_driver = {
>       .set_speed              = backup_set_speed,
>       .commit                 = backup_commit,
>       .abort                  = backup_abort,
> +    .clean                  = backup_clean,
>       .attached_aio_context   = backup_attached_aio_context,
>   };
>   
> @@ -327,11 +335,8 @@ typedef struct {
>   
>   static void backup_complete(BlockJob *job, void *opaque)
>   {
> -    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
>       BackupCompleteData *data = opaque;
>   
> -    blk_unref(s->target);
> -
>       block_job_completed(job, data->ret);
>       g_free(data);
>   }
> diff --git a/blockjob.c b/blockjob.c
> index 09fb602..44cbf6c 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -217,7 +217,9 @@ static void block_job_completed_single(BlockJob *job)
>               job->driver->abort(job);
>           }
>       }
> -

I think this empty line was removed by mistake; it comes back in patch 09

> +    if (job->driver->clean) {
> +        job->driver->clean(job);
> +    }
>       if (job->cb) {
>           job->cb(job->opaque, job->ret);
>       }
> @@ -230,7 +232,6 @@ static void block_job_completed_single(BlockJob *job)
>           }
>           block_job_event_completed(job, msg);
>       }
> -

and here

>       if (job->txn) {
>           QLIST_REMOVE(job, txn_list);
>           block_job_txn_unref(job->txn);
> diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
> index c6da7e4..b7aeaef 100644
> --- a/include/block/blockjob_int.h
> +++ b/include/block/blockjob_int.h
> @@ -74,6 +74,14 @@ struct BlockJobDriver {
>       void (*abort)(BlockJob *job);
>   
>       /**
> +     * If the callback is not NULL, it will be invoked after a call to either
> +     * .commit() or .abort(). Regardless of which callback is invoked after
> +     * completion, .clean() will always be called, even if the job does not
> +     * belong to a transaction group.
> +     */
> +    void (*clean)(BlockJob *job);
> +
> +    /**
>        * If the callback is not NULL, it will be invoked when the job transitions
>        * into the paused state.  Paused jobs must not perform any asynchronous
>        * I/O or event loop activity.  This callback is used to quiesce jobs.


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/11] iotests: add transactional failure race test
  2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 11/11] iotests: add transactional failure race test John Snow
@ 2016-10-12 11:26   ` Vladimir Sementsov-Ogievskiy
  2016-10-12 16:09     ` John Snow
  0 siblings, 1 reply; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2016-10-12 11:26 UTC (permalink / raw)
  To: John Snow, qemu-block; +Cc: kwolf, famz, stefanha, jcody, eblake, qemu-devel

This is almost a duplicate of test_transaction_failure; I think it would 
be better to make a separate do_test_transaction_failure that takes a 
parameter, plus two wrappers
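
In iotests terms, the suggested split might look something like this (the helper name and the 'race' parameter are assumptions based on this discussion, not the actual test code):

```python
# Sketch of the suggested refactor: one parameterized helper plus two thin
# wrappers. The helper name and the 'race' flag are assumptions, not the
# actual qemu-iotests 124 code.
class TestTransactionFailure:
    def do_test_transaction_failure(self, race=False):
        # Common setup (drives, bitmaps, anchor backups) and the grouped
        # transaction would live here; 'race' selects whether the first
        # drive receives any writes before the incremental backup.
        return 'race' if race else 'normal'

    def test_transaction_failure(self):
        return self.do_test_transaction_failure(race=False)

    def test_transaction_failure_race(self):
        return self.do_test_transaction_failure(race=True)
```

Both test entry points keep their discoverable names, and the shared body lives in one place.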

On 01.10.2016 01:00, John Snow wrote:
> Add a regression test for the case found by Vladimir.
>
> Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>   tests/qemu-iotests/124     | 91 ++++++++++++++++++++++++++++++++++++++++++++++
>   tests/qemu-iotests/124.out |  4 +-
>   2 files changed, 93 insertions(+), 2 deletions(-)
>
> diff --git a/tests/qemu-iotests/124 b/tests/qemu-iotests/124
> index 2f0bc24..b8f7dad 100644
> --- a/tests/qemu-iotests/124
> +++ b/tests/qemu-iotests/124
> @@ -512,6 +512,97 @@ class TestIncrementalBackup(TestIncrementalBackupBase):
>           self.check_backups()
>   
>   
> +    def test_transaction_failure_race(self):
> +        '''Test: Verify that transactions with jobs that have no data to
> +        transfer do not cause race conditions in the cancellation of the entire
> +        transaction job group.
> +        '''
> +
> +        # Create a second drive, with pattern:
> +        drive1 = self.add_node('drive1')
> +        self.img_create(drive1['file'], drive1['fmt'])
> +        io_write_patterns(drive1['file'], (('0x14', 0, 512),
> +                                           ('0x5d', '1M', '32k'),
> +                                           ('0xcd', '32M', '124k')))
> +
> +        # Create a blkdebug interface to this img as 'drive1'
> +        result = self.vm.qmp('blockdev-add', options={
> +            'node-name': drive1['id'],
> +            'driver': drive1['fmt'],
> +            'file': {
> +                'driver': 'blkdebug',
> +                'image': {
> +                    'driver': 'file',
> +                    'filename': drive1['file']
> +                },
> +                'set-state': [{
> +                    'event': 'flush_to_disk',
> +                    'state': 1,
> +                    'new_state': 2
> +                }],
> +                'inject-error': [{
> +                    'event': 'read_aio',
> +                    'errno': 5,
> +                    'state': 2,
> +                    'immediately': False,
> +                    'once': True
> +                }],
> +            }
> +        })
> +        self.assert_qmp(result, 'return', {})
> +
> +        # Create bitmaps and full backups for both drives
> +        drive0 = self.drives[0]
> +        dr0bm0 = self.add_bitmap('bitmap0', drive0)
> +        dr1bm0 = self.add_bitmap('bitmap0', drive1)
> +        self.create_anchor_backup(drive0)
> +        self.create_anchor_backup(drive1)
> +        self.assert_no_active_block_jobs()
> +        self.assertFalse(self.vm.get_qmp_events(wait=False))
> +
> +        # Emulate some writes
> +        self.hmp_io_writes(drive1['id'], (('0xba', 0, 512),
> +                                          ('0xef', '16M', '256k'),
> +                                          ('0x46', '32736k', '64k')))
> +
> +        # Create incremental backup targets
> +        target0 = self.prepare_backup(dr0bm0)
> +        target1 = self.prepare_backup(dr1bm0)
> +
> +        # Ask for a new incremental backup per-each drive, expecting drive1's
> +        # backup to fail and attempt to cancel the empty drive0 job.
> +        transaction = [
> +            transaction_drive_backup(drive0['id'], target0, sync='incremental',
> +                                     format=drive0['fmt'], mode='existing',
> +                                     bitmap=dr0bm0.name),
> +            transaction_drive_backup(drive1['id'], target1, sync='incremental',
> +                                     format=drive1['fmt'], mode='existing',
> +                                     bitmap=dr1bm0.name)
> +        ]
> +        result = self.vm.qmp('transaction', actions=transaction,
> +                             properties={'completion-mode': 'grouped'} )
> +        self.assert_qmp(result, 'return', {})
> +
> +        # Observe that drive0's backup is cancelled and drive1 completes with
> +        # an error.
> +        self.wait_qmp_backup_cancelled(drive0['id'])
> +        self.assertFalse(self.wait_qmp_backup(drive1['id']))
> +        error = self.vm.event_wait('BLOCK_JOB_ERROR')
> +        self.assert_qmp(error, 'data', {'device': drive1['id'],
> +                                        'action': 'report',
> +                                        'operation': 'read'})
> +        self.assertFalse(self.vm.get_qmp_events(wait=False))
> +        self.assert_no_active_block_jobs()
> +
> +        # Delete drive0's successful target and eliminate our record of the
> +        # unsuccessful drive1 target.
> +        dr0bm0.del_target()
> +        dr1bm0.del_target()
> +
> +        self.vm.shutdown()
> +
> +
> +
>       def test_sync_dirty_bitmap_missing(self):
>           self.assert_no_active_block_jobs()
>           self.files.append(self.err_img)
> diff --git a/tests/qemu-iotests/124.out b/tests/qemu-iotests/124.out
> index 36376be..e56cae0 100644
> --- a/tests/qemu-iotests/124.out
> +++ b/tests/qemu-iotests/124.out
> @@ -1,5 +1,5 @@
> -..........
> +...........
>   ----------------------------------------------------------------------
> -Ran 10 tests
> +Ran 11 tests
>   
>   OK


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/11] iotests: add transactional failure race test
  2016-10-12 11:26   ` Vladimir Sementsov-Ogievskiy
@ 2016-10-12 16:09     ` John Snow
  0 siblings, 0 replies; 51+ messages in thread
From: John Snow @ 2016-10-12 16:09 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, famz, jcody, qemu-devel, stefanha



On 10/12/2016 07:26 AM, Vladimir Sementsov-Ogievskiy wrote:
> This is almost a duplicate of test_transaction_failure; I think it would
> be better to make a separate do_test_transaction_failure that takes a
> parameter, plus two wrappers
>

Yes, sorry -- I missed that for this iteration, but I'll act on it for 
the next iteration.

> On 01.10.2016 01:00, John Snow wrote:
>> Add a regression test for the case found by Vladimir.
>>
>> Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>   tests/qemu-iotests/124     | 91
>> ++++++++++++++++++++++++++++++++++++++++++++++
>>   tests/qemu-iotests/124.out |  4 +-
>>   2 files changed, 93 insertions(+), 2 deletions(-)
>>
>> diff --git a/tests/qemu-iotests/124 b/tests/qemu-iotests/124
>> index 2f0bc24..b8f7dad 100644
>> --- a/tests/qemu-iotests/124
>> +++ b/tests/qemu-iotests/124
>> @@ -512,6 +512,97 @@ class
>> TestIncrementalBackup(TestIncrementalBackupBase):
>>           self.check_backups()
>>     +    def test_transaction_failure_race(self):
>> +        '''Test: Verify that transactions with jobs that have no data to
>> +        transfer do not cause race conditions in the cancellation of
>> the entire
>> +        transaction job group.
>> +        '''
>> +
>> +        # Create a second drive, with pattern:
>> +        drive1 = self.add_node('drive1')
>> +        self.img_create(drive1['file'], drive1['fmt'])
>> +        io_write_patterns(drive1['file'], (('0x14', 0, 512),
>> +                                           ('0x5d', '1M', '32k'),
>> +                                           ('0xcd', '32M', '124k')))
>> +
>> +        # Create a blkdebug interface to this img as 'drive1'
>> +        result = self.vm.qmp('blockdev-add', options={
>> +            'node-name': drive1['id'],
>> +            'driver': drive1['fmt'],
>> +            'file': {
>> +                'driver': 'blkdebug',
>> +                'image': {
>> +                    'driver': 'file',
>> +                    'filename': drive1['file']
>> +                },
>> +                'set-state': [{
>> +                    'event': 'flush_to_disk',
>> +                    'state': 1,
>> +                    'new_state': 2
>> +                }],
>> +                'inject-error': [{
>> +                    'event': 'read_aio',
>> +                    'errno': 5,
>> +                    'state': 2,
>> +                    'immediately': False,
>> +                    'once': True
>> +                }],
>> +            }
>> +        })
>> +        self.assert_qmp(result, 'return', {})
>> +
>> +        # Create bitmaps and full backups for both drives
>> +        drive0 = self.drives[0]
>> +        dr0bm0 = self.add_bitmap('bitmap0', drive0)
>> +        dr1bm0 = self.add_bitmap('bitmap0', drive1)
>> +        self.create_anchor_backup(drive0)
>> +        self.create_anchor_backup(drive1)
>> +        self.assert_no_active_block_jobs()
>> +        self.assertFalse(self.vm.get_qmp_events(wait=False))
>> +
>> +        # Emulate some writes
>> +        self.hmp_io_writes(drive1['id'], (('0xba', 0, 512),
>> +                                          ('0xef', '16M', '256k'),
>> +                                          ('0x46', '32736k', '64k')))
>> +
>> +        # Create incremental backup targets
>> +        target0 = self.prepare_backup(dr0bm0)
>> +        target1 = self.prepare_backup(dr1bm0)
>> +
>> +        # Ask for a new incremental backup per-each drive, expecting
>> drive1's
>> +        # backup to fail and attempt to cancel the empty drive0 job.
>> +        transaction = [
>> +            transaction_drive_backup(drive0['id'], target0,
>> sync='incremental',
>> +                                     format=drive0['fmt'],
>> mode='existing',
>> +                                     bitmap=dr0bm0.name),
>> +            transaction_drive_backup(drive1['id'], target1,
>> sync='incremental',
>> +                                     format=drive1['fmt'],
>> mode='existing',
>> +                                     bitmap=dr1bm0.name)
>> +        ]
>> +        result = self.vm.qmp('transaction', actions=transaction,
>> +                             properties={'completion-mode': 'grouped'} )
>> +        self.assert_qmp(result, 'return', {})
>> +
>> +        # Observe that drive0's backup is cancelled and drive1
>> completes with
>> +        # an error.
>> +        self.wait_qmp_backup_cancelled(drive0['id'])
>> +        self.assertFalse(self.wait_qmp_backup(drive1['id']))
>> +        error = self.vm.event_wait('BLOCK_JOB_ERROR')
>> +        self.assert_qmp(error, 'data', {'device': drive1['id'],
>> +                                        'action': 'report',
>> +                                        'operation': 'read'})
>> +        self.assertFalse(self.vm.get_qmp_events(wait=False))
>> +        self.assert_no_active_block_jobs()
>> +
>> +        # Delete drive0's successful target and eliminate our record
>> of the
>> +        # unsuccessful drive1 target.
>> +        dr0bm0.del_target()
>> +        dr1bm0.del_target()
>> +
>> +        self.vm.shutdown()
>> +
>> +
>> +
>>       def test_sync_dirty_bitmap_missing(self):
>>           self.assert_no_active_block_jobs()
>>           self.files.append(self.err_img)
>> diff --git a/tests/qemu-iotests/124.out b/tests/qemu-iotests/124.out
>> index 36376be..e56cae0 100644
>> --- a/tests/qemu-iotests/124.out
>> +++ b/tests/qemu-iotests/124.out
>> @@ -1,5 +1,5 @@
>> -..........
>> +...........
>>   ----------------------------------------------------------------------
>> -Ran 10 tests
>> +Ran 11 tests
>>     OK
>
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v2 04/11] blockjobs: Always use block_job_get_aio_context
  2016-10-07  7:49       ` Paolo Bonzini
@ 2016-10-13  0:49         ` John Snow
  2016-10-13  9:03           ` Paolo Bonzini
  0 siblings, 1 reply; 51+ messages in thread
From: John Snow @ 2016-10-13  0:49 UTC (permalink / raw)
  To: Paolo Bonzini, Kevin Wolf
  Cc: stefanha, vsementsov, famz, qemu-devel, qemu-block

As context for everyone else on why I'm going down the rabbit hole of 
trying to remove external references to AioContext at all, see 
https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg00795.html

On 10/07/2016 03:49 AM, Paolo Bonzini wrote:
>
>
> On 06/10/2016 22:22, John Snow wrote:
>> Calls .complete(), for which the only implementation is
>> mirror_complete. Uh, this actually seems messy. Looks like there's
>> nothing to prevent us from calling this after we've already told it to
>> complete once.
>
> Yeah, it should have an
>
>     if (s->should_complete) {
>         return;
>     }
>
> at the beginning.  I have other mirror.c patches in my queue so I can
> take care of that.
>

Or should the guard go up the stack in block_job_complete, so it's not up 
to job implementations? What if the next implementer "forgets"?

>> block_job_cancel and block_job_complete are different.
>>
>> block_job_cancel is called in many places, but we can just add a similar
>> block_job_user_cancel if we wanted a version which takes care to acquire
>> context and one that does not. (Or we could just acquire the context
>> regardless, but Paolo warned me ominously that recursive locks are EVIL.
>> He sounded serious.)
>
> Not that many places:
>
> - block_job_finish_sync calls it, and you can just release/reacquire
> around the call to "finish(job, &local_err)".
>

This makes me a little nervous because we went through the trouble of 
creating this callback, but we're going to assume we know that it's a 
public interface that will take the lock for itself (or otherwise does 
not require a lock).

In practice it works, but it seems needlessly mystifying in terms of 
proving correctness.

> - there are two callers in blockdev.c, and you can just remove the
> acquire/release from blockdev.c if you push it in block_job_cancel.
>

Makes sense; I don't like the association of (bs :: job) here anyway. 
Again we're grabbing context for a job where that job may not even be 
running.

> As to block_job_cancel_sync:
>

Which I didn't audit, because no callers use job->blk to get the 
AioContext before calling this; they use bs if bs->job is present.

> - replication_stop is not acquiring s->secondary_disk->bs's AioContext.
>

Seems like a bug on their part. Would be fixed by having cancel acquire 
context for itself.

> - there is no need to hold the AioContext between ->prepare and ->clean.
>  My suggestion is to ref the blockjob after storing it in state->job
> (you probably should do that anyway) and unref'ing it in ->clean.  Then
> you can call again let block_job_cancel_sync(bs->job) take the
> AioContext, which it will do in block_job_finish_sync.

Yeah, I should be reffing it anyway.

The rest of this... What I think you mean is acquiring and releasing the 
context as needed for EACH of prepare, commit, abort, and clean, right?

And then in this case, it simply wouldn't be necessary for abort, as the 
sync cancel would do it for us.

>
>> block_job_complete has no direct callers outside of QMP, but it is also
>> used as a callback by block_job_complete_sync, used in qemu-img for
>> run_block_job. I can probably rewrite qemu_img to avoid this usage.
>
> No need to: qemu-img is not acquiring the AioContext, so it's okay to
> let block_job_complete do that (like block_job_cancel,
> block_job_complete will be called by block_job_finish_sync without the
> AioContext acquired).
>

Eh? Oh, you're right, it just gets it for the sake of aio_poll.

> Paolo
>


Alright.


Say I *do* push the acquisitions down into blockjob.c. What benefit does 
that provide? Won't I still need the block_job_get_aio_context() 
function (at least internally) to know which context to acquire? This 
would preclude you from deleting it.

Plus... we remove some fairly simple locking mechanisms and then inflate 
them tenfold. I'm not convinced this is an improvement.

As context and a refresher (for me when I re-read this email in 12 
hours), there are two places externally that are using an AioContext 
lock as acquired from *within* a BlockJob, excluding those that acquire 
a context separately from a Job and use that to reason that accesses to 
the job are safe (for example, blockdev_mark_auto_del).

(1) QMP interface for job management
(2) bdrv_drain_all, in block/io.c


(1) AFAICT, the QMP interface is concerned with assuring itself it has 
unique access to the BlockJob structure itself; it doesn't really care 
about the AioContext itself -- just race-free access to the Job.

This is not necessarily buggy today because, even though we grab the 
BlockBackend's context unconditionally, we already know the main/monitor 
thread is not accessing the blockjob. It's still silly, though.

(2) bdrv_drain_all appears to be worried about the same thing; we just 
need to safely deliver pause/resume messages.

I'm less sure about where this can run from, and suspect that if the job 
has deferred to the main loop, this could be buggy. If bdrv_drain_all is 
called from context A and the job is running on context M having 
deferred from B, we may lock against context B (from A) and have racy 
access between A/M. (Maybe?)



It looks like none of these usages actually cares which context it gets, 
just that it gets safe access to the job itself. If you want to decouple 
jobs from their BlockBackend's AioContext, perhaps we just need to 
replace these with a simple mutex...?
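
A hedged sketch of the "simple mutex" idea in a Python toy model -- management operations lock the job object itself rather than an AioContext (illustrative only; the field names are assumptions, not QEMU code):

```python
import threading

class BlockJob:
    """Sketch of the 'simple mutex' idea: QMP-style management operations
    lock the job object itself instead of acquiring a BlockBackend's
    AioContext. Illustrative only; field names are assumptions."""
    def __init__(self):
        self._mutex = threading.Lock()  # protects job state, not I/O
        self.pause_count = 0

    def user_pause(self):
        with self._mutex:               # race-free access to the Job...
            self.pause_count += 1

    def user_resume(self):
        with self._mutex:               # ...regardless of which AioContext
            if self.pause_count > 0:    # the job's I/O runs in
                self.pause_count -= 1
```

The point is that the lock's scope is exactly the job's own fields, so it can stay a small leaf lock.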


As it stands, though, pushing AioContext acquisitions down into 
blockjob.c isn't actually going to fix anything, is it? Why not just 
leave it be for right now?

Would you be terribly offended if I left this patch as-is for now and we 
can work on removing the AioContext locks afterwards, or are you adamant 
that I cooperate in getting block_job_get_aio_context removed before I 
add more usages?


Sorry I'm being so obtuse about this all.

--js

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v2 04/11] blockjobs: Always use block_job_get_aio_context
  2016-10-13  0:49         ` John Snow
@ 2016-10-13  9:03           ` Paolo Bonzini
  0 siblings, 0 replies; 51+ messages in thread
From: Paolo Bonzini @ 2016-10-13  9:03 UTC (permalink / raw)
  To: John Snow, Kevin Wolf; +Cc: stefanha, vsementsov, famz, qemu-devel, qemu-block



On 13/10/2016 02:49, John Snow wrote:
> As context to everyone else as to why I'm going down the rabbit hole of
> trying to remove external references to AioContext at all, see
> https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg00795.html
> 
> On 10/07/2016 03:49 AM, Paolo Bonzini wrote:
>>
>>
>> On 06/10/2016 22:22, John Snow wrote:
>>> Calls .complete(), for which the only implementation is
>>> mirror_complete. Uh, this actually seems messy. Looks like there's
>>> nothing to prevent us from calling this after we've already told it to
>>> complete once.
>>
>> Yeah, it should have an
>>
>>     if (s->should_complete) {
>>         return;
>>     }
>>
>> at the beginning.  I have other mirror.c patches in my queue so I can
>> take care of that.
>>
> 
> Or something up the stack at block_job_complete so it's not up to job
> implementations? What if the next implementer "forgets."

Yeah, that would require moving s->should_complete up to BlockJob.
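
As a rough illustration of that idea, here is a toy Python model of the guard living in the generic layer rather than in each driver (names mirror the discussion, but this is a sketch, not the QEMU C code):

```python
class BlockJob:
    """Toy model: the completion guard lives in the generic layer, so no
    individual job driver can forget it. should_complete is 'moved up'
    from the mirror job, as suggested in the discussion."""
    def __init__(self, driver):
        self.driver = driver
        self.should_complete = False

    def complete(self):
        if self.should_complete:       # generic guard: second call is a no-op
            return False
        self.should_complete = True
        self.driver.complete(self)     # driver callback runs at most once
        return True

class MirrorDriver:
    """Stand-in for a job implementation; it needs no guard of its own."""
    def __init__(self):
        self.calls = 0
    def complete(self, job):
        self.calls += 1
```

With the check in the generic entry point, a second completion request simply returns instead of reaching the driver.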

>>> block_job_cancel and block_job_complete are different.
>>>
>>> block_job_cancel is called in many places, but we can just add a similar
>>> block_job_user_cancel if we wanted a version which takes care to acquire
>>> context and one that does not. (Or we could just acquire the context
>>> regardless, but Paolo warned me ominously that recursive locks are EVIL.
>>> He sounded serious.)
>>
>> Not that many places:
>>
>> - block_job_finish_sync calls it, and you can just release/reacquire
>> around the call to "finish(job, &local_err)".
> 
> This makes me a little nervous because we went through the trouble of
> creating this callback, but we're going to assume we know that it's a
> public interface that will take the lock for itself (or otherwise does
> not require a lock.)
> 
> In practice it works, but it seems needlessly mystifying in terms of
> proving correctness.

It's _much_ easier to assume that all callbacks take the lock
themselves.  It's counterintuitive (just like the idea that recursive
locks are bad :)), but the point is that if you second guess the
callbacks and invoke them you might get the locking order wrong.  This
ends up with ugly AB-BA deadlocks.
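
A minimal Python model of the convention being described -- the caller drops its lock around the callback, and the callback takes only its own leaf lock, so no thread ever holds two locks at once (the locks and names are illustrative stand-ins, not QEMU code):

```python
import threading

lock_a = threading.Lock()   # stand-in for one AioContext
lock_b = threading.Lock()   # stand-in for another

done = []

def callback():
    # A well-behaved callback acquires only the one (leaf) lock it needs.
    with lock_b:
        done.append(True)

def finish_sync(job_lock, finish_cb):
    # Pattern from the discussion: release the lock around the call to
    # finish(). Because no thread ever holds lock_a and lock_b at the
    # same time, no AB-BA ordering between them can arise.
    with job_lock:
        pass                # ... work that genuinely needs job_lock ...
    finish_cb()             # invoked with no lock held
    with job_lock:
        pass                # ... reacquire for any remaining bookkeeping ...
```

The anti-pattern would be invoking finish_cb() inside the `with job_lock:` block; with another thread doing the reverse order, that is the classic AB-BA deadlock.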

>> - replication_stop is not acquiring s->secondary_disk->bs's AioContext.
> 
> Seems like a bug on their part. Would be fixed by having cancel acquire
> context for itself.

Yep.

> Yeah, I should be reffing it anyway.
> 
> The rest of this... What I think you mean is acquiring and releasing the
> context as needed for EACH of prepare, commit, abort, and clean as
> necessary, right?
> 
> And then in this case, it simply wouldn't be necessary for abort, as the
> sync cancel would do it for us.

Right.

> Alright.
> 
> Say I *do* push the acquisitions down into blockjob.c. What benefit does
> that provide? Won't I still need the block_job_get_aio_context()
> function (At least internally) to know which context to acquire? This
> would preclude you from deleting it.
> 
> Plus... we remove some fairly simple locking mechanisms and then inflate
> it tenfold. I'm not convinced this is an improvement.

The improvement is that you can now replace the huge lock with a smaller
one (you don't have to do it now, but you could).  Furthermore, the
small lock is a) non-recursive b) a leaf lock.  This means that you've
removed a lot of opportunities for deadlock, and generally made things
easier to reason about.

Furthermore, with large locks you never know what they actually protect;
cue the confusion between aio_context_acquire/release and
bdrv_drained_begin/end.  And it's easier to forget them if you force the
caller to do it; and we have an example with replication.

Again, it can be counterintuitive that it's better, but it is. :)

> (1) QMP interface for job management
> (2) bdrv_drain_all, in block/io.c
> 
> (1) AFAICT, the QMP interface is concerned with assuring itself it has
> unique access to the BlockJob structure itself, and it doesn't really
> authentically care about the AIOContext itself -- just race-free access
> to the Job.

Yes.  The protection you need here is mostly against concurrent
completion of the job (and concurrent manipulation of fields such as
job->busy, job->paused, etc.).

> This is not necessarily buggy today because, even though we grab the
> BlockBackend's context unconditionally, we already know the main/monitor
> thread is not accessing the blockjob. It's still silly, though.
> 
> (2) bdrv_drain_all appears to be worried about the same thing; we just
> need to safely deliver pause/resume messages.
> 
> I'm less sure about where this can run from, and suspect that if the job
> has deferred to main that this could be buggy. If bdrv_drain_all is
> called from context A and the job is running on context M having
> deferred from B, we may lock against context B (from A) and have racey
> access from between A/M. (Maybe?)

bdrv_drain_all is BQL-only (because it acquires more than one AioContext).

> Would you be terribly offended if I left this patch as-is for now and we
> can work on removing the AioContext locks afterwards, or are you adamant
> that I cooperate in getting block_job_get_aio_context removed before I
> add more usages?

Removing block_job_get_aio_context is necessary to fix dataplane bugs,
so I'd really prefer to have that in 2.8.  But why do you actually need
this patch at all?  You can do neither thing---not push the lock down
and not add more usages---which is always the best. :)

Paolo

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/11] blockjob: refactor backup_start as backup_job_create
  2016-10-11  9:35         ` Kevin Wolf
@ 2016-10-17  8:59           ` Fam Zheng
  0 siblings, 0 replies; 51+ messages in thread
From: Fam Zheng @ 2016-10-17  8:59 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: John Snow, qemu-block, vsementsov, jcody, qemu-devel, stefanha,
	Paolo Bonzini

On Tue, 10/11 11:35, Kevin Wolf wrote:
> > >By the way, why did we allow to add a 'bitmap' option for DriveBackup
> > >without adding it to BlockdevBackup at the same time?
> > 
> > I don't remember. I'm not sure anyone ever audited it to convince
> > themselves it was a useful or safe thing to do. I believe at the
> > time I was pushing for bitmaps in DriveBackup, Fam was still
> > authoring the BlockdevBackup interface.
> 
> Hm, maybe that's why. I checked the commit dates of both (and there
> BlockdevBackup was earlier), but I didn't check the development history.
> 
> Should we add it now or is it a bad idea?

Yes, we should add it. I'll send a separate patch. Thanks for catching that.

Fam

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/11] blockjob: add block_job_start
  2016-10-06 22:44     ` John Snow
@ 2016-10-17 18:00       ` John Snow
  0 siblings, 0 replies; 51+ messages in thread
From: John Snow @ 2016-10-17 18:00 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-block, vsementsov, famz, stefanha, jcody, eblake, qemu-devel



On 10/06/2016 06:44 PM, John Snow wrote:
>
>
> On 10/05/2016 11:17 AM, Kevin Wolf wrote:
>> Am 01.10.2016 um 00:00 hat John Snow geschrieben:
>>> Instead of automatically starting jobs at creation time via backup_start
>>> et al, we'd like to return a job object pointer that can be started
>>> manually at later point in time.
>>>
>>> For now, add the block_job_start mechanism and start the jobs
>>> automatically as we have been doing, with conversions job-by-job coming
>>> in later patches.
>>>
>>> Of note: cancellation of unstarted jobs will perform all the normal
>>> cleanup as if the job had started, particularly abort and clean. The
>>> only difference is that we will not emit any events, because the job
>>> never actually started.
>>>
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>
>> Should we make sure that jobs are only added to the block_jobs list once
>> they are started? It doesn't sound like a good idea to make a job
>> without a coroutine user-accessible.
>>
>> Kevin
>>
>
> That would certainly help limit exposure to a potentially dangerous
> object, but some operations on these un-started jobs are still perfectly
> reasonable, like cancellation. Even things like "set speed" are
> perfectly reasonable on an unstarted job.
>
> I'd rather just verify that having an accessible job cannot be operated
> on unsafely via the public interface, even though that's more work.
>
> So here's the list:
>
> -block_job_next: Harmless.
>
> -block_job_get: Harmless.
>
> -block_job_set_speed: Depends on the set_speed implementation, but
> backup_set_speed makes no assumptions and that's the only job I am
> attempting to convert in this series.
>
> -block_job_cancel: Edited to specifically support pre-started jobs in
> this patch.
>
> -block_job_complete: Edited to check for a coroutine specifically, but
> even without, a job will not be able to become ready without running first.
>
> -block_job_query: Harmless* (*I could probably expose a 'started'
> variable so that libvirt didn't get confused as to why there are jobs
> that exist but are not busy nor paused.)
>
> -block_job_pause: Harmless**
>
> -block_job_user_pause: Harmless**
>
> -block_job_user_paused: Harmless, if meaningless.
>
> -block_job_resume: **We will attempt to call block_job_enter, but
> conditional on job->co, we do nothing, so it's harmless if not
> misleading that you can pause/resume to your heart's content.
>
> -block_job_user_resume: ^ http://i.imgur.com/2zYxrIe.png ^
>
> -block_job_cancel_sync: Safe, due to edits to block_job_cancel. Polling
> loop WILL complete as normal, because all jobs will finish through
> block_job_completed one way or another.
>
> -block_job_cancel_sync_all: As safe as the above.
>
> -block_job_complete_sync: Safe, complete will return error for unstarted
> jobs.
>
> -block_job_iostatus_reset: Harmless, I think -- backup does not
> implement this method. (Huh, *no* jobs implement iostatus_reset at the
> moment...)
>
> -block_job_txn_new: Doesn't operate on jobs.
>
> -block_job_txn_unref: Also doesn't operate on jobs.
>
> -block_job_get_aio_context: Harmless.
>
> -block_job_txn_add_job: Safe and intended! There is trickery here,
> though, as once a job is introduced into transactions it opens it up to
> the private interface. This adds the following functions to considerations:
>
> -block_job_completed: Safe, does not assume a coroutine anywhere.
>
> -block_job_completed_single: Specifically updated to be context-aware of
> if we are pre-started or not. This is the "real" completion mechanism
> for BlockJobs that gets run for transactional OR individual jobs.
>
> -block_job_completed_txn_abort: ***BUG***! The problem with the patch as
> I've sent it is that cancel calls completed (under the premise that
> nobody else would ever get the chance to), but we call both cancel AND
> completed from this function, which will cause us to call completed
> twice on pre-started jobs. I will fix this for the next version.
>

Actually, I was mistaken. The way I wrote it is fine, and it will work 
like this:

qmp_transaction encounters an error during prepare and calls the abort 
method for each action.

- The abort method calls block_job_cancel on the not-yet-started job,
- Which in turn calls block_job_completed with ret = -ECANCELED

which then either calls:
- block_job_completed_single with a failure mode (which calls .abort and 
.clean) if we are not in a transaction, or
- block_job_completed_txn_abort

txn_abort is going to:

1) Set txn->aborting to true, and
2) Cancel every other job in the transaction using block_job_cancel_sync.

These jobs will also reach txn_abort, but since txn->aborting is already 
true, the call becomes a NOP.

Then, finally, we will call block_job_completed_single on every job in 
the transaction, which performs the proper cleanup by calling .abort 
and .clean for each job.

No duplication, I was mistaken...!
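As a sanity check, the flow above can be modeled with a small 
self-contained sketch; Txn, Job, and the function names are simplified 
stand-ins for the QEMU internals, not the real API. The point is that 
txn->aborting turns re-entrant aborts into NOPs, so each job's 
.abort/.clean runs exactly once:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the transactional abort flow described above. */

#define MAX_JOBS 4

typedef struct Txn Txn;

typedef struct Job {
    Txn *txn;
    bool completed;
    int abort_calls;  /* how many times .abort/.clean ran for this job */
} Job;

struct Txn {
    bool aborting;
    Job *jobs[MAX_JOBS];
    int n_jobs;
};

static void completed_txn_abort(Job *job);

/* The "real" completion: runs .abort and .clean exactly once. */
static void completed_single(Job *job)
{
    job->abort_calls++;
    job->completed = true;
}

/* Cancelling an unstarted job completes it with -ECANCELED, which
 * routes straight back into completed_txn_abort. */
static void job_cancel_sync(Job *job)
{
    if (!job->completed) {
        completed_txn_abort(job);
    }
}

static void completed_txn_abort(Job *job)
{
    Txn *txn = job->txn;
    int i;

    if (txn->aborting) {
        return;  /* another job already drives the abort: NOP */
    }
    txn->aborting = true;

    /* Cancel every other job; their re-entrant aborts hit the
     * txn->aborting guard above and do nothing. */
    for (i = 0; i < txn->n_jobs; i++) {
        if (txn->jobs[i] != job) {
            job_cancel_sync(txn->jobs[i]);
        }
    }
    /* Then complete every job in the transaction exactly once. */
    for (i = 0; i < txn->n_jobs; i++) {
        if (!txn->jobs[i]->completed) {
            completed_single(txn->jobs[i]);
        }
    }
}
```

Driving this with three jobs and aborting one of them leaves 
abort_calls == 1 on every job, i.e. no duplicated cleanup.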

> -block_job_completed_txn_success: Should never be called; if it IS, the
> presence of an unstarted job in the transaction will cause an early
> return. And even if I am utterly wrong and every job in the transaction
> completes successfully (somehow?), calling block_job_completed_single is
> perfectly safe by design.
>
>
> Everything else is either internal to block job instances or the
> blockjob core.
>
>
> There may be:
> (A) bugs in my code/thinking, or
> (B) improvements we can make to the readability,
>
> but I believe that this is (apart from the mentioned bug) not dangerous.
>
> --js



Thread overview: 51+ messages
2016-09-30 22:00 [Qemu-devel] [PATCH v2 00/11] blockjobs: Fix transactional race condition John Snow
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 01/11] blockjob: fix dead pointer in txn list John Snow
2016-10-05 13:43   ` Kevin Wolf
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions John Snow
2016-10-05 13:43   ` Kevin Wolf
2016-10-05 18:49     ` John Snow
2016-10-05 19:24       ` Eric Blake
2016-10-05 21:00         ` John Snow
2016-10-10 16:45           ` Kashyap Chamarthy
2016-10-10 18:36             ` John Snow
2016-10-10 19:28               ` Eric Blake
2016-10-11 13:32                 ` Kashyap Chamarthy
2016-10-06  7:44       ` Kevin Wolf
2016-10-06 16:57         ` John Snow
2016-10-06 18:16           ` Eric Blake
2016-10-06 18:19             ` John Snow
2016-10-11  9:50       ` Markus Armbruster
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 03/11] Blockjobs: Internalize user_pause logic John Snow
2016-10-04  0:57   ` Jeff Cody
2016-10-04  2:46     ` John Snow
2016-10-04 18:35     ` John Snow
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 04/11] blockjobs: Always use block_job_get_aio_context John Snow
2016-10-05 14:02   ` Kevin Wolf
2016-10-06 20:22     ` John Snow
2016-10-07  7:49       ` Paolo Bonzini
2016-10-13  0:49         ` John Snow
2016-10-13  9:03           ` Paolo Bonzini
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 05/11] blockjobs: split interface into public/private John Snow
2016-10-05 14:17   ` Kevin Wolf
2016-10-05 16:20     ` John Snow
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 06/11] blockjobs: fix documentation John Snow
2016-10-05 15:03   ` Kevin Wolf
2016-10-05 16:22     ` John Snow
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 07/11] blockjob: add .clean property John Snow
2016-10-12 11:11   ` Vladimir Sementsov-Ogievskiy
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 08/11] blockjob: add .start field John Snow
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 09/11] blockjob: add block_job_start John Snow
2016-10-05 15:17   ` Kevin Wolf
2016-10-06 22:44     ` John Snow
2016-10-17 18:00       ` John Snow
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 10/11] blockjob: refactor backup_start as backup_job_create John Snow
2016-10-07 18:39   ` John Snow
2016-10-10  8:57     ` Kevin Wolf
2016-10-10 22:51       ` John Snow
2016-10-11  8:56         ` Paolo Bonzini
2016-10-11  9:35         ` Kevin Wolf
2016-10-17  8:59           ` Fam Zheng
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 11/11] iotests: add transactional failure race test John Snow
2016-10-12 11:26   ` Vladimir Sementsov-Ogievskiy
2016-10-12 16:09     ` John Snow
2016-09-30 22:22 ` [Qemu-devel] [PATCH v2 00/11] blockjobs: Fix transactional race condition no-reply
