* [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit job reaping
@ 2017-10-04  1:52 John Snow
  2017-10-04  1:52 ` [Qemu-devel] [PATCH v2 1/4] blockjob: add persistent property John Snow
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: John Snow @ 2017-10-04  1:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, pkrempa, jtc, qemu-devel, John Snow

For jobs that complete when a monitor isn't looking, there's no way to
tell what the job's final return code was. We need to allow jobs to
remain in the list until queried for reliable management.
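
As a rough sketch of the intended flow, in the style of the iotests
added by patch 4 (device and image names are illustrative):

    # Start a backup that must be dismissed manually once it finishes.
    vm.qmp('drive-backup', device='drive0', sync='full',
           target=dest_img, format=iotests.imgfmt, persistent=True)
    # ...BLOCK_JOB_COMPLETED may fire while no monitor is connected...
    res = vm.qmp('query-block-jobs')
    # The job is still listed (with 'finished': True), so its final
    # status can be inspected after the fact, then dismissed:
    vm.qmp('block-job-reap', device='drive0')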

V2:
 - Added tests!
 - Changed property name (Jeff, Paolo)

RFC:
The next version will add tests for transactions.
Kevin, can you please take a look at bdrv_is_root_node and how it is
used with respect to do_drive_backup? I suspect that in this case
"is root" should actually be "true", but a node in use by a job has
two roles: child_root and child_job, so it starts returning false here.

That's fine because we prevent a collision that way, but it makes the
error messages pretty bad and misleading. Do you have a quick suggestion?
(Should I just amend the loop to allow non-root nodes as long as they
happen to be jobs so that the job creation code can check permissions?)

John Snow (4):
  blockjob: add persistent property
  qmp: add block-job-reap command
  blockjob: expose persistent property
  iotests: test manual job reaping

 block/backup.c               |  20 ++--
 block/commit.c               |   2 +-
 block/mirror.c               |   2 +-
 block/replication.c          |   5 +-
 block/stream.c               |   2 +-
 block/trace-events           |   1 +
 blockdev.c                   |  28 +++++-
 blockjob.c                   |  46 ++++++++-
 include/block/block_int.h    |   8 +-
 include/block/blockjob.h     |  21 ++++
 include/block/blockjob_int.h |   2 +-
 qapi/block-core.json         |  49 ++++++++--
 tests/qemu-iotests/056       | 227 +++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/056.out   |   4 +-
 tests/test-blockjob-txn.c    |   2 +-
 tests/test-blockjob.c        |   2 +-
 16 files changed, 384 insertions(+), 37 deletions(-)

-- 
2.9.5


* [Qemu-devel] [PATCH v2 1/4] blockjob: add persistent property
  2017-10-04  1:52 [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit job reaping John Snow
@ 2017-10-04  1:52 ` John Snow
  2017-10-04  1:52 ` [Qemu-devel] [PATCH v2 2/4] qmp: add block-job-reap command John Snow
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: John Snow @ 2017-10-04  1:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, pkrempa, jtc, qemu-devel, John Snow

Add a persistent (manual-reap) property to block jobs that forces
them to linger in the block job list (visible to QMP queries) until
the user explicitly dismisses them via QMP.

The reap command itself is implemented in the next commit, and the
feature is exposed to drive-backup and blockdev-backup in the subsequent
commit.
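
Since a persistent job now survives its own completion, BlockJobInfo
grows a 'finished' field so that a client can tell a lingering job
apart from a running one. A sketch of what a client might observe
(values illustrative):

    res = vm.qmp('query-block-jobs')
    # While running: res['return'][0]['finished'] is False.
    # After the job's commit/abort code has run, a persistent job stays
    # in the list with res['return'][0]['finished'] set to True.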

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/backup.c               | 20 +++++++++----------
 block/commit.c               |  2 +-
 block/mirror.c               |  2 +-
 block/replication.c          |  5 +++--
 block/stream.c               |  2 +-
 blockdev.c                   |  8 ++++----
 blockjob.c                   | 46 ++++++++++++++++++++++++++++++++++++++++++--
 include/block/block_int.h    |  8 +++++---
 include/block/blockjob.h     | 21 ++++++++++++++++++++
 include/block/blockjob_int.h |  2 +-
 qapi/block-core.json         |  7 ++++---
 tests/test-blockjob-txn.c    |  2 +-
 tests/test-blockjob.c        |  2 +-
 13 files changed, 97 insertions(+), 30 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 517c300..93ac194 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -532,15 +532,15 @@ static const BlockJobDriver backup_job_driver = {
     .drain                  = backup_drain,
 };
 
-BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
-                  BlockDriverState *target, int64_t speed,
-                  MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
-                  bool compress,
-                  BlockdevOnError on_source_error,
-                  BlockdevOnError on_target_error,
-                  int creation_flags,
-                  BlockCompletionFunc *cb, void *opaque,
-                  BlockJobTxn *txn, Error **errp)
+BlockJob *backup_job_create(const char *job_id, bool persistent,
+                            BlockDriverState *bs, BlockDriverState *target,
+                            int64_t speed, MirrorSyncMode sync_mode,
+                            BdrvDirtyBitmap *sync_bitmap, bool compress,
+                            BlockdevOnError on_source_error,
+                            BlockdevOnError on_target_error,
+                            int creation_flags,
+                            BlockCompletionFunc *cb, void *opaque,
+                            BlockJobTxn *txn, Error **errp)
 {
     int64_t len;
     BlockDriverInfo bdi;
@@ -608,7 +608,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     }
 
     /* job->common.len is fixed, so we can't allow resize */
-    job = block_job_create(job_id, &backup_job_driver, bs,
+    job = block_job_create(job_id, &backup_job_driver, persistent, bs,
                            BLK_PERM_CONSISTENT_READ,
                            BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
                            BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD,
diff --git a/block/commit.c b/block/commit.c
index 8f0e835..308a5fd 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -304,7 +304,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
         return;
     }
 
-    s = block_job_create(job_id, &commit_job_driver, bs, 0, BLK_PERM_ALL,
+    s = block_job_create(job_id, &commit_job_driver, false, bs, 0, BLK_PERM_ALL,
                          speed, BLOCK_JOB_DEFAULT, NULL, NULL, errp);
     if (!s) {
         return;
diff --git a/block/mirror.c b/block/mirror.c
index 6f5cb9f..013e73a 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1180,7 +1180,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
     }
 
     /* Make sure that the source is not resized while the job is running */
-    s = block_job_create(job_id, driver, mirror_top_bs,
+    s = block_job_create(job_id, driver, false, mirror_top_bs,
                          BLK_PERM_CONSISTENT_READ,
                          BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
                          BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD, speed,
diff --git a/block/replication.c b/block/replication.c
index 3a4e682..6c59f00 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -539,8 +539,9 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
         bdrv_op_block_all(top_bs, s->blocker);
         bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
 
-        job = backup_job_create(NULL, s->secondary_disk->bs, s->hidden_disk->bs,
-                                0, MIRROR_SYNC_MODE_NONE, NULL, false,
+        job = backup_job_create(NULL, false, s->secondary_disk->bs,
+                                s->hidden_disk->bs, 0, MIRROR_SYNC_MODE_NONE,
+                                NULL, false,
                                 BLOCKDEV_ON_ERROR_REPORT,
                                 BLOCKDEV_ON_ERROR_REPORT, BLOCK_JOB_INTERNAL,
                                 backup_job_completed, bs, NULL, &local_err);
diff --git a/block/stream.c b/block/stream.c
index e6f7234..c644f34 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -244,7 +244,7 @@ void stream_start(const char *job_id, BlockDriverState *bs,
     /* Prevent concurrent jobs trying to modify the graph structure here, we
      * already have our own plans. Also don't allow resize as the image size is
      * queried only at the job start and then cached. */
-    s = block_job_create(job_id, &stream_job_driver, bs,
+    s = block_job_create(job_id, &stream_job_driver, false, bs,
                          BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
                          BLK_PERM_GRAPH_MOD,
                          BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
diff --git a/blockdev.c b/blockdev.c
index 56a6b24..eeb4986 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3290,8 +3290,8 @@ static BlockJob *do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
         }
     }
 
-    job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
-                            backup->sync, bmap, backup->compress,
+    job = backup_job_create(backup->job_id, false, bs, target_bs,
+                            backup->speed, backup->sync, bmap, backup->compress,
                             backup->on_source_error, backup->on_target_error,
                             BLOCK_JOB_DEFAULT, NULL, NULL, txn, &local_err);
     bdrv_unref(target_bs);
@@ -3369,8 +3369,8 @@ BlockJob *do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
             goto out;
         }
     }
-    job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
-                            backup->sync, NULL, backup->compress,
+    job = backup_job_create(backup->job_id, false, bs, target_bs,
+                            backup->speed, backup->sync, NULL, backup->compress,
                             backup->on_source_error, backup->on_target_error,
                             BLOCK_JOB_DEFAULT, NULL, NULL, txn, &local_err);
     if (local_err != NULL) {
diff --git a/blockjob.c b/blockjob.c
index 3a0c491..1efac45 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -336,6 +336,8 @@ static void block_job_completed_single(BlockJob *job)
         QLIST_REMOVE(job, txn_list);
         block_job_txn_unref(job->txn);
     }
+
+    job->finished = true;
     block_job_unref(job);
 }
 
@@ -352,6 +354,13 @@ static void block_job_cancel_async(BlockJob *job)
     job->cancelled = true;
 }
 
+static void block_job_do_reap(BlockJob *job)
+{
+    assert(job && job->persistent == true);
+    job->persistent = false;
+    block_job_unref(job);
+}
+
 static int block_job_finish_sync(BlockJob *job,
                                  void (*finish)(BlockJob *, Error **errp),
                                  Error **errp)
@@ -381,6 +390,9 @@ static int block_job_finish_sync(BlockJob *job,
         aio_poll(qemu_get_aio_context(), true);
     }
     ret = (job->cancelled && job->ret == 0) ? -ECANCELED : job->ret;
+    if (job->persistent) {
+        block_job_do_reap(job);
+    }
     block_job_unref(job);
     return ret;
 }
@@ -483,6 +495,26 @@ void block_job_complete(BlockJob *job, Error **errp)
     job->driver->complete(job, errp);
 }
 
+void block_job_reap(BlockJob **jobptr, Error **errp)
+{
+    BlockJob *job = *jobptr;
+    /* similarly to _complete, this is QMP-interface only. */
+    assert(job->id);
+    if (!job->persistent) {
+        error_setg(errp, "The active block job '%s' was not started with "
+                   "'manual-reap': true, and so cannot be reaped as it will "
+                   "clean up after itself automatically", job->id);
+        return;
+    } else if (!job->finished) {
+        error_setg(errp, "The active block job '%s' has not yet terminated, "
+                   "and cannot be reaped yet", job->id);
+        return;
+    }
+
+    block_job_do_reap(job);
+    *jobptr = NULL;
+}
+
 void block_job_user_pause(BlockJob *job)
 {
     job->user_paused = true;
@@ -505,7 +537,9 @@ void block_job_user_resume(BlockJob *job)
 
 void block_job_cancel(BlockJob *job)
 {
-    if (block_job_started(job)) {
+    if (job->finished) {
+        return;
+    } else if (block_job_started(job)) {
         block_job_cancel_async(job);
         block_job_enter(job);
     } else {
@@ -562,6 +596,7 @@ BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
     info->speed     = job->speed;
     info->io_status = job->iostatus;
     info->ready     = job->ready;
+    info->finished  = job->finished;
     return info;
 }
 
@@ -609,7 +644,7 @@ static void block_job_event_completed(BlockJob *job, const char *msg)
  */
 
 void *block_job_create(const char *job_id, const BlockJobDriver *driver,
-                       BlockDriverState *bs, uint64_t perm,
+                       bool persistent, BlockDriverState *bs, uint64_t perm,
                        uint64_t shared_perm, int64_t speed, int flags,
                        BlockCompletionFunc *cb, void *opaque, Error **errp)
 {
@@ -664,6 +699,7 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     job->paused        = true;
     job->pause_count   = 1;
     job->refcnt        = 1;
+    job->persistent    = persistent;
 
     error_setg(&job->blocker, "block device is in use by block job: %s",
                BlockJobType_str(driver->job_type));
@@ -689,6 +725,12 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
             return NULL;
         }
     }
+
+    /* Hang on to an extra reference on behalf of the QMP monitor */
+    if (job->persistent) {
+        block_job_ref(job);
+    }
+
     return job;
 }
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 99abe2c..a576357 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -944,6 +944,8 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
  * backup_job_create:
  * @job_id: The id of the newly-created job, or %NULL to use the
  * device name of @bs.
+ * @persistent: Whether this job, when completed, will need to be
+ * manually reaped via block-job-reap.
  * @bs: Block device to operate on.
  * @target: Block device to write to.
  * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
@@ -960,9 +962,9 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
  * Create a backup operation on @bs.  Clusters in @bs are written to @target
  * until the job is cancelled or manually completed.
  */
-BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
-                            BlockDriverState *target, int64_t speed,
-                            MirrorSyncMode sync_mode,
+BlockJob *backup_job_create(const char *job_id, bool persistent,
+                            BlockDriverState *bs, BlockDriverState *target,
+                            int64_t speed, MirrorSyncMode sync_mode,
                             BdrvDirtyBitmap *sync_bitmap,
                             bool compress,
                             BlockdevOnError on_source_error,
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 67c0968..cd1b41b 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -135,6 +135,15 @@ typedef struct BlockJob {
      */
     int ret;
 
+    /* True if this job must remain present post-completion until
+     * it can be queried and removed via block-job-reap. */
+    bool persistent;
+
+    /* True once the job has finished calling either abort or commit
+     * (as applicable); a persistent job in this state can be reaped
+     * via block-job-reap. */
+    bool finished;
+
     /** Non-NULL if this job is part of a transaction */
     BlockJobTxn *txn;
     QLIST_ENTRY(BlockJob) txn_list;
@@ -227,6 +236,18 @@ void block_job_cancel(BlockJob *job);
 void block_job_complete(BlockJob *job, Error **errp);
 
 /**
+ * block_job_reap:
+ * @job: The job to be reaped.
+ * @errp: Error object.
+ *
+ * For completed or canceled jobs, remove the job from the block job
+ * list (and thus from query-block-jobs output) and perform final cleanup.
+ *
+ * Note that at this stage, the job cannot be rolled back.
+ */
+void block_job_reap(BlockJob **job, Error **errp);
+
+/**
  * block_job_query:
  * @job: The job to get information about.
  *
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index f13ad05..c19275c 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -132,7 +132,7 @@ struct BlockJobDriver {
  * called from a wrapper that is specific to the job type.
  */
 void *block_job_create(const char *job_id, const BlockJobDriver *driver,
-                       BlockDriverState *bs, uint64_t perm,
+                       bool persistent, BlockDriverState *bs, uint64_t perm,
                        uint64_t shared_perm, int64_t speed, int flags,
                        BlockCompletionFunc *cb, void *opaque, Error **errp);
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 750bb0c..a4f5e10 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -984,9 +984,10 @@
 # Since: 1.1
 ##
 { 'struct': 'BlockJobInfo',
-  'data': {'type': 'str', 'device': 'str', 'len': 'int',
-           'offset': 'int', 'busy': 'bool', 'paused': 'bool', 'speed': 'int',
-           'io-status': 'BlockDeviceIoStatus', 'ready': 'bool'} }
+  'data': {'type': 'str', 'device': 'str', 'len': 'int', 'offset': 'int',
+           'busy': 'bool', 'paused': 'bool', 'speed': 'int',
+           'io-status': 'BlockDeviceIoStatus', 'ready': 'bool',
+           'finished': 'bool'} }
 
 ##
 # @query-block-jobs:
diff --git a/tests/test-blockjob-txn.c b/tests/test-blockjob-txn.c
index c77343f..5ab0e22 100644
--- a/tests/test-blockjob-txn.c
+++ b/tests/test-blockjob-txn.c
@@ -101,7 +101,7 @@ static BlockJob *test_block_job_start(unsigned int iterations,
     g_assert_nonnull(bs);
 
     snprintf(job_id, sizeof(job_id), "job%u", counter++);
-    s = block_job_create(job_id, &test_block_job_driver, bs,
+    s = block_job_create(job_id, &test_block_job_driver, false, bs,
                          0, BLK_PERM_ALL, 0, BLOCK_JOB_DEFAULT,
                          test_block_job_cb, data, &error_abort);
     s->iterations = iterations;
diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c
index 23bdf1a..9fc2785 100644
--- a/tests/test-blockjob.c
+++ b/tests/test-blockjob.c
@@ -30,7 +30,7 @@ static BlockJob *do_test_id(BlockBackend *blk, const char *id,
     BlockJob *job;
     Error *errp = NULL;
 
-    job = block_job_create(id, &test_block_job_driver, blk_bs(blk),
+    job = block_job_create(id, &test_block_job_driver, false, blk_bs(blk),
                            0, BLK_PERM_ALL, 0, BLOCK_JOB_DEFAULT, block_job_cb,
                            NULL, &errp);
     if (should_succeed) {
-- 
2.9.5


* [Qemu-devel] [PATCH v2 2/4] qmp: add block-job-reap command
  2017-10-04  1:52 [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit job reaping John Snow
  2017-10-04  1:52 ` [Qemu-devel] [PATCH v2 1/4] blockjob: add persistent property John Snow
@ 2017-10-04  1:52 ` John Snow
  2017-10-04  1:52 ` [Qemu-devel] [PATCH v2 3/4] blockjob: expose persistent property John Snow
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: John Snow @ 2017-10-04  1:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, pkrempa, jtc, qemu-devel, John Snow

For jobs that have finished (either completed or canceled), allow the
user to dismiss the job's status reports via block-job-reap.
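
A sketch of the command's behavior, as exercised by the tests added in
patch 4 (device name illustrative):

    res = vm.qmp('block-job-reap', device='drive0')
    # -> {'return': {}} once the job has reached its terminal state;
    # -> "The active block job 'drive0' has not yet terminated, and
    #    cannot be reaped yet" while it is still running;
    # -> a DeviceNotActive error if no such job exists.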

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/trace-events   |  1 +
 blockdev.c           | 14 ++++++++++++++
 qapi/block-core.json | 21 +++++++++++++++++++++
 3 files changed, 36 insertions(+)

diff --git a/block/trace-events b/block/trace-events
index 25dd5a3..9580efa 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -46,6 +46,7 @@ qmp_block_job_cancel(void *job) "job %p"
 qmp_block_job_pause(void *job) "job %p"
 qmp_block_job_resume(void *job) "job %p"
 qmp_block_job_complete(void *job) "job %p"
+qmp_block_job_reap(void *job) "job %p"
 qmp_block_stream(void *bs, void *job) "bs %p job %p"
 
 # block/file-win32.c
diff --git a/blockdev.c b/blockdev.c
index eeb4986..c08d6fb 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3766,6 +3766,20 @@ void qmp_block_job_complete(const char *device, Error **errp)
     aio_context_release(aio_context);
 }
 
+void qmp_block_job_reap(const char *device, Error **errp)
+{
+    AioContext *aio_context;
+    BlockJob *job = find_block_job(device, &aio_context, errp);
+
+    if (!job) {
+        return;
+    }
+
+    trace_qmp_block_job_reap(job);
+    block_job_reap(&job, errp);
+    aio_context_release(aio_context);
+}
+
 void qmp_change_backing_file(const char *device,
                              const char *image_node_name,
                              const char *backing_file,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index a4f5e10..5cce49d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2161,6 +2161,27 @@
 { 'command': 'block-job-complete', 'data': { 'device': 'str' } }
 
 ##
+# @block-job-reap:
+#
+# For jobs that have already completed, remove them from the list reported
+# by query-block-jobs. This command only needs to be run for jobs which were
+# started with the persistent=true option.
+#
+# This command will refuse to operate on any job that has not yet reached
+# its terminal state. "cancel" or "complete" will still need to be used as
+# appropriate.
+#
+# @device: The job identifier. This used to be a device name (hence
+#          the name of the parameter), but since QEMU 2.7 it can have
+#          other values.
+#
+# Returns: Nothing on success
+#
+# Since: 2.11
+##
+{ 'command': 'block-job-reap', 'data': { 'device': 'str' } }
+
+##
 # @BlockdevDiscardOptions:
 #
 # Determines how to handle discard requests.
-- 
2.9.5


* [Qemu-devel] [PATCH v2 3/4] blockjob: expose persistent property
  2017-10-04  1:52 [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit job reaping John Snow
  2017-10-04  1:52 ` [Qemu-devel] [PATCH v2 1/4] blockjob: add persistent property John Snow
  2017-10-04  1:52 ` [Qemu-devel] [PATCH v2 2/4] qmp: add block-job-reap command John Snow
@ 2017-10-04  1:52 ` John Snow
  2017-10-04  1:52 ` [Qemu-devel] [PATCH v2 4/4] iotests: test manual job reaping John Snow
  2017-10-04 18:27 ` [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit " Kevin Wolf
  4 siblings, 0 replies; 13+ messages in thread
From: John Snow @ 2017-10-04  1:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, pkrempa, jtc, qemu-devel, John Snow

For drive-backup and blockdev-backup, expose the persistent
property, defaulting to false. There are no universal job creation
parameters, so the property must be added individually to each job
type for which it makes sense.
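
A sketch of the two behaviors from the QMP side (image names
illustrative):

    # Default: the job cleans up after itself, as before.
    vm.qmp('drive-backup', device='drive0', sync='full', target=dest_img)
    # ...after BLOCK_JOB_COMPLETED, query-block-jobs returns [].

    # Opt-in: the job lingers until explicitly reaped.
    vm.qmp('drive-backup', device='drive0', sync='full', target=dest_img,
           persistent=True)
    # ...after completion, the job stays queryable until block-job-reap.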

Signed-off-by: John Snow <jsnow@redhat.com>
---
 blockdev.c           | 10 ++++++++--
 qapi/block-core.json | 21 ++++++++++++++++-----
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index c08d6fb..8bbbf2a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3198,6 +3198,9 @@ static BlockJob *do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
     if (!backup->has_job_id) {
         backup->job_id = NULL;
     }
+    if (!backup->has_persistent) {
+        backup->persistent = false;
+    }
     if (!backup->has_compress) {
         backup->compress = false;
     }
@@ -3290,7 +3293,7 @@ static BlockJob *do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
         }
     }
 
-    job = backup_job_create(backup->job_id, false, bs, target_bs,
+    job = backup_job_create(backup->job_id, backup->persistent, bs, target_bs,
                             backup->speed, backup->sync, bmap, backup->compress,
                             backup->on_source_error, backup->on_target_error,
                             BLOCK_JOB_DEFAULT, NULL, NULL, txn, &local_err);
@@ -3341,6 +3344,9 @@ BlockJob *do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
     if (!backup->has_job_id) {
         backup->job_id = NULL;
     }
+    if (!backup->has_persistent) {
+        backup->persistent = false;
+    }
     if (!backup->has_compress) {
         backup->compress = false;
     }
@@ -3369,7 +3375,7 @@ BlockJob *do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
             goto out;
         }
     }
-    job = backup_job_create(backup->job_id, false, bs, target_bs,
+    job = backup_job_create(backup->job_id, backup->persistent, bs, target_bs,
                             backup->speed, backup->sync, NULL, backup->compress,
                             backup->on_source_error, backup->on_target_error,
                             BLOCK_JOB_DEFAULT, NULL, NULL, txn, &local_err);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5cce49d..4c7c17b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1104,6 +1104,11 @@
 # @job-id: identifier for the newly-created block job. If
 #          omitted, the device name will be used. (Since 2.7)
 #
+# @persistent: Whether the job created by this command needs to be
+#              cleaned up manually via block-job-reap. The default is
+#              false. When true, the job will remain in a "completed" state
+#              until reaped manually with block-job-reap. (Since 2.11)
+#
 # @device: the device name or node-name of a root node which should be copied.
 #
 # @target: the target of the new image. If the file exists, or if it
@@ -1144,9 +1149,10 @@
 # Since: 1.6
 ##
 { 'struct': 'DriveBackup',
-  'data': { '*job-id': 'str', 'device': 'str', 'target': 'str',
-            '*format': 'str', 'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
-            '*speed': 'int', '*bitmap': 'str', '*compress': 'bool',
+  'data': { '*job-id': 'str', '*persistent': 'bool', 'device': 'str',
+            'target': 'str', '*format': 'str', 'sync': 'MirrorSyncMode',
+            '*mode': 'NewImageMode', '*speed': 'int', '*bitmap': 'str',
+            '*compress': 'bool',
             '*on-source-error': 'BlockdevOnError',
             '*on-target-error': 'BlockdevOnError' } }
 
@@ -1156,6 +1162,11 @@
 # @job-id: identifier for the newly-created block job. If
 #          omitted, the device name will be used. (Since 2.7)
 #
+# @persistent: Whether the job created by this command needs to be
+#              cleaned up manually via block-job-reap. The default is
+#              false. When true, the job will remain in a "completed" state
+#              until reaped manually with block-job-reap. (Since 2.11)
+#
 # @device: the device name or node-name of a root node which should be copied.
 #
 # @target: the device name or node-name of the backup target node.
@@ -1185,8 +1196,8 @@
 # Since: 2.3
 ##
 { 'struct': 'BlockdevBackup',
-  'data': { '*job-id': 'str', 'device': 'str', 'target': 'str',
-            'sync': 'MirrorSyncMode',
+  'data': { '*job-id': 'str', '*persistent': 'bool', 'device': 'str',
+            'target': 'str', 'sync': 'MirrorSyncMode',
             '*speed': 'int',
             '*compress': 'bool',
             '*on-source-error': 'BlockdevOnError',
-- 
2.9.5


* [Qemu-devel] [PATCH v2 4/4] iotests: test manual job reaping
  2017-10-04  1:52 [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit job reaping John Snow
                   ` (2 preceding siblings ...)
  2017-10-04  1:52 ` [Qemu-devel] [PATCH v2 3/4] blockjob: expose persistent property John Snow
@ 2017-10-04  1:52 ` John Snow
  2017-10-04 18:27 ` [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit " Kevin Wolf
  4 siblings, 0 replies; 13+ messages in thread
From: John Snow @ 2017-10-04  1:52 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, pkrempa, jtc, qemu-devel, John Snow

RFC: The error returned by a job creation command when that device
already has a job attached has become misleading; "Someone should
do something about that!"

Signed-off-by: John Snow <jsnow@redhat.com>
---
 tests/qemu-iotests/056     | 227 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/056.out |   4 +-
 2 files changed, 229 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/056 b/tests/qemu-iotests/056
index 04f2c3c..d6bed20 100755
--- a/tests/qemu-iotests/056
+++ b/tests/qemu-iotests/056
@@ -29,6 +29,26 @@ backing_img = os.path.join(iotests.test_dir, 'backing.img')
 test_img = os.path.join(iotests.test_dir, 'test.img')
 target_img = os.path.join(iotests.test_dir, 'target.img')
 
+def img_create(img, fmt=iotests.imgfmt, size='64M', **kwargs):
+    fullname = os.path.join(iotests.test_dir, '%s.%s' % (img, fmt))
+    optargs = []
+    for k,v in kwargs.iteritems():
+        optargs = optargs + ['-o', '%s=%s' % (k,v)]
+    args = ['create', '-f', fmt] + optargs + [fullname, size]
+    iotests.qemu_img(*args)
+    return fullname
+
+def try_remove(img):
+    try:
+        os.remove(img)
+    except OSError:
+        pass
+
+def io_write_patterns(img, patterns):
+    for pattern in patterns:
+        iotests.qemu_io('-c', 'write -P%s %s %s' % pattern, img)
+
+
 class TestSyncModesNoneAndTop(iotests.QMPTestCase):
     image_len = 64 * 1024 * 1024 # MB
 
@@ -108,5 +128,212 @@ class TestBeforeWriteNotifier(iotests.QMPTestCase):
         event = self.cancel_and_wait()
         self.assert_qmp(event, 'data/type', 'backup')
 
+class BackupTest(iotests.QMPTestCase):
+    def setUp(self):
+        self.vm = iotests.VM()
+        self.test_img = img_create('test')
+        self.dest_img = img_create('dest')
+        self.vm.add_drive(self.test_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        try_remove(self.test_img)
+        try_remove(self.dest_img)
+
+    def hmp_io_writes(self, drive, patterns):
+        for pattern in patterns:
+            self.vm.hmp_qemu_io(drive, 'write -P%s %s %s' % pattern)
+        self.vm.hmp_qemu_io(drive, 'flush')
+
+    def qmp_backup_and_wait(self, cmd='drive-backup', serror=None,
+                            aerror=None, **kwargs):
+        return (self.qmp_backup(cmd, serror, **kwargs) and
+                self.qmp_backup_wait(kwargs['device'], aerror))
+
+    def qmp_backup(self, cmd='drive-backup',
+                   error=None, **kwargs):
+        self.assertTrue('device' in kwargs)
+        res = self.vm.qmp(cmd, **kwargs)
+        if error:
+            self.assert_qmp(res, 'error/desc', error)
+            return False
+        self.assert_qmp(res, 'return', {})
+        return True
+
+    def qmp_backup_wait(self, device, error=None):
+        event = self.vm.event_wait(name="BLOCK_JOB_COMPLETED",
+                                   match={'data': {'device': device}})
+        self.assertNotEqual(event, None)
+        try:
+            failure = self.dictpath(event, 'data/error')
+        except AssertionError:
+            # Backup succeeded.
+            self.assert_qmp(event, 'data/offset', event['data']['len'])
+            return True
+        else:
+            # Failure.
+            self.assert_qmp(event, 'data/error', error)
+            return False
+
+    def test_reap_false(self):
+        res = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(res, 'return', [])
+        self.qmp_backup_and_wait(device='drive0', format=iotests.imgfmt,
+                                 sync='full', target=self.dest_img, persistent=False)
+        res = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(res, 'return', [])
+
+    def test_reap_true(self):
+        res = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(res, 'return', [])
+        self.qmp_backup_and_wait(device='drive0', format=iotests.imgfmt,
+                                 sync='full', target=self.dest_img, persistent=True)
+        res = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(res, 'return[0]/finished', True)
+        res = self.vm.qmp('block-job-reap', device='drive0')
+        self.assert_qmp(res, 'return', {})
+        res = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(res, 'return', [])
+
+    def test_reap_bad_id(self):
+        res = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(res, 'return', [])
+        res = self.vm.qmp('block-job-reap', device='foobar')
+        self.assert_qmp(res, 'error/class', 'DeviceNotActive')
+
+    def test_reap_collision(self):
+        res = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(res, 'return', [])
+        self.qmp_backup_and_wait(device='drive0', format=iotests.imgfmt,
+                                 sync='full', target=self.dest_img, persistent=True)
+        res = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(res, 'return[0]/finished', True)
+        # Leave zombie job un-reaped, observe a failure:
+        res = self.qmp_backup_and_wait(serror='Need a root block node',
+                                       device='drive0', format=iotests.imgfmt,
+                                       sync='full', target=self.dest_img,
+                                       persistent=True)
+        self.assertEqual(res, False)
+        # OK, reap the zombie.
+        res = self.vm.qmp('block-job-reap', device='drive0')
+        self.assert_qmp(res, 'return', {})
+        res = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(res, 'return', [])
+        # Ensure it's really gone.
+        self.qmp_backup_and_wait(device='drive0', format=iotests.imgfmt,
+                                 sync='full', target=self.dest_img, persistent=True)
+
+    def test_reap_premature(self):
+        res = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(res, 'return', [])
+        # Give blkdebug something to chew on
+        self.hmp_io_writes('drive0',
+                           (('0x9a', 0, 512),
+                           ('0x55', '8M', '352k'),
+                           ('0x78', '15872k', '1M')))
+        # Add destination node via blkdebug
+        res = self.vm.qmp('blockdev-add',
+                          node_name='target0',
+                          driver=iotests.imgfmt,
+                          file={
+                              'driver': 'blkdebug',
+                              'image': {
+                                  'driver': 'file',
+                                  'filename': self.dest_img
+                              },
+                              'inject-error': [{
+                                  'event': 'write_aio',
+                                  'errno': 5,
+                                  'immediately': False,
+                                  'once': True
+                              }],
+                          })
+        self.assert_qmp(res, 'return', {})
+
+        res = self.qmp_backup(cmd='blockdev-backup',
+                              device='drive0', target='target0',
+                              on_target_error='stop',
+                              sync='full',
+                              persistent=True)
+        self.assertTrue(res)
+        event = self.vm.event_wait(name="BLOCK_JOB_ERROR",
+                                   match={'data': {'device': 'drive0'}})
+        self.assertNotEqual(event, None)
+        # OK, job should be wedged
+        res = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(res, 'return[0]/finished', False)
+        res = self.vm.qmp('block-job-reap', device='drive0')
+        self.assert_qmp(res, 'error/desc',
+                        "The active block job 'drive0' has not yet terminated, and cannot be reaped yet")
+        res = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(res, 'return[0]/finished', False)
+        # OK, unstick job and move forward.
+        res = self.vm.qmp('block-job-resume', device='drive0')
+        self.assert_qmp(res, 'return', {})
+        res = self.qmp_backup_wait(device='drive0')
+        self.assertTrue(res)
+        # Job should now be languishing:
+        res = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(res, 'return[0]/finished', True)
+        res = self.vm.qmp('block-job-reap', device='drive0')
+        self.assert_qmp(res, 'return', {})
+        res = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(res, 'return', [])
+
+    def test_reap_erroneous(self):
+        '''Same as above test, but manual-reap is set to False.'''
+        res = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(res, 'return', [])
+        # Give blkdebug something to chew on
+        self.hmp_io_writes('drive0',
+                           (('0x9a', 0, 512),
+                           ('0x55', '8M', '352k'),
+                           ('0x78', '15872k', '1M')))
+        # Add destination node via blkdebug
+        res = self.vm.qmp('blockdev-add',
+                          node_name='target0',
+                          driver=iotests.imgfmt,
+                          file={
+                              'driver': 'blkdebug',
+                              'image': {
+                                  'driver': 'file',
+                                  'filename': self.dest_img
+                              },
+                              'inject-error': [{
+                                  'event': 'write_aio',
+                                  'errno': 5,
+                                  'immediately': False,
+                                  'once': True
+                              }],
+                          })
+        self.assert_qmp(res, 'return', {})
+
+        res = self.qmp_backup(cmd='blockdev-backup',
+                              device='drive0', target='target0',
+                              on_target_error='stop',
+                              sync='full',
+                              persistent=False)
+        self.assertTrue(res)
+        event = self.vm.event_wait(name="BLOCK_JOB_ERROR",
+                                   match={'data': {'device': 'drive0'}})
+        self.assertNotEqual(event, None)
+        # OK, job should be wedged
+        res = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(res, 'return[0]/finished', False)
+        res = self.vm.qmp('block-job-reap', device='drive0')
+        self.assert_qmp(res, 'error/desc',
+                        "The active block job 'drive0' was not started with 'manual-reap': true, and so cannot be reaped as it will clean up after itself automatically")
+        res = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(res, 'return[0]/finished', False)
+        # OK, unstick job and move forward.
+        res = self.vm.qmp('block-job-resume', device='drive0')
+        self.assert_qmp(res, 'return', {})
+        res = self.qmp_backup_wait(device='drive0')
+        self.assertTrue(res)
+        # Job should now be gone:
+        res = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(res, 'return', [])
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2', 'qed'])
diff --git a/tests/qemu-iotests/056.out b/tests/qemu-iotests/056.out
index 8d7e996..dae404e 100644
--- a/tests/qemu-iotests/056.out
+++ b/tests/qemu-iotests/056.out
@@ -1,5 +1,5 @@
-...
+.........
 ----------------------------------------------------------------------
-Ran 3 tests
+Ran 9 tests
 
 OK
-- 
2.9.5


* Re: [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit job reaping
  2017-10-04  1:52 [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit job reaping John Snow
                   ` (3 preceding siblings ...)
  2017-10-04  1:52 ` [Qemu-devel] [PATCH v2 4/4] iotests: test manual job reaping John Snow
@ 2017-10-04 18:27 ` Kevin Wolf
  2017-10-05  1:46   ` John Snow
  4 siblings, 1 reply; 13+ messages in thread
From: Kevin Wolf @ 2017-10-04 18:27 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, pkrempa, jtc, qemu-devel

On 04.10.2017 at 03:52, John Snow wrote:
> For jobs that complete when a monitor isn't looking, there's no way to
> tell what the job's final return code was. We need to allow jobs to
> remain in the list until queried for reliable management.

Just a short summary of what I discussed with John on IRC:

Another important reason why we want to have an explicit end of block
jobs is that job completion often makes changes to the graph. For a
management tool that manages the block graph on a node level, it is a
big problem if graph changes can happen at any point, which can lead to
bad race conditions. Giving the management tool control over the end of
the block job makes it aware that graph changes happen.

This means that compared to this RFC series, we need to move the waiting
earlier in the process:

1. Block job is done and calls block_job_completed()
2. Wait for other block jobs in the same job transaction to complete
3. Send a (new) QMP event to the management tool to notify it that the
   job is ready to be reaped
4. On block-job-reap, call the .commit/.abort callbacks of the jobs in
   the transaction. They will do most of the work that is currently done
   in the completion callbacks, in particular any graph changes. If we
   need to allow error cases, we can add a .prepare_completion callback
   that can still let the whole transaction fail.
5. Send the final QMP completion event for each job in the transaction
   with the final error code. This is the existing BLOCK_JOB_COMPLETED
   or BLOCK_JOB_CANCELLED event.

The current RFC still executes .commit/.abort before block-job-reap, so
the graph changes happen too early to be under control of the management
tool.
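
A rough sketch of that sequence from a client's point of view (the name
of the new event is hypothetical; block-job-reap is where .commit/.abort
would run under this proposal):

    vm.event_wait(name='BLOCK_JOB_PENDING')    # steps 1-3: job is done,
                                               # transaction has drained
    vm.qmp('block-job-reap', device='job0')    # step 4: graph changes
                                               # happen here
    vm.event_wait(name='BLOCK_JOB_COMPLETED')  # step 5: final status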

> RFC:
> The next version will add tests for transactions.
> Kevin, can you please take a look at bdrv_is_root_node and how it is
> used with respect to do_drive_backup? I suspect that in this case
> "is root" should actually be "true", but a node in use by a job has
> two roles: child_root and child_job, so it starts returning false here.
> 
> That's fine because we prevent a collision that way, but it makes the
> error messages pretty bad and misleading. Do you have a quick suggestion?
> (Should I just amend the loop to allow non-root nodes as long as they
> happen to be jobs so that the job creation code can check permissions?)

We kind of sidestepped this problem by deciding that there is no real
reason for the backup job to require the source to be a root node.

I'm not completely sure how we could easily get a better message while
still requiring a root node (and I suppose there are other places that
rightfully still require a root node). Ignoring children with child_job
feels ugly, but might be the best option.

Note that this will not make the conflicting command work suddenly,
because every node that has a child_job parent also has op blockers for
everything, but the error message should be less confusing than "is not
a root node".

Kevin


* Re: [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit job reaping
  2017-10-04 18:27 ` [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit " Kevin Wolf
@ 2017-10-05  1:46   ` John Snow
  2017-10-05 11:38     ` Kevin Wolf
  0 siblings, 1 reply; 13+ messages in thread
From: John Snow @ 2017-10-05  1:46 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, pkrempa, jtc, qemu-devel, Fam Zheng

"Oh boy, another email from John! I bet it's concisely worded."

Ha ha. I see my reputation precedes me.

"Holy crap dude, that's a lot of words you've typed down there!"

It's okay, skip to the "TLDR" for the conclusion I arrived at if you
don't care how I got there!

"Sigh, okay, John."

Yes!!

On 10/04/2017 02:27 PM, Kevin Wolf wrote:
> On 04.10.2017 at 03:52, John Snow wrote:
>> For jobs that complete when a monitor isn't looking, there's no way to
>> tell what the job's final return code was. We need to allow jobs to
>> remain in the list until queried for reliable management.
> 
> Just a short summary of what I discussed with John on IRC:
> 
> Another important reason why we want to have an explicit end of block
> jobs is that job completion often makes changes to the graph. For a
> management tool that manages the block graph on a node level, it is a
> big problem if graph changes can happen at any point, which can lead to
> bad race conditions. Giving the management tool control over the end of
> the block job makes it aware that graph changes happen.
> 
> This means that compared to this RFC series, we need to move the waiting
> earlier in the process:
> 
> 1. Block job is done and calls block_job_completed()
> 2. Wait for other block jobs in the same job transaction to complete
> 3. Send a (new) QMP event to the management tool to notify it that the
>    job is ready to be reaped

Oh, I suppose to distinguish it from "COMPLETED" in that sense, because
it isn't actually COMPLETED anymore under your vision, so it requires a
new event in this proposal.

This becomes a bit messy, bumping up against both "READY" and a
transactional pre-completed state semantically. Uhhhh, for lack of a
better word in the timeframe I'd like to complete this email in, let's
call this new theoretical state "PENDING"?

So presently, a job goes through the following life cycle:

1. CREATED --> RUNNING
2. RUNNING <--> PAUSED
3. RUNNING --> (READY | COMPLETED | CANCELED)
4. READY --> (COMPLETED | CANCELED)
5. (COMPLETED | CANCELED) --> NULL

Where we emit an event upon entering "READY", "COMPLETED" or "CANCELED".

My patchset here effectively adds a new optional terminal state:

5. (COMPLETED | CANCELED) --> (NULL | FINISHED)
6. FINISHED --> NULL

Where the last transition from FINISHED to NULL is performed via
block-job-reap, but notably we get to re-use the events for COMPLETED |
CANCELED to indicate the availability of this operation to be performed.

What happens in the case of transactionally managed jobs presently is
that jobs get stuck as they enter the COMPLETED|CANCELED state. If you
were to query them they behave as if they're RUNNING. There's no
discrete state that exists for this presently.

You can cancel these as normal, but I'm not sure if you can pause them,
actually. (Note to self, test that.) I think they behave almost exactly
like any RUNNING job would.

What you're proposing here is the formalization of the pre-completion
state ("PENDING") and that in this state, a job outside of a transaction
can exist until it is manually told to finally, once and for all,
actually finish its business. We can use this as a hook to perform any
last graph changes so they will not come as a surprise to the management
application. Maybe this operation should be called "Finalize". Again,
for lack of a better term in the timeframe, I'll refer to it as such for
now.

I think importantly this actually distinguishes it from "reap" in that
the commit phase can still fail, so we can't let the job follow that
auto transition back to the NULL state. That means that we'd need both a
block-job-finalize AND a block-job-reap to accomplish both of the
following goals:

(1) Allow the management application to control graph changes [libvirt]
(2) Prevent auto transitions to NULL state for asynchronous clients [A
requirement mentioned by Virtuozzo]

It looks like this, overall:

1. CREATED --> RUNNING
2. RUNNING <--> PAUSED
3. RUNNING --> PENDING
	via: auto transition
	event: BLOCK_JOB_PENDING
4. PENDING --> (COMPLETED | CANCELED)
	via: block-job-finalize
	event: (COMPLETED | ERROR)
5. (COMPLETED | CANCELED) --> (NULL | FINISHED)
	via: auto transition
	event: none
6. FINISHED --> NULL
	via: block-job-reap
	event: none
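
To make that concrete, a minimal sketch of the proposed transition
table (only the states and triggers named above; none of this exists
yet):

    transitions = {
        'CREATED':   [('RUNNING',   'auto')],
        'RUNNING':   [('PAUSED',    'block-job-pause'),
                      ('PENDING',   'auto')],  # emits BLOCK_JOB_PENDING
        'PAUSED':    [('RUNNING',   'block-job-resume')],
        'PENDING':   [('COMPLETED', 'block-job-finalize'),
                      ('CANCELED',  'block-job-finalize')],
        'COMPLETED': [('NULL',      'auto'), ('FINISHED', 'auto')],
        'CANCELED':  [('NULL',      'auto'), ('FINISHED', 'auto')],
        'FINISHED':  [('NULL',      'block-job-reap')],
    }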

"Hey, wait, where did the ready state go?"

Good question, I'm not sure how it fits in to something like "PENDING"
which is, I think NEARLY equivalent to a proposed pending->finalized
transition. Is it meaningful to have a job go from
running->ready->pending or running->pending->ready? I actually doubt it is.

The only difference really is that not all jobs use the READY -->
COMPLETE transition. We could implement it into those jobs if the job is
created with some kind of boolean, like

auto-complete: true/false

where this defaults to true, the legacy behavior.

For "mirror" we would just omit allowing people to set this setting
(auto-complete is effectively always off) because it is requisite and
essential to the operation of the job.
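
For instance (proposed, not existing QMP):

    vm.qmp('drive-backup', device='drive0', sync='full', target=dest,
           auto_complete=False)  # job waits in PENDING for
                                 # block-job-finalize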

"OK, maybe that could work; what about transactions?"

Transactions have to be a bit of their own case, I think.

Essentially jobs that transactionally complete already hang around in
pending until all jobs complete, so they can do so together.

What we don't really want is to force users to have to dig into the jobs
manually and complete each one individually (I think...?), or to have to
deal with the managerial nightmare of having some jobs autocomplete, some
that don't, etc.

What I propose for transactions is:

1) Add a new property for transactions also named "auto-complete"
2) If the property is set to false, Jobs in this transaction will have
their auto-complete values forcibly set to false
3) Once all jobs in the transaction are set to pending, emit an event
("TRANSACTION_READY", for instance)
4) Allow the transaction to be manually committed.

The alternative is to leave it on a per-job basis and just stipulate
that any bizarre or inconvenient configurations are the fault of the
caller. Leaving transactions completely untouched should theoretically work.
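
As a sketch of points 1-4 above (every name here is hypothetical):

    def start_transaction(vm, actions, auto_complete=True):
        # A transaction-wide auto-complete=False would be forced onto
        # each job in the transaction; once all of them reach PENDING,
        # a single TRANSACTION_READY event fires and the whole group
        # can be committed with one command.
        return vm.qmp('transaction', actions=actions,
                      properties={'auto-complete': auto_complete})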

> 4. On block-job-reap, call the .commit/.abort callbacks of the jobs in
>    the transaction. They will do most of the work that is currently done
>    in the completion callbacks, in particular any graph changes. If we
>    need to allow error cases, we can add a .prepare_completion callback
>    that can still let the whole transaction fail.

Makes sense by analogy. Probably worth having anyway. I moved some
likely-to-deadlock code from the backup cleanup into .commit even when
it runs outside of transactions. Other jobs can likely benefit from some
simplified assumptions by running in that context, too.

> 5. Send the final QMP completion event for each job in the transaction
>    with the final error code. This is the existing BLOCK_JOB_COMPLETED
>    or BLOCK_JOB_CANCELLED event.
> 
> The current RFC still executes .commit/.abort before block-job-reap, so
> the graph changes happen too early to be under control of the management
> tool.
> 
>> RFC:
>> The next version will add tests for transactions.
>> Kevin, can you please take a look at bdrv_is_root_node and how it is
>> used with respect to do_drive_backup? I suspect that in this case
>> "is root" should actually be "true", but a node in use by a job has
>> two roles: child_root and child_job, so it starts returning false here.
>>
>> That's fine because we prevent a collision that way, but it makes the
>> error messages pretty bad and misleading. Do you have a quick suggestion?
>> (Should I just amend the loop to allow non-root nodes as long as they
>> happen to be jobs so that the job creation code can check permissions?)
> 
> We kind of sidestepped this problem by deciding that there is no real
> reason for the backup job to require the source to be a root node.
> 
> I'm not completely sure how we could easily get a better message while
> still requiring a root node (and I suppose there are other places that
> rightfully still require a root node). Ignoring children with child_job
> feels ugly, but might be the best option.
> 
> Note that this will not make the conflicting command work suddenly,
> because every node that has a child_job parent also has op blockers for
> everything, but the error message should be less confusing than "is not
> a root node".
> 
> Kevin
> 

TLDR:

- I think we may need to have optional manual completion steps both
before and after the job .prepare()/.commit()/.abort() phase.
- Before, like "block-job-complete" to allow graph changes to be under
management tool control, and
- After, so that final job success status can be queried even if the
event was missed.

Proposal:

(1) Extend block-job-complete semantics to all jobs that opt in via
"auto-complete: false", which is not allowed to be set for mirror jobs
(2) If the modern overloading of the BLOCK_JOB_READY event is apt to
cause confusion for existing tools, a new event BLOCK_JOB_PENDING could
be emitted instead for any job capable of accepting the
auto-complete=true/false parameter, fired upon reaching this state.
(3) Continue forward with this patchset's current persistent/reap
nomenclature to prevent automatic cleanup if desired, and
(4) Implement transaction-wide settings for auto-complete alongside a
new "transaction complete" event to allow for a transaction-wide
"complete" command.

Hopefully that's not too braindead.

--js


* Re: [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit job reaping
  2017-10-05  1:46   ` John Snow
@ 2017-10-05 11:38     ` Kevin Wolf
  2017-10-05 18:17       ` John Snow
                         ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Kevin Wolf @ 2017-10-05 11:38 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, pkrempa, jtc, qemu-devel, Fam Zheng

On 05.10.2017 at 03:46, John Snow wrote:
> On 10/04/2017 02:27 PM, Kevin Wolf wrote:
> > On 04.10.2017 at 03:52, John Snow wrote:
> >> For jobs that complete when a monitor isn't looking, there's no way to
> >> tell what the job's final return code was. We need to allow jobs to
> >> remain in the list until queried for reliable management.
> > 
> > Just a short summary of what I discussed with John on IRC:
> > 
> > Another important reason why we want to have an explicit end of block
> > jobs is that job completion often makes changes to the graph. For a
> > management tool that manages the block graph on a node level, it is a
> > big problem if graph changes can happen at any point, which can lead to
> > bad race conditions. Giving the management tool control over the end of
> > the block job makes it aware that graph changes happen.
> > 
> > This means that compared to this RFC series, we need to move the waiting
> > earlier in the process:
> > 
> > 1. Block job is done and calls block_job_completed()
> > 2. Wait for other block jobs in the same job transaction to complete
> > 3. Send a (new) QMP event to the management tool to notify it that the
> >    job is ready to be reaped
> 
> Oh, I suppose to distinguish it from "COMPLETED" in that sense, because
> it isn't actually COMPLETED anymore under your vision, so it requires a
> new event in this proposal.
> 
> This becomes a bit messy, bumping up against both "READY" and a
> transactional pre-completed state semantically. Uhhhh, for lack of a
> better word in the timeframe I'd like to complete this email in, let's
> call this new theoretical state "PENDING"?
> 
> So presently, a job goes through the following life cycle:
> 
> 1. CREATED --> RUNNING
> 2. RUNNING <--> PAUSED
> 3. RUNNING --> (READY | COMPLETED | CANCELED)
> 4. READY --> (COMPLETED | CANCELED)
> 5. (COMPLETED | CANCELED) --> NULL
> 
> Where we emit an event upon entering "READY", "COMPLETED" or "CANCELED".

Roughly yes, but it's not quite true because you can still pause and
unpause ready jobs. So READY and PAUSED are kind of orthogonal.

> My patchset here effectively adds a new optional terminal state:
> 
> 5. (COMPLETED | CANCELED) --> (NULL | FINISHED)
> 6. FINISHED --> NULL
> 
> Where the last transition from FINISHED to NULL is performed via
> block-job-reap, but notably we get to re-use the events for COMPLETED |
> CANCELED to indicate the availability of this operation to be performed.
> 
> What happens in the case of transactionally managed jobs presently is
> that jobs get stuck as they enter the COMPLETED|CANCELED state. If you
> were to query them they behave as if they're RUNNING. There's no
> discrete state that exists for this presently.
> 
> You can cancel these as normal, but I'm not sure if you can pause them,
> actually. (Note to self, test that.) I think they behave almost exactly
> like any RUNNING job would.

Except that they don't do any work any more. This is an important
difference for a mirror job, which would normally keep copying new writes
until it sends the COMPLETED event. So when libvirt restarts and it sees
a "RUNNING" mirror job, it can't decide whether it is still copying
things or has already completed.

Looks like this is another reason why we want a separate state here.

> What you're proposing here is the formalization of the pre-completion
> state ("PENDING") and that in this state, a job outside of a transaction
> can exist until it is manually told to finally, once and for all,
> actually finish its business. We can use this as a hook to perform any
> last graph changes so they will not come as a surprise to the management
> application. Maybe this operation should be called "Finalize". Again,
> for lack of a better term in the timeframe, I'll refer to it as such for
> now.

"finalize" doesn't sound too bad.

> I think importantly this actually distinguishes it from "reap" in that
> the commit phase can still fail, so we can't let the job follow that
> auto transition back to the NULL state.

Let me see if I understand correctly: We want to make sure that the
management tool sees the final return value for the job. We have already
decided that events aren't enough for this because the management tool
could be restarted while we send the event, so the information is lost.
Having it as a return value of block-job-reap isn't enough either
because it could be lost the same way. We need a separate phase where
libvirt can query the return value and from which we don't automatically
transition away.

I'm afraid that you are right.
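
To make this concrete, here is a client-side sketch of the recovery flow
such a phase would enable, in iotests-style Python. The 'finished'
status value and block-job-reap are only the proposals from this thread,
and handle_result() is a made-up helper:

    # After a management tool restart, the job is still in the job list,
    # so the final result can be queried instead of relying on an event.
    jobs = vm.qmp('query-block-jobs')['return']
    for job in jobs:
        if job['status'] == 'finished':     # proposed queryable state
            handle_result(job)              # read the final return value
            vm.qmp('block-job-reap', device=job['device'])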

> That means that we'd need both a block-job-finalize AND a
> block-job-reap to accomplish both of the following goals:
> 
> (1) Allow the management application to control graph changes [libvirt]
> (2) Prevent auto transitions to NULL state for asynchronous clients [A
> requirement mentioned by Virtuozzo]

Okay, your (2) is another case that forbids auto-transitions to NULL.
What I described above is why it's relevant for libvirt, too.

> It looks like this, overall:
> 
> 1. CREATED --> RUNNING
> 2. RUNNING <--> PAUSED
> 3. RUNNING --> PENDING
> 	via: auto transition
> 	event: BLOCK_JOB_PENDING
> 4. PENDING --> (COMPLETED | CANCELED)
> 	via: block-job-finalize
> 	event: (COMPLETED | ERROR)
> 5. (COMPLETED | CANCELED) --> (NULL | FINISHED)
> 	via: auto transition
> 	event: none
> 6. FINISHED --> NULL
> 	via: block-job-reap
> 	event: none

Looks reasonable. A bit verbose with two new commands, but given the
requirements that's probably unavoidable.

> "Hey, wait, where did the ready state go?"
> 
> Good question, I'm not sure how it fits in to something like "PENDING"
> which is, I think NEARLY equivalent to a proposed pending->finalized
> transition. Is it meaningful to have a job go from
> running->ready->pending or running->pending->ready? I actually doubt it is.

I think it is, but only the former. If we leave the PAUSED state aside
for a moment (as I said above, it's orthogonal) and look at the details,
mirror works like this:

1.  CREATED --> RUNNING
2a. RUNNING --> READY | CANCELLED
        via: auto transition (when bulk copy is finished) / block-job-cancel
        event: BLOCK_JOB_READY
2b. READY --> READY (COMPLETING) | READY (CANCELLING)
        via: block-job-complete / block-job-cancel
        event: none
3.  READY (CANCELLING) --> CANCELLED
        via: auto transition (after draining in-flight mirror requests)
        event: BLOCK_JOB_CANCELLED
    or
    READY (COMPLETING) --> COMPLETED
        via: auto transition (when images are in sync)
        event: BLOCK_JOB_COMPLETED

In the new model, we transition to PENDING in step 3 instead.
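
As an iotests-style sketch of the whole mirror flow under the new model
(BLOCK_JOB_PENDING and block-job-finalize are the proposed names, 'vm'
is the usual iotests VM handle, everything else exists today):

    vm.qmp('drive-mirror', device='drive0', target='/tmp/mirror.qcow2',
           format='qcow2', sync='full')
    vm.event_wait('BLOCK_JOB_READY')                # step 2a
    vm.qmp('block-job-complete', device='drive0')   # step 2b
    vm.event_wait('BLOCK_JOB_PENDING')              # replaces step 3
    vm.qmp('block-job-finalize', device='drive0')   # graph change here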

> The only difference really is that not all jobs use the READY -->
> COMPLETE transition.

The really important difference is that in 2b you have a state that is
exited via auto transition.

PENDING is a state that is exited synchronously with block-job-finalize
(and having this defined point in time where graph changes occur etc. is
the whole point of the state), whereas READY is a state where
block-job-complete/cancel can request that it be left at the next
opportunity, but the exact point in time is unpredictable - it doesn't
happen during the execution of the QMP command anyway.

This is a fundamental difference that doesn't allow us to treat READY and
PENDING the same.
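
The difference in client-observable timing, sketched with the same
hypothetical iotests-style helpers:

    # READY: the command only *requests* the transition; the job leaves
    # READY later, from its own coroutine, at an unpredictable time.
    vm.qmp('block-job-complete', device='drive0')   # returns immediately
    vm.event_wait('BLOCK_JOB_COMPLETED')            # ...once in sync

    # PENDING (proposed): the transition happens inside the command
    # itself, so the graph change is done by the time the reply arrives.
    vm.qmp('block-job-finalize', device='drive0')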

> We could implement it into those jobs if the job is
> created with some kind of boolean, like
> 
> auto-complete: true/false
> 
> where this defaults to true, the legacy behavior.
> 
> For "mirror" we would just omit allowing people to set this setting
> (auto-complete is effectively always off) because it is requisite and
> essential to the operation of the job.

auto-complete=true would basically call block-job-finalize internally?

You can't conflate it with block-job-complete because READY and PENDING
have to stay separate, but it would make sense as auto-finalize.

> "OK, maybe that could work; what about transactions?"
> 
> Transactions have to be a bit of their own case, I think.
> 
> Essentially jobs that transactionally complete already hang around in
> pending until all jobs complete, so they can do so together.
> 
> What we don't really want is to force users to have to dig into the jobs
> manually and complete each one individually. (I think...?) or have to
> deal with the managerial nightmare of having some autocomplete, some
> that don't, etc.

I'm not sure about that. Don't users already have to send
block-job-complete to each job individually? Not doing the same for
block-job-finalize would be kind of inconsistent.

I also think that it would be good if a separately started block job
behaves like a transaction that contains a single job so that we don't
get different models for the two cases.

> What I propose for transactions is:
> 
> 1) Add a new property for transactions also named "auto-complete"
> 2) If the property is set to false, Jobs in this transaction will have
> their auto-complete values forcibly set to false
> 3) Once all jobs in the transaction are set to pending, emit an event
> ("TRANSACTION_READY", for instance)
> 4) Allow the transaction to be manually committed.

This requires that we introduce names for transactions and let them
be managed explicitly by the user. Keeping things at the individual jobs
certainly makes the interface simpler.

I can't rule out that we'll want to make transactions explicitly
managed objects in the future for some reason, but that will probably be
something separate from the problem we're trying to solve now.

> The alternative is to leave it on a per-job basis and just stipulate
> that any bizarre or inconvenient configurations are the fault of the
> caller. Leaving transactions completely untouched should theoretically
> work.

Should be a lot easier. To integrate it into your state machine above:

3.  READY (COMPLETING) | READY (CANCELLING) --> DONE
        via: auto transition (block_job_completed() called; job doesn't
                              do any work any more)
        event: BLOCK_JOB_DONE
4a. DONE --> PENDING
        via: auto transition (all jobs in the transaction are DONE)
        event: BLOCK_JOB_PENDING
4b. PENDING --> COMPLETED | CANCELLED
        via: block-job-finalize
        event: COMPLETED | ERROR

The naming clearly leaves a lot to be desired, I'm just running out of
names that actually make sense... But I hope you can see the mechanism
that I have in mind.
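
A self-contained Python sketch of that gating mechanism, with made-up
names (the real thing would of course live in blockjob.c):

    class Job:
        def __init__(self, job_id):
            self.id, self.status = job_id, 'running'

    def emit_event(name, job_id):
        print(name, job_id)                 # stand-in for a QMP event

    def block_job_done(job, txn):
        job.status = 'done'                 # waiting for the transaction
        if all(j.status == 'done' for j in txn):
            for j in txn:
                j.status = 'pending'        # waiting for the user now
                emit_event('BLOCK_JOB_PENDING', j.id)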

> > 4. On block-job-reap, call the .commit/.abort callbacks of the jobs in
> >    the transaction. They will do most of the work that is currently done
> >    in the completion callbacks, in particular any graph changes. If we
> >    need to allow error cases, we can add a .prepare_completion callback
> >    that can still let the whole transaction fail.
> 
> Makes sense by analogy. Probably worth having anyway. I moved some
> likely-to-deadlock code from the backup cleanup into .commit even when
> it runs outside of transactions. Other jobs can likely benefit from some
> simplified assumptions by running in that context, too.
> 
> > 5. Send the final QMP completion event for each job in the transaction
> >    with the final error code. This is the existing BLOCK_JOB_COMPLETED
> >    or BLOCK_JOB_CANCELLED event.
> > 
> > The current RFC still executes .commit/.abort before block-job-reap, so
> > the graph changes happen too early to be under control of the management
> > tool.
> > 
> >> RFC:
> >> The next version will add tests for transactions.
> >> Kevin, can you please take a look at bdrv_is_root_node and how it is
> >> used with respect to do_drive_backup? I suspect that in this case that
> >> "is root" should actually be "true", but a node in use by a job has
> >> two roles; child_root and child_job, so it starts returning false here.
> >>
> >> That's fine because we prevent a collision that way, but it makes the
> >> error messages pretty bad and misleading. Do you have a quick suggestion?
> >> (Should I just amend the loop to allow non-root nodes as long as they
> >> happen to be jobs so that the job creation code can check permissions?)
> > 
> > We kind of sidestepped this problem by deciding that there is no real
> > reason for the backup job to require the source to be a root node.
> > 
> > I'm not completely sure how we could easily get a better message while
> > still requiring a root node (and I suppose there are other places that
> > rightfully still require a root node). Ignoring children with child_job
> > feels ugly, but might be the best option.
> > 
> > Note that this will not make the conflicting command work suddenly,
> > because every node that has a child_job parent also has op blockers for
> > everything, but the error message should be less confusing than "is not
> > a root node".
> > 
> > Kevin
> > 
> 
> TLDR:
> 
> - I think we may need to have optional manual completion steps both
> before and after the job .prepare()/.commit()/.abort() phase.
> - Before, like "block-job-complete" to allow graph changes to be under
> management tool control, and
> - After, so that final job success status can be queried even if the
> event was missed.

Agreed (except that I don't see what "block-job-complete" has to do with
it, this command is about an earlier transition).

> Proposal:
> 
> (1) Extend block-job-complete semantics to all jobs that opt in via
> "auto-complete: false" which is not allowed to be set for mirror jobs
> (2) If the modern overloading of the BLOCK_JOB_READY event is apt to
> cause confusion for existing tools, a new event BLOCK_JOB_PENDING could
> be emitted instead for any job capable of accepting the
> auto-complete=true/false parameter and emit it upon reaching this state.
> (3) Continue forward with this patchset's current persistent/reap
> nomenclature to prevent automatic cleanup if desired, and
> (4) Implement transaction-wide settings for auto-complete alongside a
> new "transaction complete" event to allow for a transaction-wide
> "complete" command.

Let me try to just consolidate all of the above into a single state
machine:

1.  CREATED --> RUNNING
        driver callback: .start
2a. RUNNING --> READY | CANCELLED
        via: auto transition (when bulk copy is finished) / block-job-cancel
        event: BLOCK_JOB_READY
2b. READY --> READY (COMPLETING) | READY (CANCELLING)
        via: block-job-complete / block-job-cancel
        event: none
        driver callback: .complete / none
3.  READY (CANCELLING | COMPLETING) --> DONE
        via: auto transition
             (CANCELLING: after draining in-flight mirror requests;
              COMPLETING: when images are in sync)
        event: BLOCK_JOB_DONE
4.  DONE --> PENDING
        via: auto transition (all jobs in the transaction are DONE)
        event: BLOCK_JOB_PENDING
5.  PENDING --> FINISHED
        via: block-job-finalize
        event: COMPLETED | CANCELLED
        driver callback: .prepare_finalize / .commit / .abort
6.  FINISHED --> NULL
        via: block-job-reap
        event: none
        driver callback: .clean

I removed COMPLETED/CANCELLED states because they are never externally
visible. You proposed an "auto transition" there, but the transition
would be immediately after the previous one, so clients always see
PENDING --> NULL | FINISHED.

We would have two booleans to make the explicit transitions happen automatically:

    auto-finalize for block-job-finalize (default: true)
    auto-reap     for block-job-reap     (default: true)

Both of them would be executed automatically as soon as the respective
commands become available.
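
Sketched as job-core pseudo-logic (all names invented, reusing the
emit_event() stub from the sketch further up), the flags would just
short-circuit the manual commands:

    def job_enter_pending(job):
        job.status = 'pending'
        emit_event('BLOCK_JOB_PENDING', job.id)
        if job.auto_finalize:               # default: true
            job_finalize(job)               # same path as the QMP command

    def job_finalize(job):
        job.status = 'finished'             # .prepare_finalize/.commit ran
        emit_event('BLOCK_JOB_COMPLETED', job.id)
        if job.auto_reap:                   # default: true
            job_reap(job)                   # same path as the QMP command

    def job_reap(job):
        job.status = None                   # job disappears from the list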

We could add more auto-* options for the remaining explicit transitions
(block-job-complete/cancel in READY), but these are not important for
the problems we're trying to solve here. They might become interesting
if we do decide that we want a single copy block job instead of doing
similar things in mirror, commit and backup.

The naming needs some improvements (done -> pending -> finished looks
really odd), but does this make sense otherwise?

Kevin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit job reaping
  2017-10-05 11:38     ` Kevin Wolf
@ 2017-10-05 18:17       ` John Snow
  2017-10-09 14:30         ` Nikolay Shirokovskiy
  2017-10-06  3:56       ` John Snow
  2017-10-06  5:52       ` Markus Armbruster
  2 siblings, 1 reply; 13+ messages in thread
From: John Snow @ 2017-10-05 18:17 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-block, pkrempa, jtc, qemu-devel, Fam Zheng,
	Nikolay Shirokovskiy, Vladimir Sementsov-Ogievskiy

Nikolay: You mentioned a while ago that you had issues with incremental
backup's eventual return status being unknown. Can you please elaborate
for me why this is a problem?

I assume due to the long running of a backup job it's entirely possible
to imagine losing connection to QEMU and missing the event depending on
how long the interruption is.

Backup operations are expensive, so we need some definite way to catch
this return status.

Please let me know if you have any feedback to this thread.

On 10/05/2017 07:38 AM, Kevin Wolf wrote:
> Am 05.10.2017 um 03:46 hat John Snow geschrieben:
>> On 10/04/2017 02:27 PM, Kevin Wolf wrote:
>>> Am 04.10.2017 um 03:52 hat John Snow geschrieben:
>>>> For jobs that complete when a monitor isn't looking, there's no way to
>>>> tell what the job's final return code was. We need to allow jobs to
>>>> remain in the list until queried for reliable management.
>>>
>>> Just a short summary of what I discussed with John on IRC:
>>>
>>> Another important reason why we want to have an explicit end of block
>>> jobs is that job completion often makes changes to the graph. For a
>>> management tool that manages the block graph on a node level, it is a
>>> big problem if graph changes can happen at any point that can lead to
>>> bad race conditions. Giving the management tool control over the end of
>>> the block job makes it aware that graph changes happen.
>>>
>>> This means that compared to this RFC series, we need to move the waiting
>>> earlier in the process:
>>>
>>> 1. Block job is done and calls block_job_completed()
>>> 2. Wait for other block jobs in the same job transaction to complete
>>> 3. Send a (new) QMP event to the management tool to notify it that the
>>>    job is ready to be reaped
>>
>> Oh, I suppose that's to distinguish it from "COMPLETED" in that sense, because
>> it isn't actually COMPLETED anymore under your vision, so it requires a
>> new event in this proposal.
>>
>> This becomes a bit messy, bumping up against both "READY" and a
>> transactional pre-completed state semantically. Uhhhh, for lack of a
>> better word in the timeframe I'd like to complete this email in, let's
>> call this new theoretical state "PENDING"?
>>
>> So presently, a job goes through the following life cycle:
>>
>> 1. CREATED --> RUNNING
>> 2. RUNNING <--> PAUSED
>> 3. RUNNING --> (READY | COMPLETED | CANCELED)
>> 4. READY --> (COMPLETED | CANCELED)
>> 5. (COMPLETED | CANCELED) --> NULL
>>
>> Where we emit an event upon entering "READY", "COMPLETED" or "CANCELED".
> 
> Roughly yes, but it's not quite true because you can still pause and
> unpause ready jobs. So READY and PAUSED are kind of orthogonal.
> 

But you cannot block-job-complete a running job, so I included it here
so we could keep the concept of the ready-to-complete state in mind.

>> My patchset here effectively adds a new optional terminal state:
>>
>> 5. (COMPLETED | CANCELED) --> (NULL | FINISHED)
>> 6. FINISHED --> NULL
>>
>> Where the last transition from FINISHED to NULL is performed via
>> block-job-reap, but notably we get to re-use the events for COMPLETED |
>> CANCELED to indicate the availability of this operation to be performed.
>>
>> What happens in the case of transactionally managed jobs presently is
>> that jobs get stuck as they enter the COMPLETED|CANCELED state. If you
>> were to query them they behave as if they're RUNNING. There's no
>> discrete state that exists for this presently.
>>
>> You can cancel these as normal, but I'm not sure if you can pause them,
>> actually. (Note to self, test that.) I think they behave almost exactly
>> like any RUNNING job would.
> 
> Except that they don't do any work any more. This is an important
> difference for a mirror job which would normally keep copying new writes
> until it sends the COMPLETED event. So when libvirt restarts and it sees
> a "RUNNING" mirror job, it can't decide whether it is still copying
> things or has already completed.
> 
> Looks like this is another reason why we want a separate state here.

Yes, I realized as I was writing it that we have no real way to tell
that a job is simply pending completion.

> 
>> What you're proposing here is the formalization of the pre-completion
>> state ("PENDING") and that in this state, a job outside of a transaction
>> can exist until it is manually told to finally, once and for all,
>> actually finish its business. We can use this as a hook to perform any
>> last graph changes so they will not come as a surprise to the management
>> application. Maybe this operation should be called "Finalize". Again,
>> for lack of a better term in the timeframe, I'll refer to it as such for
>> now.
> 
> "finalize" doesn't sound too bad.
> 

Though taken altogether, the set of names we've accumulated is a little
ridiculous.

>> I think importantly this actually distinguishes it from "reap" in that
>> the commit phase can still fail, so we can't let the job follow that
>> auto transition back to the NULL state.
> 
> Let me see if I understand correctly: We want to make sure that the
> management tool sees the final return value for the job. We have already
> decided that events aren't enough for this because the management tool
> could be restarted while we send the event, so the information is lost.
> Having it as a return value of block-job-reap isn't enough either
> because it could be lost the same way. We need a separate phase where
> libvirt can query the return value and from which we don't automatically
> transition away.
> 
> I'm afraid that you are right.

There may be some wiggle room in asserting that some block-job-finalize
command should return the final status in its return value, but it's
clearly more flexible to not mandate this.

However, for existing QMP commands today there's no way to tell if a
command succeeded or failed if you somehow lose the synchronous reply,
so maybe this is actually OK ...
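
If we did put the status into the synchronous reply, the exchange might
look like this (iotests-style, with the shape of the return value
entirely made up):

    result = vm.qmp('block-job-finalize', device='drive0')
    # e.g. {'return': {'status': 'completed'}} on success, or an
    # {'error': {...}} reply if the commit phase failed -- and that
    # reply can be lost the same way an event can, which is the concern.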

> 
>> That means that we'd need both a block-job-finalize AND a
>> block-job-reap to accomplish both of the following goals:
>>
>> (1) Allow the management application to control graph changes [libvirt]
>> (2) Prevent auto transitions to NULL state for asynchronous clients [A
>> requirement mentioned by Virtuozzo]
> 
> Okay, your (2) is another case that forbids auto-transitions to NULL.
> What I described above is why it's relevant for libvirt, too.
> 
>> It looks like this, overall:
>>
>> 1. CREATED --> RUNNING
>> 2. RUNNING <--> PAUSED
>> 3. RUNNING --> PENDING
>> 	via: auto transition
>> 	event: BLOCK_JOB_PENDING
>> 4. PENDING --> (COMPLETED | CANCELED)
>> 	via: block-job-finalize
>> 	event: (COMPLETED | ERROR)
>> 5. (COMPLETED | CANCELED) --> (NULL | FINISHED)
>> 	via: auto transition
>> 	event: none
>> 6. FINISHED --> NULL
>> 	via: block-job-reap
>> 	event: none
> 
> Looks reasonable. A bit verbose with two new commands, but given the
> requirements that's probably unavoidable.
> 

Unless I can avoid the actual reap action at the end by returning the
return code synchronously.

>> "Hey, wait, where did the ready state go?"
>>
>> Good question, I'm not sure how it fits in to something like "PENDING"
>> which is, I think NEARLY equivalent to a proposed pending->finalized
>> transition. Is it meaningful to have a job go from
>> running->ready->pending or running->pending->ready? I actually doubt it is.
> 
> I think it is, but only the former. If we leave the PAUSED state aside
> for a moment (as I said above, it's orthogonal) and look at the details,
> mirror works like this:
> 
> 1.  CREATED --> RUNNING
> 2a. RUNNING --> READY | CANCELLED
>         via: auto transition (when bulk copy is finished) / block-job-cancel
>         event: BLOCK_JOB_READY
> 2b. READY --> READY (COMPLETING) | READY (CANCELLING)
>         via: block-job-complete / block-job-cancel
>         event: none
> 3.  READY (CANCELLING) --> CANCELLED
>         via: auto transition (after draining in-flight mirror requests)
>         event: BLOCK_JOB_CANCELLED
>     or
>     READY (COMPLETING) --> COMPLETED
>         via: auto transition (when images are in sync)
>         event: BLOCK_JOB_COMPLETED
> 
> In the new model, we transition to PENDING in step 3 instead.
> 
>> The only difference really is that not all jobs use the READY -->
>> COMPLETE transition.
> 
> The really important difference is that in 2b you have a state that is
> exited via auto transition.
> 
> PENDING is a state that is exited synchronously with block-job-finalize
> (and having this defined point in time where graph changes occur etc. is
> the whole point of the state), whereas READY is a state where
> block-job-complete/cancel can request that it be left at the next
> opportunity, but the exact point in time is unpredictable - it doesn't
> happen during the execution of the QMP command anyway.
> 
> This is a fundamental difference that doesn't allow us to treat READY and
> PENDING the same.
> 

I think you're right. That transition from READY isn't as synchronous as
I was making it out to be. Tch.

>> We could implement it into those jobs if the job is
>> created with some kind of boolean, like
>>
>> auto-complete: true/false
>>
>> where this defaults to true, the legacy behavior.
>>
>> For "mirror" we would just omit allowing people to set this setting
>> (auto-complete is effectively always off) because it is requisite and
>> essential to the operation of the job.
> 
> auto-complete=true would basically call block-job-finalize internally?
> 
> You can't conflate it with block-job-complete because READY and PENDING
> have to stay separate, but it would make sense as auto-finalize.
> 

I was conflating it. Or, at least trying to.

>> "OK, maybe that could work; what about transactions?"
>>
>> Transactions have to be a bit of their own case, I think.
>>
>> Essentially jobs that transactionally complete already hang around in
>> pending until all jobs complete, so they can do so together.
>>
>> What we don't really want is to force users to have to dig into the jobs
>> manually and complete each one individually. (I think...?) or have to
>> deal with the managerial nightmare of having some autocomplete, some
>> that don't, etc.
> 
> I'm not sure about that. Don't users already have to send
> block-job-complete to each job individually? Not doing the same for
> block-job-finalize would be kind of inconsistent.
> 

Yes, but that's only theoretical since we don't have support for any of
those kinds of jobs in transactions yet, either!

In my head here I was thinking about a transaction-wide finalize to
replace "complete," but you're pointing out I can't mix the two.

That said, there's no reason we couldn't *make* that kind of completion
a transaction-wide event, but... maybe that's too messy. Maybe
everything should just be left individual...

> I also think that it would be good if a separately started block job
> behaves like a transaction that contains a single job so that we don't
> get different models for the two cases.
> 
>> What I propose for transactions is:
>>
>> 1) Add a new property for transactions also named "auto-complete"
>> 2) If the property is set to false, Jobs in this transaction will have
>> their auto-complete values forcibly set to false
>> 3) Once all jobs in the transaction are set to pending, emit an event
>> ("TRANSACTION_READY", for instance)
>> 4) Allow the transaction to be manually committed.
> 
> This requires that we introduce names for transactions and let them
> be managed explicitly by the user. Keeping things at the individual jobs
> certainly makes the interface simpler.
> 

True...

> I can't rule out that we'll want to make transactions explicitly
> managed objects in the future for some reason, but that will probably be
> something separate from the problem we're trying to solve now.
> 
>> The alternative is to leave it on a per-job basis and just stipulate
>> that any bizarre or inconvenient configurations are the fault of the
>> caller. Leaving transactions completely untouched should theoretically
>> work.
> 
> Should be a lot easier. To integrate it into your state machine above:
> 
> 3.  READY (COMPLETING) | READY (CANCELLING) --> DONE
>         via: auto transition (block_job_completed() called; job doesn't
>                               do any work any more)
>         event: BLOCK_JOB_DONE
> 4a. DONE --> PENDING
>         via: auto transition (all jobs in the transaction are DONE)
>         event: BLOCK_JOB_PENDING
> 4b. PENDING --> COMPLETED | CANCELLED
>         via: block-job-finalize
>         event: COMPLETED | ERROR
> 
> The naming clearly leaves a lot to be desired, I'm just running out of
> names that actually make sense... But I hope you can see the mechanism
> that I have in mind.
> 

I think so: "DONE" is the state where it's blocked on the transaction to
finish. "PENDING" is the one where it's blocked on the user to allow it
to move forward and call commit/abort/etc.
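
In other words (a sketch of the split, hypothetical status names):

    # done:    blocked on the other jobs in the transaction
    # pending: blocked on the user; block-job-finalize unblocks it
    assert job.status in ('done', 'pending')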

>>> 4. On block-job-reap, call the .commit/.abort callbacks of the jobs in
>>>    the transaction. They will do most of the work that is currently done
>>>    in the completion callbacks, in particular any graph changes. If we
>>>    need to allow error cases, we can add a .prepare_completion callback
>>>    that can still let the whole transaction fail.
>>
>> Makes sense by analogy. Probably worth having anyway. I moved some
>> likely-to-deadlock code from the backup cleanup into .commit even when
>> it runs outside of transactions. Other jobs can likely benefit from some
>> simplified assumptions by running in that context, too.
>>
>>> 5. Send the final QMP completion event for each job in the transaction
>>>    with the final error code. This is the existing BLOCK_JOB_COMPLETED
>>>    or BLOCK_JOB_CANCELLED event.
>>>
>>> The current RFC still executes .commit/.abort before block-job-reap, so
>>> the graph changes happen too early to be under control of the management
>>> tool.
>>>
>>>> RFC:
>>>> The next version will add tests for transactions.
>>>> Kevin, can you please take a look at bdrv_is_root_node and how it is
>>>> used with respect to do_drive_backup? I suspect that in this case that
>>>> "is root" should actually be "true", but a node in use by a job has
>>>> two roles; child_root and child_job, so it starts returning false here.
>>>>
>>>> That's fine because we prevent a collision that way, but it makes the
>>>> error messages pretty bad and misleading. Do you have a quick suggestion?
>>>> (Should I just amend the loop to allow non-root nodes as long as they
>>>> happen to be jobs so that the job creation code can check permissions?)
>>>
>>> We kind of sidestepped this problem by deciding that there is no real
>>> reason for the backup job to require the source to be a root node.
>>>
>>> I'm not completely sure how we could easily get a better message while
>>> still requiring a root node (and I suppose there are other places that
>>> rightfully still require a root node). Ignoring children with child_job
>>> feels ugly, but might be the best option.
>>>
>>> Note that this will not make the conflicting command work suddenly,
>>> because every node that has a child_job parent also has op blockers for
>>> everything, but the error message should be less confusing than "is not
>>> a root node".
>>>
>>> Kevin
>>>
>>
>> TLDR:
>>
>> - I think we may need to have optional manual completion steps both
>> before and after the job .prepare()/.commit()/.abort() phase.
>> - Before, like "block-job-complete" to allow graph changes to be under
>> management tool control, and
>> - After, so that final job success status can be queried even if the
>> event was missed.
> 
> Agreed (except that I don't see what "block-job-complete" has to do with
> it, this command is about an earlier transition).
> 
>> Proposal:
>>
>> (1) Extend block-job-complete semantics to all jobs that opt in via
>> "auto-complete: false" which is not allowed to be set for mirror jobs
>> (2) If the modern overloading of the BLOCK_JOB_READY event is apt to
>> cause confusion for existing tools, a new event BLOCK_JOB_PENDING could
>> be emitted instead for any job capable of accepting the
>> auto-complete=true/false parameter and emit it upon reaching this state.
>> (3) Continue forward with this patchset's current persistent/reap
>> nomenclature to prevent automatic cleanup if desired, and
>> (4) Implement transaction-wide settings for auto-complete alongside a
>> new "transaction complete" event to allow for a transaction-wide
>> "complete" command.
> 
> Let me try to just consolidate all of the above into a single state
> machine:
> 
> 1.  CREATED --> RUNNING
>         driver callback: .start
> 2a. RUNNING --> READY | CANCELLED
>         via: auto transition (when bulk copy is finished) / block-job-cancel
>         event: BLOCK_JOB_READY
> 2b. READY --> READY (COMPLETING) | READY (CANCELLING)
>         via: block-job-complete / block-job-cancel
>         event: none
>         driver callback: .complete / none
> 3.  READY (CANCELLING | COMPLETING) --> DONE
>         via: auto transition
>              (CANCELLING: after draining in-flight mirror requests;
>               COMPLETING: when images are in sync)
>         event: BLOCK_JOB_DONE
> 4.  DONE --> PENDING
>         via: auto transition (all jobs in the transaction are DONE)
>         event: BLOCK_JOB_PENDING
> 5.  PENDING --> FINISHED
>         via: block-job-finalize
>         event: COMPLETED | CANCELLED
>         driver callback: .prepare_finalize / .commit / .abort
> 6.  FINISHED --> NULL
>         via: block-job-reap
>         event: none
>         driver callback: .clean
> 
> I removed COMPLETED/CANCELLED states because they are never externally
> visible. You proposed an "auto transition" there, but the transition
> would be immediately after the previous one, so clients always see
> PENDING --> NULL | FINISHED.
> 
> We would have two booleans to make the explicit transitions happen automatically:
> 
>     auto-finalize for block-job-finalize (default: true)
>     auto-reap     for block-job-reap     (default: true)
> 
> Both of them would be executed automatically as soon as the respective
> commands become available.
> 
> We could add more auto-* options for the remaining explicit transitions
> (block-job-complete/cancel in READY), but these are not important for
> the problems we're trying to solve here. They might become interesting
> if we do decide that we want a single copy block job instead of doing
> similar things in mirror, commit and backup.
> 
> The naming needs some improvements (done -> pending -> finished looks
> really odd), but does this make sense otherwise?
> 
> Kevin
> 
I think so. We'll see if I understand when I write v3 :)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit job reaping
  2017-10-05 11:38     ` Kevin Wolf
  2017-10-05 18:17       ` John Snow
@ 2017-10-06  3:56       ` John Snow
  2017-10-06  9:05         ` Kevin Wolf
  2017-10-06  5:52       ` Markus Armbruster
  2 siblings, 1 reply; 13+ messages in thread
From: John Snow @ 2017-10-06  3:56 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Fam Zheng, pkrempa, jtc, qemu-devel, qemu-block



On 10/05/2017 07:38 AM, Kevin Wolf wrote:
> Am 05.10.2017 um 03:46 hat John Snow geschrieben:
>> On 10/04/2017 02:27 PM, Kevin Wolf wrote:
>>> Am 04.10.2017 um 03:52 hat John Snow geschrieben:
>>>> For jobs that complete when a monitor isn't looking, there's no way to
>>>> tell what the job's final return code was. We need to allow jobs to
>>>> remain in the list until queried for reliable management.
>>>
>>> Just a short summary of what I discussed with John on IRC:
>>>
>>> Another important reason why we want to have an explicit end of block
>>> jobs is that job completion often makes changes to the graph. For a
>>> management tool that manages the block graph on a node level, it is a
>>> big problem if graph changes can happen at any point that can lead to
>>> bad race conditions. Giving the management tool control over the end of
>>> the block job makes it aware that graph changes happen.
>>>
>>> This means that compared to this RFC series, we need to move the waiting
>>> earlier in the process:
>>>
>>> 1. Block job is done and calls block_job_completed()
>>> 2. Wait for other block jobs in the same job transaction to complete
>>> 3. Send a (new) QMP event to the management tool to notify it that the
>>>    job is ready to be reaped
>>
>> Oh, I suppose that's to distinguish it from "COMPLETED" in that sense, because
>> it isn't actually COMPLETED anymore under your vision, so it requires a
>> new event in this proposal.
>>
>> This becomes a bit messy, bumping up against both "READY" and a
>> transactional pre-completed state semantically. Uhhhh, for lack of a
>> better word in the timeframe I'd like to complete this email in, let's
>> call this new theoretical state "PENDING"?
>>
>> So presently, a job goes through the following life cycle:
>>
>> 1. CREATED --> RUNNING
>> 2. RUNNING <--> PAUSED
>> 3. RUNNING --> (READY | COMPLETED | CANCELED)
>> 4. READY --> (COMPLETED | CANCELED)
>> 5. (COMPLETED | CANCELED) --> NULL
>>
>> Where we emit an event upon entering "READY", "COMPLETED" or "CANCELED".
> 
> Roughly yes, but it's not quite true because you can still pause and
> unpause ready jobs. So READY and PAUSED are kind of orthogonal.
> 
>> My patchset here effectively adds a new optional terminal state:
>>
>> 5. (COMPLETED | CANCELED) --> (NULL | FINISHED)
>> 6. FINISHED --> NULL
>>
>> Where the last transition from FINISHED to NULL is performed via
>> block-job-reap, but notably we get to re-use the events for COMPLETED |
>> CANCELED to indicate the availability of this operation to be performed.
>>
>> What happens in the case of transactionally managed jobs presently is
>> that jobs get stuck as they enter the COMPLETED|CANCELED state. If you
>> were to query them they behave as if they're RUNNING. There's no
>> discrete state that exists for this presently.
>>
>> You can cancel these as normal, but I'm not sure if you can pause them,
>> actually. (Note to self, test that.) I think they behave almost exactly
>> like any RUNNING job would.
> 
> Except that they don't do any work any more. This is an important
> difference for a mirror job which would normally keep copying new writes
> until it sends the COMPLETED event. So when libvirt restarts and it sees
> a "RUNNING" mirror job, it can't decide whether it is still copying
> things or has already completed.
> 
> Looks like this is another reason why we want a separate state here.
> 
>> What you're proposing here is the formalization of the pre-completion
>> state ("PENDING") and that in this state, a job outside of a transaction
>> can exist until it is manually told to finally, once and for all,
>> actually finish its business. We can use this as a hook to perform any
>> last graph changes so they will not come as a surprise to the management
>> application. Maybe this operation should be called "Finalize". Again,
>> for lack of a better term in the timeframe, I'll refer to it as such for
>> now.
> 
> "finalize" doesn't sound too bad.
> 
>> I think importantly this actually distinguishes it from "reap" in that
>> the commit phase can still fail, so we can't let the job follow that
>> auto transition back to the NULL state.
> 
> Let me see if I understand correctly: We want to make sure that the
> management tool sees the final return value for the job. We have already
> decided that events aren't enough for this because the management tool
> could be restarted while we send the event, so the information is lost.
> Having it as a return value of block-job-reap isn't enough either
> because it could be lost the same way. We need a separate phase where
> libvirt can query the return value and from which we don't automatically
> transition away.
> 
> I'm afraid that you are right.
> 
>> That means that we'd need both a block-job-finalize AND a
>> block-job-reap to accomplish both of the following goals:
>>
>> (1) Allow the management application to control graph changes [libvirt]
>> (2) Prevent auto transitions to NULL state for asynchronous clients [A
>> requirement mentioned by Virtuozzo]
> 
> Okay, your (2) is another case that forbids auto-transitions to NULL.
> What I described above is why it's relevant for libvirt, too.
> 
>> It looks like this, overall:
>>
>> 1. CREATED --> RUNNING
>> 2. RUNNING <--> PAUSED
>> 3. RUNNING --> PENDING
>> 	via: auto transition
>> 	event: BLOCK_JOB_PENDING
>> 4. PENDING --> (COMPLETED | CANCELED)
>> 	via: block-job-finalize
>> 	event: (COMPLETED | ERROR)
>> 5. (COMPLETED | CANCELED) --> (NULL | FINISHED)
>> 	via: auto transition
>> 	event: none
>> 6. FINISHED --> NULL
>> 	via: block-job-reap
>> 	event: none
> 
> Looks reasonable. A bit verbose with two new commands, but given the
> requirements that's probably unavoidable.
> 
>> "Hey, wait, where did the ready state go?"
>>
>> Good question, I'm not sure how it fits in to something like "PENDING"
>> which is, I think NEARLY equivalent to a proposed pending->finalized
>> transition. Is it meaningful to have a job go from
>> running->ready->pending or running->pending->ready? I actually doubt it is.
> 
> I think it is, but only the former. If we leave the PAUSED state aside
> for a moment (as I said above, it's orthogonal) and look at the details,
> mirror works like this:
> 
> 1.  CREATED --> RUNNING
> 2a. RUNNING --> READY | CANCELLED
>         via: auto transition (when bulk copy is finished) / block-job-cancel
>         event: BLOCK_JOB_READY
> 2b. READY --> READY (COMPLETING) | READY (CANCELLING)
>         via: block-job-complete / block-job-cancel
>         event: none
> 3.  READY (CANCELLING) --> CANCELLED
>         via: auto transition (after draining in-flight mirror requests)
>         event: BLOCK_JOB_CANCELLED
>     or
>     READY (COMPLETING) --> COMPLETED
>         via: auto transition (when images are in sync)
>         event: BLOCK_JOB_COMPLETED
> 
> In the new model, we transition to PENDING in step 3 instead.
> 
>> The only difference really is that not all jobs use the READY -->
>> COMPLETE transition.
> 
> The really important difference is that in 2b you have a state that is
> exited via auto transition.
> 
> PENDING is a state that is exited synchronously with block-job-finalize
> (and having this defined point in time where graph changes occur etc. is
> the whole point of the state), whereas READY is a state where
> block-job-complete/cancel can request that it be left at the next
> opportunity, but the exact point in time is unpredictable - it doesn't
> happen during the execution of the QMP command anyway.
> 
> This is a fundamental difference that doesn't allow us to treat READY and
> PENDING the same.
> 
>> We could implement it into those jobs if the job is
>> created with some kind of boolean, like
>>
>> auto-complete: true/false
>>
>> where this defaults to true, the legacy behavior.
>>
>> For "mirror" we would just omit allowing people to set this setting
>> (auto-complete is effectively always off) because it is requisite and
>> essential to the operation of the job.
> 
> auto-complete=true would basically call block-job-finalize internally?
> 
> You can't conflate it with block-job-complete because READY and PENDING
> have to stay separate, but it would make sense as auto-finalize.
> 
>> "OK, maybe that could work; what about transactions?"
>>
>> Transactions have to be a bit of their own case, I think.
>>
>> Essentially jobs that transactionally complete already hang around in
>> pending until all jobs complete, so they can do so together.
>>
>> What we don't really want is to force users to have to dig into the jobs
>> manually and complete each one individually. (I think...?) or have to
>> deal with the managerial nightmare of having some autocomplete, some
>> that don't, etc.
> 
> I'm not sure about that. Don't users already have to send
> block-job-complete to each job individually? Not doing the same for
> block-job-finalize would be kind of inconsistent.
> 
> I also think that it would be good if a separately started block job
> behaves like a transaction that contains a single job so that we don't
> get different models for the two cases.
> 
>> What I propose for transactions is:
>>
>> 1) Add a new property for transactions also named "auto-complete"
>> 2) If the property is set to false, Jobs in this transaction will have
>> their auto-complete values forcibly set to false
>> 3) Once all jobs in the transaction are set to pending, emit an event
>> ("TRANSACTION_READY", for instance)
>> 4) Allow the transaction to be manually committed.
> 
> This requires that we introduce names for transactions and let them
> be managed explicitly by the user. Keeping things at the individual jobs
> certainly makes the interface simpler.
> 
> I can't rule out that we'll want to make transactions explicitly
> managed objects in the future for some reason, but that will probably be
> something separate from the problem we're trying to solve now.
> 
>> The alternative is to leave it on a per-job basis and just stipulate
>> that any bizarre or inconvenient configurations are the fault of the
>> caller. Leaving transactions completely untouched should theoretically
>> work.
> 
> Should be a lot easier. To integrate it into your state machine above:
> 
> 3.  READY (COMPLETING) | READY (CANCELLING) --> DONE
>         via: auto transition (block_job_completed() called; job doesn't
>                               do any work any more)
>         event: BLOCK_JOB_DONE
> 4a. DONE --> PENDING
>         via: auto transition (all jobs in the transaction are DONE)
>         event: BLOCK_JOB_PENDING
> 4b. PENDING --> COMPLETED | CANCELLED
>         via: block-job-finalize
>         event: COMPLETED | ERROR
> 
> The naming clearly leaves a lot to be desired, I'm just running out of
> names that actually make sense... But I hope you can see the mechanism
> that I have in mind.
> 
>>> 4. On block-job-reap, call the .commit/.abort callbacks of the jobs in
>>>    the transaction. They will do most of the work that is currently done
>>>    in the completion callbacks, in particular any graph changes. If we
>>>    need to allow error cases, we can add a .prepare_completion callback
>>>    that can still let the whole transaction fail.
>>
>> Makes sense by analogy. Probably worth having anyway. I moved some
>> likely-to-deadlock code from the backup cleanup into .commit even when
>> it runs outside of transactions. Other jobs can likely benefit from some
>> simplified assumptions by running in that context, too.
>>
>>> 5. Send the final QMP completion event for each job in the transaction
>>>    with the final error code. This is the existing BLOCK_JOB_COMPLETED
>>>    or BLOCK_JOB_CANCELLED event.
>>>
>>> The current RFC still executes .commit/.abort before block-job-reap, so
>>> the graph changes happen too early to be under control of the management
>>> tool.
>>>
>>>> RFC:
>>>> The next version will add tests for transactions.
>>>> Kevin, can you please take a look at bdrv_is_root_node and how it is
>>>> used with respect to do_drive_backup? I suspect that in this case that
>>>> "is root" should actually be "true", but a node in use by a job has
>>>> two roles; child_root and child_job, so it starts returning false here.
>>>>
>>>> That's fine because we prevent a collision that way, but it makes the
>>>> error messages pretty bad and misleading. Do you have a quick suggestion?
>>>> (Should I just amend the loop to allow non-root nodes as long as they
>>>> happen to be jobs so that the job creation code can check permissions?)
>>>
>>> We kind of sidestepped this problem by deciding that there is no real
>>> reason for the backup job to require the source to be a root node.
>>>
>>> I'm not completely sure how we could easily get a better message while
>>> still requiring a root node (and I suppose there are other places that
>>> rightfully still require a root node). Ignoring children with child_job
>>> feels ugly, but might be the best option.
>>>
>>> Note that this will not make the conflicting command work suddenly,
>>> because every node that has a child_job parent also has op blockers for
>>> everything, but the error message should be less confusing than "is not
>>> a root node".
>>>
>>> Kevin
>>>
>>
>> TLDR:
>>
>> - I think we may need to have optional manual completion steps both
>> before and after the job .prepare()/.commit()/.abort() phase.
>> - Before, like "block-job-complete" to allow graph changes to be under
>> management tool control, and
>> - After, so that final job success status can be queried even if the
>> event was missed.
> 
> Agreed (except that I don't see what "block-job-complete" has to do with
> it, this command is about an earlier transition).
> 
>> Proposal:
>>
>> (1) Extend block-job-complete semantics to all jobs that opt in via
>> "auto-complete: false" which is not allowed to be set for mirror jobs
>> (2) If the modern overloading of the BLOCK_JOB_READY event is apt to
>> cause confusion for existing tools, a new event BLOCK_JOB_PENDING could
>> be emitted instead for any job capable of accepting the
>> auto-complete=true/false parameter and emit it upon reaching this state.
>> (3) Continue forward with this patchset's current persistent/reap
>> nomenclature to prevent automatic cleanup if desired, and
>> (4) Implement transaction-wide settings for auto-complete alongside a
>> new "transaction complete" event to allow for a transaction-wide
>> "complete" command.
> 
> Let me try to just consolidate all of the above into a single state
> machine:
> 
> 1.  CREATED --> RUNNING
>         driver callback: .start
> 2a. RUNNING --> READY | CANCELLED
>         via: auto transition (when bulk copy is finished) / block-job-cancel
>         event: BLOCK_JOB_READY
> 2b. READY --> READY (COMPLETING) | READY (CANCELLING)
>         via: block-job-complete / block-job-cancel
>         event: none
>         driver callback: .complete / none
> 3.  READY (CANCELLING | COMPLETING) --> DONE
>         via: auto transition
>              (CANCELLING: after draining in-flight mirror requests;
>               COMPLETING: when images are in sync)
>         event: BLOCK_JOB_DONE

I have some doubts about "DONE" necessarily coming prior to "PENDING", as
this means that the transaction gives up control of the jobs at this
point, and then "PENDING" jobs may complete one without the other,
especially if we introduce a PREPARE() callback.

(At least, if I've understood you correctly that "DONE" is the phase
where jobs queue up, ready to be dispatched by the transaction mechanism.)

I think jobs need to not clear that transactional hurdle until they've
been allowed to call prepare so that we can be guaranteed that any
changes that occur after that point in time will not fail (and cannot
any longer affect the transactional group.)

I'll have to play with this a bit; it's getting a bit messy as a hack
for 2.11.

> 4.  DONE --> PENDING
>         via: auto transition (all jobs in the transaction are DONE)
>         event: BLOCK_JOB_PENDING
> 5.  PENDING --> FINISHED
>         via: block-job-finalize
>         event: COMPLETED | CANCELLED
>         driver callback: .prepare_finalize / .commit / .abort
> 6.  FINISHED --> NULL
>         via: block-job-reap
>         event: none
>         driver callback: .clean
> 
> I removed COMPLETED/CANCELLED states because they are never externally
> visible. You proposed an "auto transition" there, but the transition
> would be immediately after the previous one, so clients always see
> PENDING --> NULL | FINISHED.
> 
> We would have two booleans to make the explicit transitions happen automatically:
> 
>     auto-finalize for block-job-finalize (default: true)
>     auto-reap     for block-job-reap     (default: true)
> 
> Both of them would be executed automatically as soon as the respective
> commands become available.
> 
> We could add more auto-* options for the remaining explicit transitions
> (block-job-complete/cancel in READY), but these are not important for
> the problems we're trying to solve here. They might become interesting
> if we do decide that we want a single copy block job instead of doing
> similar things in mirror, commit and backup.
> 
> The naming needs some improvements (done -> pending -> finished looks
> really odd), but does this make sense otherwise?
> 
> Kevin
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit job reaping
  2017-10-05 11:38     ` Kevin Wolf
  2017-10-05 18:17       ` John Snow
  2017-10-06  3:56       ` John Snow
@ 2017-10-06  5:52       ` Markus Armbruster
  2 siblings, 0 replies; 13+ messages in thread
From: Markus Armbruster @ 2017-10-06  5:52 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: John Snow, Fam Zheng, pkrempa, jtc, qemu-devel, qemu-block

Quick drive-by comment:

Kevin Wolf <kwolf@redhat.com> writes:

[...]
> Let me try to just consolidate all of the above into a single state
> machine:
>
> 1.  CREATED --> RUNNING
>         driver callback: .start
> 2a. RUNNING --> READY | CANCELLED
>         via: auto transition (when bulk copy is finished) / block-job-cancel
>         event: BLOCK_JOB_READY
> 2b. READY --> READY (COMPLETING) | READY (CANCELLING)
>         via: block-job-complete / block-job-cancel
>         event: none
>         driver callback: .complete / none
> 3.  READY (CANCELLING | COMPLETING) --> DONE
>         via: auto transition
>              (CANCELLING: after draining in-flight mirror requests;
>               COMPLETING: when images are in sync)
>         event: BLOCK_JOB_DONE
> 4.  DONE --> PENDING
>         via: auto transition (all jobs in the transaction are DONE)
>         event: BLOCK_JOB_PENDING
> 5.  PENDING --> FINISHED
>         via: block-job-finalize
>         event: COMPLETED | CANCELLED
>         driver callback: .prepare_finalize / .commit / .abort
> 6.  FINISHED --> NULL
>         via: block-job-reap
>         event: none
>         driver callback: .clean
>
> I removed COMPLETED/CANCELLED states because they are never externally
> visible. You proposed an "auto transition" there, but the transition
> would be immediately after the previous one, so clients always see
> PENDING --> NULL | FINISHED.
>
> We would have two booleans to make the explicit transitions happen automatically:
>
>     auto-finalize for block-job-finalize (default: true)
>     auto-reap     for block-job-reap     (default: true)

Are we *sure* we need to quadruple the test matrix?

What exactly is the use case for either of these two flags?

> Both of them would be executed automatically as soon as the respective
> commands become available.
>
> We could add more auto-* options for the remaining explicit transitions

*groan*

> (block-job-complete/cancel in READY), but these are not important for
> the problems we're trying to solve here. They might become interesting
> if we do decide that we want a single copy block job instead of doing
> similar things in mirror, commit and backup.
> 
> The naming needs some improvements (done -> pending -> finished looks
> really odd), but does this make sense otherwise?
>
> Kevin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit job reaping
  2017-10-06  3:56       ` John Snow
@ 2017-10-06  9:05         ` Kevin Wolf
  0 siblings, 0 replies; 13+ messages in thread
From: Kevin Wolf @ 2017-10-06  9:05 UTC (permalink / raw)
  To: John Snow; +Cc: Fam Zheng, pkrempa, jtc, qemu-devel, qemu-block

Am 06.10.2017 um 05:56 hat John Snow geschrieben:
> On 10/05/2017 07:38 AM, Kevin Wolf wrote:
> > Let me try to just consolidate all of the above into a single state
> > machine:
> > 
> > 1.  CREATED --> RUNNING
> >         driver callback: .start
> > 2a. RUNNING --> READY | CANCELLED
> >         via: auto transition (when bulk copy is finished) / block-job-cancel
> >         event: BLOCK_JOB_READY
> > 2b. READY --> READY (COMPLETING) | READY (CANCELLING)
> >         via: block-job-complete / block-job-cancel
> >         event: none
> >         driver callback: .complete / none
> > 3.  READY (CANCELLING | COMPLETING) --> DONE
> >         via: auto transition
> >              (CANCELLING: after draining in-flight mirror requests;
> >               COMPLETING: when images are in sync)
> >         event: BLOCK_JOB_DONE
> 
> I have some doubts about "DONE" necessarily coming prior to "PENDING" as
> this means that the transaction gives up control of the jobs at this
> point, and then "PENDING" jobs may complete one without the other,
> especially if we introduce a PREPARE() callback.
> 
> (At least, if I've understood you correctly that "DONE" is the phase
> where jobs queue up, ready to be dispatched by the transaction mechanism.)

Yes. This means that DONE is the state where a job ends up when it has
completed its work, which is generally a different point in time for
each job in the transaction. There has to be some state for this, and
it can't be PENDING yet because the transaction hasn't completed yet.

In other words, DONE is the inactive state that exists today, but is
externally exposed as RUNNING even though the job isn't actually doing
any work any more.

I don't see, though, why this means that the transaction has to give up
control.

> I think jobs needs to not clear that transactional hurdle until they've
> been allowed to call prepare so that we can be guaranteed that any
> changes that occur after that point in time will not fail (and cannot
> any longer affect the transactional group.)

The earliest point where the transaction can be removed from the picture
is the first call of block-job-finalize for any job in the transaction.
This is where all jobs of the transaction need to complete at least
their .prepare stage, otherwise this first job can't be finalised.

As we discussed yesterday, block-job-finalize is really an operation on
the whole transaction, but keeping it at the job level in the external
interface spares us managing transactions as named objects.
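
In sketch form (invented names, with .prepare/.commit/.abort standing
in for the driver callbacks discussed above), the first finalize would
drive the whole transaction:

    def qmp_block_job_finalize(job, txn):
        # Nothing may fail once the first job commits, so run the
        # prepare stage for every job in the transaction first; a
        # failure here can still abort all of them together.
        for j in txn:
            if not j.prepare():
                for k in txn:
                    k.abort()
                return
        for j in txn:
            j.commit()                      # graph changes happen here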

Kevin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit job reaping
  2017-10-05 18:17       ` John Snow
@ 2017-10-09 14:30         ` Nikolay Shirokovskiy
  0 siblings, 0 replies; 13+ messages in thread
From: Nikolay Shirokovskiy @ 2017-10-09 14:30 UTC (permalink / raw)
  To: John Snow, Kevin Wolf
  Cc: qemu-block, pkrempa, jtc, qemu-devel, Fam Zheng,
	Vladimir Sementsov-Ogievskiy

Hi, John.

This is the original letter https://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg04091.html.

In short, the problem is this. If I miss the completion event during a
full backup, I don't know whether the backup file is correct or not. If
I miss the event during an incremental backup, I additionally have to
make a full backup afterwards, because I cannot be sure what state the
dirty bitmap is in (that depends on whether the backup was successful
or not). This problem can be approached by the method Vladimir
suggested in the above thread, though it involves introducing a zombie
job status.

Well, maybe not that short...
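
With the persistent jobs from this series, the decision after a missed
event could be made from a query instead. A sketch with the proposed
fields (none of this exists yet):

    jobs = vm.qmp('query-block-jobs')['return']
    job = next((j for j in jobs if j['device'] == 'drive0'), None)
    if job is None:
        pass    # today's behaviour: result lost, full backup required
    elif job['status'] == 'finished':
        # The final error status tells us whether the dirty bitmap
        # advanced or was restored, so an incremental retry is enough
        # either way -- no full backup needed.
        vm.qmp('block-job-reap', device='drive0')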


On 05.10.2017 21:17, John Snow wrote:
> Nikolay: You mentioned a while ago that you had issues with incremental
> backup's eventual return status being unknown. Can you please elaborate
> for me why this is a problem?
> 
> I assume due to the long running of a backup job it's entirely possible
> to imagine losing connection to QEMU and missing the event depending on
> how long the interruption is.
> 
> Backup operations are expensive, so we need some definite way to catch
> this return status.
> 
> Please let me know if you have any feedback to this thread.
> 
> On 10/05/2017 07:38 AM, Kevin Wolf wrote:
>> Am 05.10.2017 um 03:46 hat John Snow geschrieben:
>>> On 10/04/2017 02:27 PM, Kevin Wolf wrote:
>>>> Am 04.10.2017 um 03:52 hat John Snow geschrieben:
>>>>> For jobs that complete when a monitor isn't looking, there's no way to
>>>>> tell what the job's final return code was. We need to allow jobs to
>>>>> remain in the list until queried for reliable management.
>>>>
>>>> Just a short summary of what I discussed with John on IRC:
>>>>
>>>> Another important reason why we want to have an explicit end of block
>>>> jobs is that job completion often makes changes to the graph. For a
>>>> management tool that manages the block graph on a node level, it is a
>>>> big problem if graph changes can happen at any point that can lead to
>>>> bad race conditions. Giving the management tool control over the end of
>>>> the block job makes it aware that graph changes happen.
>>>>
>>>> This means that compared to this RFC series, we need to move the waiting
>>>> earlier in the process:
>>>>
>>>> 1. Block job is done and calls block_job_completed()
>>>> 2. Wait for other block jobs in the same job transaction to complete
>>>> 3. Send a (new) QMP event to the management tool to notify it that the
>>>>    job is ready to be reaped
>>>
>>> Oh, I suppose that's to distinguish it from "COMPLETED" in that sense,
>>> because it isn't actually COMPLETED anymore under your vision, so it
>>> requires a new event in this proposal.
>>>
>>> This becomes a bit messy, bumping up against both "READY" and a
>>> transactional pre-completed state semantically. Uhhhh, for lack of a
>>> better word in the timeframe I'd like to complete this email in, let's
>>> call this new theoretical state "PENDING"?
>>>
>>> So presently, a job goes through the following life cycle:
>>>
>>> 1. CREATED --> RUNNING
>>> 2. RUNNING <--> PAUSED
>>> 3. RUNNING --> (READY | COMPLETED | CANCELED)
>>> 4. READY --> (COMPLETED | CANCELED)
>>> 5. (COMPLETED | CANCELED) --> NULL
>>>
>>> Where we emit an event upon entering "READY", "COMPLETED" or "CANCELED".
>>
>> Roughly yes, but it's not quite true because you can still pause and
>> unpause ready jobs. So READY and PAUSED are kind of orthogonal.
>>
> 
> But you cannot block-job-complete a running job, so I included it here
> so we could keep the concept of the ready-to-complete state in mind.
> 
>>> My patchset here effectively adds a new optional terminal state:
>>>
>>> 5. (COMPLETED | CANCELED) --> (NULL | FINISHED)
>>> 6. FINISHED --> NULL
>>>
>>> Where the last transition from FINISHED to NULL is performed via
>>> block-job-reap, but notably we get to re-use the events for COMPLETED |
>>> CANCELED to indicate the availability of this operation to be performed.
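>>>
>>> As a rough sketch of that mechanism (hypothetical names, not the
>>> patch verbatim):
>>>
>>>     #include <stdbool.h>
>>>     #include <stdlib.h>
>>>
>>>     enum { S_COMPLETED, S_CANCELED, S_FINISHED, S_NULL };
>>>
>>>     typedef struct JobSketch {
>>>         bool persistent;   /* the new property */
>>>         int  status;
>>>     } JobSketch;
>>>
>>>     /* runs when the job reaches COMPLETED or CANCELED: */
>>>     static void job_conclude_sketch(JobSketch *job)
>>>     {
>>>         if (!job->persistent) {
>>>             free(job);                 /* today's auto transition to NULL */
>>>         } else {
>>>             job->status = S_FINISHED;  /* linger until block-job-reap */
>>>         }
>>>     }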
>>>
>>> What happens in the case of transactionally managed jobs presently is
>>> that jobs get stuck as they enter the COMPLETED|CANCELED state. If you
>>> were to query them they behave as if they're RUNNING. There's no
>>> discrete state that exists for this presently.
>>>
>>> You can cancel these as normal, but I'm not sure if you can pause them,
>>> actually. (Note to self: test that.) I think they behave almost exactly
>>> like any RUNNING job would.
>>
>> Except that they don't do any work any more. This is an important
>> difference for a mirror job which would normally keep copying new writes
>> until it sends the COMPLETED event. So when libvirt restarts and it sees
>> a "RUNNING" mirror job, it can't decide whether it is still copying
>> things or has already completed.
>>
>> Looks like this is another reason why we want a separate state here.
> 
> Yes, I realized as I was writing it that we have no real way to tell
> that a job is simply pending completion.
> 
>>
>>> What you're proposing here is the formalization of the pre-completion
>>> state ("PENDING") and that in this state, a job outside of a transaction
>>> can exist until it is manually told to finally, once and for all,
>>> actually finish its business. We can use this as a hook to perform any
>>> last graph changes so they will not come as a surprise to the management
>>> application. Maybe this operation should be called "Finalize". Again,
>>> for lack of a better term in the timeframe, I'll refer to it as such for
>>> now.
>>
>> "finalize" doesn't sound too bad.
>>
> 
> Though taken altogether, the set of names we've accumulated is a little
> ridiculous.
> 
>>> I think importantly this actually distinguishes it from "reap" in that
>>> the commit phase can still fail, so we can't let the job follow that
>>> auto transition back to the NULL state.
>>
>> Let me see if I understand correctly: We want to make sure that the
>> management tool sees the final return value for the job. We have already
>> decided that events aren't enough for this because the management tool
>> could be restarted while we send the event, so the information is lost.
>> Having it as a return value of block-job-reap isn't enough either
>> because it could be lost the same way. We need a separate phase where
>> libvirt can query the return value and from which we don't automatically
>> transition away.
>>
>> I'm afraid that you are right.
> 
> There may be some wiggle room in asserting that some block-job-finalize
> command should return the final status in its return value, but it's
> clearly more flexible to not mandate this.
> 
> However, for existing QMP commands today there's no way to tell if a
> command succeeded or failed if you somehow lose the synchronous reply,
> so maybe this is actually OK ...
> 
>>
>>> That means that we'd need both a block-job-finalize AND a
>>> block-job-reap to accomplish both of the following goals:
>>>
>>> (1) Allow the management application to control graph changes [libvirt]
>>> (2) Prevent auto transitions to NULL state for asynchronous clients [A
>>> requirement mentioned by Virtuozzo]
>>
>> Okay, your (2) is another case that forbids auto-transitions to NULL.
>> What I described above is why it's relevant for libvirt, too.
>>
>>> It looks like this, overall:
>>>
>>> 1. CREATED --> RUNNING
>>> 2. RUNNING <--> PAUSED
>>> 3. RUNNING --> PENDING
>>> 	via: auto transition
>>> 	event: BLOCK_JOB_PENDING
>>> 4. PENDING --> (COMPLETED | CANCELED)
>>> 	via: block-job-finalize
>>> 	event: (COMPLETED | ERROR)
>>> 5. (COMPLETED | CANCELED) --> (NULL | FINISHED)
>>> 	via: auto transition
>>> 	event: none
>>> 6. FINISHED --> NULL
>>> 	via: block-job-reap
>>> 	event: none
>>
>> Looks reasonable. A bit verbose with two new commands, but given the
>> requirements that's probably unavoidable.
>>
> 
> Unless I can avoid the actual reap action at the end by returning the
> return code synchronously.
> 
>>> "Hey, wait, where did the ready state go?"
>>>
>>> Good question. I'm not sure how it fits into something like "PENDING",
>>> which is, I think, NEARLY equivalent to a proposed pending->finalized
>>> transition. Is it meaningful to have a job go from
>>> running->ready->pending or running->pending->ready? I actually doubt it is.
>>
>> I think it is, but only the former. If we leave the PAUSED state aside
>> for a moment (as I said above, it's orthogonal) and look at the details,
>> mirror works like this:
>>
>> 1.  CREATED --> RUNNING
>> 2a. RUNNING --> READY | CANCELLED
>>         via: auto transition (when bulk copy is finished) / block-job-cancel
>>         event: BLOCK_JOB_READY
>> 2b. READY --> READY (COMPLETING) | READY (CANCELLING)
>>         via: block-job-complete / block-job-cancel
>>         event: none
>> 3.  READY (CANCELLING) --> CANCELLED
>>         via: auto transition (after draining in-flight mirror requests)
>>         event: BLOCK_JOB_CANCELLED
>>     or
>>     READY (COMPLETING) --> COMPLETED
>>         via: auto transition (when images are in sync)
>>         event: BLOCK_JOB_COMPLETED
>>
>> In the new model, we transition to PENDING in step 3 instead.
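>>
>> As a sketch (hypothetical, not the actual mirror code): COMPLETING and
>> CANCELLING wouldn't be separate externally visible states, just request
>> flags that the job's main loop checks at the next opportunity:
>>
>>     #include <stdbool.h>
>>
>>     typedef struct MirrorSketch {
>>         bool ready;            /* bulk copy done, BLOCK_JOB_READY sent */
>>         bool should_complete;  /* set by block-job-complete */
>>         bool cancelled;        /* set by block-job-cancel */
>>     } MirrorSketch;
>>
>>     /* main loop, roughly:
>>      *
>>      *     while (!(s->ready && (s->should_complete || s->cancelled))) {
>>      *         copy_pending_writes(s);
>>      *     }
>>      */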
>>
>>> The only difference really is that not all jobs use the READY -->
>>> COMPLETE transition.
>>
>> The really important difference is that in 2b you have a state that is
>> exited via auto transition.
>>
>> PENDING is a state that is exited synchronously with block-job-finalize
>> (and having this defined point in time where graph changes occur etc. is
>> the whole point of the state), whereas READY is a state where
>> block-job-complete/cancel can request that it be left at the next
>> opportunity, but the exact point in time is unpredictable - it doesn't
>> happen during the execution of the QMP command anyway.
>>
>> This is a fundamental difference that doesn't allow treating READY and
>> PENDING the same.
>>
> 
> I think you're right. That transition from READY isn't as synchronous as
> I was making it out to be. Tch.
> 
>>> We could implement it into those jobs if the job is
>>> created with some kind of boolean, like
>>>
>>> auto-complete: true/false
>>>
>>> where this defaults to true, the legacy behavior.
>>>
>>> For "mirror" we would just omit allowing people to set this setting
>>> (auto-complete is effectively always off) because it is requisite and
>>> essential to the operation of the job.
>>
>> auto-complete=true would basically call block-job-finalize internally?
>>
>> You can't conflate it with block-job-complete because READY and PENDING
>> have to stay separate, but it would make sense as auto-finalize.
>>
> 
> I was conflating it. Or, at least trying to.
> 
>>> "OK, maybe that could work; what about transactions?"
>>>
>>> Transactions have to be a bit of their own case, I think.
>>>
>>> Essentially, jobs that complete transactionally already hang around in
>>> a pending state until all jobs complete, so they can do so together.
>>>
>>> What we don't really want is to force users to dig into the jobs
>>> manually and complete each one individually (I think...?), or to deal
>>> with the managerial nightmare of having some jobs auto-complete, some
>>> that don't, etc.
>>
>> I'm not sure about that. Don't users already have to send
>> block-job-complete to each job individually? Not doing the same for
>> block-job-finalize would be kind of inconsistent.
>>
> 
> Yes, but that's only theoretical since we don't have support for any of
> those kinds of jobs in transactions yet, either!
> 
> In my head here I was thinking about a transaction-wide finalize to
> replace "complete," but you're pointing out I can't mix the two.
> 
> That said, there's no reason we couldn't *make* that kind of completion
> a transaction-wide event, but... maybe that's too messy. Maybe
> everything should just be left individual...
> 
>> I also think that it would be good if a separately started block job
>> behaves like a transaction that contains a single job so that we don't
>> get different models for the two cases.
>>
>>> What I propose for transactions is:
>>>
>>> 1) Add a new property for transactions also named "auto-complete"
>>> 2) If the property is set to false, Jobs in this transaction will have
>>> their auto-complete values forcibly set to false
>>> 3) Once all jobs in the transaction are set to pending, emit an event
>>> ("TRANSACTION_READY", for instance)
>>> 4) Allow the transaction to be manually committed.
>>
>> This requires that we introduce names for transactions and let them be
>> managed explicitly by the user. Keeping things at the individual job
>> level certainly makes the interface simpler.
>>
> 
> True..
> 
>> I can't rule out that we'll want to make transactions explicitly
>> managed objects in the future for some reason, but that will probably be
>> something separate from the problem we're trying to solve now.
>>
>>> The alternative is to leave it on a per-job basis and just stipulate
>>> that any bizarre or inconvenient configurations are the fault of the
>>> caller. Leaving transactions completely untouched should theoretically
>>> work.
>>
>> Should be a lot easier. To integrate it into your state machine above:
>>
>> 3.  READY (COMPLETING) | READY (CANCELLING) --> DONE
>>         via: auto transition (block_job_completed() called; job doesn't
>>                               do any work any more)
>>         event: BLOCK_JOB_DONE
>> 4a. DONE --> PENDING
>>         via: auto transition (all jobs in the transaction are DONE)
>>         event: BLOCK_JOB_PENDING
>> 4b. PENDING --> COMPLETED | CANCELLED
>>         via: block-job-finalize
>>         event: COMPLETED | ERROR
>>
>> The naming clearly leaves a lot to be desired, I'm just running out of
>> names that actually make sense... But I hope you can see the mechanism
>> that I have in mind.
>>
> 
> I think so: "DONE" is the state where it's blocked on the transaction
> finishing; "PENDING" is the one where it's blocked on the user allowing
> it to move forward to call commit/abort/etc.
> 
>>>> 4. On block-job-reap, call the .commit/.abort callbacks of the jobs in
>>>>    the transaction. They will do most of the work that is currently done
>>>>    in the completion callbacks, in particular any graph changes. If we
>>>>    need to allow error cases, we can add a .prepare_completion callback
>>>>    that can still let the whole transaction fail.
>>>
>>> Makes sense by analogy. Probably worth having anyway. I moved some
>>> likely-to-deadlock code from the backup cleanup into .commit even when
>>> it runs outside of transactions. Other jobs can likely benefit from some
>>> simplified assumptions by running in that context, too.
>>>
>>>> 5. Send the final QMP completion event for each job in the transaction
>>>>    with the final error code. This is the existing BLOCK_JOB_COMPLETED
>>>>    or BLOCK_JOB_CANCELLED event.
>>>>
>>>> The current RFC still executes .commit/.abort before block-job-reap, so
>>>> the graph changes happen too early to be under control of the management
>>>> tool.
>>>>
>>>>> RFC:
>>>>> The next version will add tests for transactions.
>>>>> Kevin, can you please take a look at bdrv_is_root_node and how it is
>>>>> used with respect to do_drive_backup? I suspect that in this case that
>>>>> "is root" should actually be "true", but a node in use by a job has
>>>>> two roles; child_root and child_job, so it starts returning false here.
>>>>>
>>>>> That's fine because we prevent a collision that way, but it makes the
>>>>> error messages pretty bad and misleading. Do you have a quick suggestion?
>>>>> (Should I just amend the loop to allow non-root nodes as long as they
>>>>> happen to be jobs so that the job creation code can check permissions?)
>>>>
>>>> We kind of sidestepped this problem by deciding that there is no real
>>>> reason for the backup job to require the source to be a root node.
>>>>
>>>> I'm not completely sure how we could easily get a better message while
>>>> still requiring a root node (and I suppose there are other places that
>>>> rightfully still require a root node). Ignoring children with child_job
>>>> feels ugly, but might be the best option.
>>>>
>>>> Note that this will not make the conflicting command work suddenly,
>>>> because every node that has a child_job parent also has op blockers for
>>>> everything, but the error message should be less confusing than "is not
>>>> a root node".
>>>>
>>>> Kevin
>>>>
>>>
>>> TLDR:
>>>
>>> - I think we may need to have optional manual completion steps both
>>> before and after the job .prepare()/.commit()/.abort() phase.
>>> - Before, like "block-job-complete" to allow graph changes to be under
>>> management tool control, and
>>> - After, so that final job success status can be queried even if the
>>> event was missed.
>>
>> Agreed (except that I don't see what "block-job-complete" has to do with
>> it, this command is about an earlier transition).
>>
>>> Proposal:
>>>
>>> (1) Extend block-job-complete semantics to all jobs that opt in via
>>> "auto-complete: false" which is not allowed to be set for mirror jobs
>>> (2) If the modern overloading of the BLOCK_JOB_READY event is apt to
>>> cause confusion for existing tools, a new event BLOCK_JOB_PENDING could
>>> be emitted instead for any job capable of accepting the
>>> auto-complete=true/false parameter and emit it upon reaching this state.
>>> (3) Continue forward with this patchset's current persistent/reap
>>> nomenclature to prevent automatic cleanup if desired, and
>>> (4) Implement transaction-wide settings for auto-complete alongside a
>>> new "transaction complete" event to allow for a transaction-wide
>>> "complete" command.
>>
>> Let me try to just consolidate all of the above into a single state
>> machine:
>>
>> 1.  CREATED --> RUNNING
>>         driver callback: .start
>> 2a. RUNNING --> READY | CANCELLED
>>         via: auto transition (when bulk copy is finished) / block-job-cancel
>>         event: BLOCK_JOB_READY
>> 2b. READY --> READY (COMPLETING) | READY (CANCELLING)
>>         via: block-job-complete / block-job-cancel
>>         event: none
>>         driver callback: .complete / none
>> 3.  READY (CANCELLING | COMPLETING) --> DONE
>>         via: auto transition
>>              (CANCELLING: after draining in-flight mirror requests;
>>               COMPLETING: when images are in sync)
>>         event: BLOCK_JOB_DONE
>> 4.  DONE --> PENDING
>>         via: auto transition (all jobs in the transaction are DONE)
>>         event: BLOCK_JOB_PENDING
>> 5.  PENDING --> FINISHED
>>         via: block-job-finalize
>>         event: COMPLETED | CANCELLED
>>         driver callback: .prepare_finalize / .commit / .abort
>> 6.  FINISHED --> NULL
>>         via: block-job-reap
>>         event: none
>>         driver callback: .clean
>>
>> I removed COMPLETED/CANCELLED states because they are never externally
>> visible. You proposed an "auto transition" there, but that transition
>> would happen immediately after the previous one, so clients always see
>> PENDING --> NULL | FINISHED.
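>>
>> To put the whole machine in one place, here is a compilable sketch of
>> the legal transitions (placeholder names; PAUSED left out since it's
>> orthogonal):
>>
>>     enum {
>>         S_CREATED, S_RUNNING, S_READY, S_DONE, S_PENDING,
>>         S_FINISHED, S_NULL, S_MAX
>>     };
>>
>>     /* allowed[from][to] != 0 means the transition is legal: */
>>     static const char allowed[S_MAX][S_MAX] = {
>>         [S_CREATED]  = { [S_RUNNING]  = 1 },
>>         [S_RUNNING]  = { [S_READY]    = 1, [S_DONE] = 1 }, /* cancel */
>>         [S_READY]    = { [S_DONE]     = 1 }, /* complete/cancel, auto */
>>         [S_DONE]     = { [S_PENDING]  = 1 }, /* all txn jobs DONE */
>>         [S_PENDING]  = { [S_FINISHED] = 1 }, /* block-job-finalize */
>>         [S_FINISHED] = { [S_NULL]     = 1 }, /* block-job-reap */
>>     };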
>>
>> We would have two booleans to make the explicit transitions happen
>> automatically:
>>
>>     auto-finalize for block-job-finalize (default: true)
>>     auto-reap     for block-job-reap     (default: true)
>>
>> Both transitions would be executed automatically as soon as the
>> respective command becomes available.
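>>
>> Roughly (hypothetical field names, not an actual interface):
>>
>>     #include <stdbool.h>
>>
>>     typedef struct JobOptsSketch {
>>         bool auto_finalize;  /* default: true */
>>         bool auto_reap;      /* default: true */
>>     } JobOptsSketch;
>>
>>     /* e.g. on entering PENDING: */
>>     static void on_pending_sketch(JobOptsSketch *opts)
>>     {
>>         if (opts->auto_finalize) {
>>             /* behave as if block-job-finalize had just been issued */
>>         }
>>     }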
>>
>> We could add more auto-* options for the remaining explicit transitions
>> (block-job-complete/cancel in READY), but these are not important for
>> the problems we're trying to solve here. They might become interesting
>> if we do decide that we want a single copy block job instead of doing
>> similar things in mirror, commit and backup.
>>
>> The naming needs some improvements (done -> pending -> finished looks
>> really odd), but does this make sense otherwise?
>>
>> Kevin
>>
> I think so. We'll see if I understand when I write v3 :)
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-10-09 14:45 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-04  1:52 [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit job reaping John Snow
2017-10-04  1:52 ` [Qemu-devel] [PATCH v2 1/4] blockjob: add persistent property John Snow
2017-10-04  1:52 ` [Qemu-devel] [PATCH v2 2/4] qmp: add block-job-reap command John Snow
2017-10-04  1:52 ` [Qemu-devel] [PATCH v2 3/4] blockjob: expose persistent property John Snow
2017-10-04  1:52 ` [Qemu-devel] [PATCH v2 4/4] iotests: test manual job reaping John Snow
2017-10-04 18:27 ` [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit " Kevin Wolf
2017-10-05  1:46   ` John Snow
2017-10-05 11:38     ` Kevin Wolf
2017-10-05 18:17       ` John Snow
2017-10-09 14:30         ` Nikolay Shirokovskiy
2017-10-06  3:56       ` John Snow
2017-10-06  9:05         ` Kevin Wolf
2017-10-06  5:52       ` Markus Armbruster
