* [Qemu-devel] [PATCH v2 0/9] jobs: Job Exit Refactoring Pt 1
@ 2018-08-23 22:08 John Snow
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 1/9] jobs: change start callback to run callback John Snow
                   ` (8 more replies)
  0 siblings, 9 replies; 23+ messages in thread
From: John Snow @ 2018-08-23 22:08 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody, Max Reitz, John Snow

This is part one of a two-part series that refactors the exit logic
of jobs.

Part one removes job_defer_to_main_loop.
Part two removes the job->exit() callback introduced in part one.

It's redundant to have each job manage deferring to the main loop
itself. Unifying this makes sense from an API standpoint.
Doing so also allows us to remove a tricky case where the completion
code is called under an aio_context lock, which then calls the
finalization code which is itself executed under a second aio_context
lock, leading to deadlocks.

Removing this recursive lock acquisition is necessary for converting
mirror to only modify its graph post-finalization, but it's also just
safer and will bite us less in the future.

This series introduces a job->exit callback, but after jobs are
fully transitioned to using the .commit/.abort callbacks in Pt 2,
this new completion callback will be removed again. It's only here
as a crutch to let us investigate the completion refactoring in Pt 2
more carefully.

V2:

001/9:[----] [--] 'jobs: change start callback to run callback'
002/9:[0003] [FC] 'jobs: canonize Error object'
003/9:[0005] [FC] 'jobs: add exit shim'
004/9:[0002] [FC] 'block/commit: utilize job_exit shim'
005/9:[----] [--] 'block/mirror: utilize job_exit shim'
006/9:[----] [--] 'jobs: utilize job_exit shim'
007/9:[down] 'block/backup: make function variables consistently named'
008/9:[down] 'jobs: remove ret argument to job_completed; privatize it'
009/9:[----] [--] 'jobs: remove job_defer_to_main_loop'

002: Update commit message
     Remove errant space (Eric, Max)
     Update error message setting (Kevin)
003: Add comment clarifying that .exit is temporary/transitional (Max, Eric)
004: change reference from `ret` to `job->ret`
     (Note: most of these references go away in Pt 2 of the series,
            except for those in mirror_exit.) (for Max.)
007: Added, at Eric's suggestion.
008: Moved forward from Pt 2 of the series. (Max.)

Hopefully this version makes the trajectory clearer.
--js

John Snow (9):
  jobs: change start callback to run callback
  jobs: canonize Error object
  jobs: add exit shim
  block/commit: utilize job_exit shim
  block/mirror: utilize job_exit shim
  jobs: utilize job_exit shim
  block/backup: make function variables consistently named
  jobs: remove ret argument to job_completed; privatize it
  jobs: remove job_defer_to_main_loop

 block/backup.c            | 81 +++++++++++++++++++----------------------------
 block/commit.c            | 29 ++++++-----------
 block/create.c            | 19 ++++-------
 block/mirror.c            | 35 +++++++++-----------
 block/stream.c            | 29 +++++++----------
 include/qemu/job.h        | 62 +++++++++++++++---------------------
 job-qmp.c                 |  5 +--
 job.c                     | 73 +++++++++++++++---------------------------
 tests/test-bdrv-drain.c   | 13 +++-----
 tests/test-blockjob-txn.c | 25 ++++++---------
 tests/test-blockjob.c     | 17 +++++-----
 trace-events              |  2 +-
 12 files changed, 150 insertions(+), 240 deletions(-)

-- 
2.14.4


* [Qemu-devel] [PATCH v2 1/9] jobs: change start callback to run callback
  2018-08-23 22:08 [Qemu-devel] [PATCH v2 0/9] jobs: Job Exit Refactoring Pt 1 John Snow
@ 2018-08-23 22:08 ` John Snow
  2018-08-27  9:30   ` Max Reitz
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 2/9] jobs: canonize Error object John Snow
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: John Snow @ 2018-08-23 22:08 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody, Max Reitz, John Snow

Presently we codify the entry point for a job as the "start" callback,
but a more apt name would be "run" to clarify the idea that when this
function returns we consider the job to have "finished," except for
any cleanup which occurs in separate callbacks later.

As part of this clarification, change the signature to include an error
object and a return code. The error pointer is not yet used, and the return
code, while captured, will be overwritten by actions in the job_completed
function.
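
As a rough illustration of the new callback shape (not the QEMU code itself),
the sketch below uses hypothetical stand-in types (DemoJob, DemoJobDriver) and
a local demo_container_of macro in place of QEMU's Job, JobDriver and
container_of. It only shows the two ideas from this patch: the entry point
receives the generic job pointer and recovers its own state from it, and the
caller captures the returned status code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's Error and Job types. */
typedef struct Error Error;
typedef struct Job {
    int ret;
} Job;

/* A job type that embeds the generic Job, as the block jobs do. */
typedef struct DemoJob {
    Job common;
    int iterations;
} DemoJob;

/* Same idea as container_of(): map a member pointer back to its container. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* New-style entry point: takes the generic Job and returns a status code. */
static int demo_run(Job *job, Error **errp)
{
    DemoJob *s = demo_container_of(job, DemoJob, common);

    (void)errp; /* unused here, as in this patch */
    while (s->iterations > 0) {
        s->iterations--;
    }
    return 0;
}

/* Hypothetical driver table holding only the .run callback. */
typedef struct DemoJobDriver {
    int (*run)(Job *job, Error **errp);
} DemoJobDriver;

/* The caller stores whatever .run returns, mirroring job_co_entry(). */
static int demo_job_enter(const DemoJobDriver *drv, Job *job)
{
    assert(drv && drv->run);
    job->ret = drv->run(job, NULL);
    return job->ret;
}
```

Passing the generic pointer instead of a void * lets the compiler check the
callback's type, at the cost of one container_of-style conversion per job.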

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/backup.c            |  7 ++++---
 block/commit.c            |  7 ++++---
 block/create.c            |  8 +++++---
 block/mirror.c            | 10 ++++++----
 block/stream.c            |  7 ++++---
 include/qemu/job.h        |  2 +-
 job.c                     |  6 +++---
 tests/test-bdrv-drain.c   |  7 ++++---
 tests/test-blockjob-txn.c | 16 ++++++++--------
 tests/test-blockjob.c     |  7 ++++---
 10 files changed, 43 insertions(+), 34 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 8630d32926..5d47781840 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -480,9 +480,9 @@ static void backup_incremental_init_copy_bitmap(BackupBlockJob *job)
     bdrv_dirty_iter_free(dbi);
 }
 
-static void coroutine_fn backup_run(void *opaque)
+static int coroutine_fn backup_run(Job *opaque_job, Error **errp)
 {
-    BackupBlockJob *job = opaque;
+    BackupBlockJob *job = container_of(opaque_job, BackupBlockJob, common.job);
     BackupCompleteData *data;
     BlockDriverState *bs = blk_bs(job->common.blk);
     int64_t offset, nb_clusters;
@@ -587,6 +587,7 @@ static void coroutine_fn backup_run(void *opaque)
     data = g_malloc(sizeof(*data));
     data->ret = ret;
     job_defer_to_main_loop(&job->common.job, backup_complete, data);
+    return ret;
 }
 
 static const BlockJobDriver backup_job_driver = {
@@ -596,7 +597,7 @@ static const BlockJobDriver backup_job_driver = {
         .free                   = block_job_free,
         .user_resume            = block_job_user_resume,
         .drain                  = block_job_drain,
-        .start                  = backup_run,
+        .run                    = backup_run,
         .commit                 = backup_commit,
         .abort                  = backup_abort,
         .clean                  = backup_clean,
diff --git a/block/commit.c b/block/commit.c
index eb414579bd..a0ea86ff64 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -134,9 +134,9 @@ static void commit_complete(Job *job, void *opaque)
     bdrv_unref(top);
 }
 
-static void coroutine_fn commit_run(void *opaque)
+static int coroutine_fn commit_run(Job *job, Error **errp)
 {
-    CommitBlockJob *s = opaque;
+    CommitBlockJob *s = container_of(job, CommitBlockJob, common.job);
     CommitCompleteData *data;
     int64_t offset;
     uint64_t delay_ns = 0;
@@ -213,6 +213,7 @@ out:
     data = g_malloc(sizeof(*data));
     data->ret = ret;
     job_defer_to_main_loop(&s->common.job, commit_complete, data);
+    return ret;
 }
 
 static const BlockJobDriver commit_job_driver = {
@@ -222,7 +223,7 @@ static const BlockJobDriver commit_job_driver = {
         .free          = block_job_free,
         .user_resume   = block_job_user_resume,
         .drain         = block_job_drain,
-        .start         = commit_run,
+        .run           = commit_run,
     },
 };
 
diff --git a/block/create.c b/block/create.c
index 915cd41bcc..04733c3618 100644
--- a/block/create.c
+++ b/block/create.c
@@ -45,9 +45,9 @@ static void blockdev_create_complete(Job *job, void *opaque)
     job_completed(job, s->ret, s->err);
 }
 
-static void coroutine_fn blockdev_create_run(void *opaque)
+static int coroutine_fn blockdev_create_run(Job *job, Error **errp)
 {
-    BlockdevCreateJob *s = opaque;
+    BlockdevCreateJob *s = container_of(job, BlockdevCreateJob, common);
 
     job_progress_set_remaining(&s->common, 1);
     s->ret = s->drv->bdrv_co_create(s->opts, &s->err);
@@ -55,12 +55,14 @@ static void coroutine_fn blockdev_create_run(void *opaque)
 
     qapi_free_BlockdevCreateOptions(s->opts);
     job_defer_to_main_loop(&s->common, blockdev_create_complete, NULL);
+
+    return s->ret;
 }
 
 static const JobDriver blockdev_create_job_driver = {
     .instance_size = sizeof(BlockdevCreateJob),
     .job_type      = JOB_TYPE_CREATE,
-    .start         = blockdev_create_run,
+    .run           = blockdev_create_run,
 };
 
 void qmp_blockdev_create(const char *job_id, BlockdevCreateOptions *options,
diff --git a/block/mirror.c b/block/mirror.c
index 6cc10df5c9..691763db41 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -812,9 +812,9 @@ static int mirror_flush(MirrorBlockJob *s)
     return ret;
 }
 
-static void coroutine_fn mirror_run(void *opaque)
+static int coroutine_fn mirror_run(Job *job, Error **errp)
 {
-    MirrorBlockJob *s = opaque;
+    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
     MirrorExitData *data;
     BlockDriverState *bs = s->mirror_top_bs->backing->bs;
     BlockDriverState *target_bs = blk_bs(s->target);
@@ -1041,7 +1041,9 @@ immediate_exit:
     if (need_drain) {
         bdrv_drained_begin(bs);
     }
+
     job_defer_to_main_loop(&s->common.job, mirror_exit, data);
+    return ret;
 }
 
 static void mirror_complete(Job *job, Error **errp)
@@ -1138,7 +1140,7 @@ static const BlockJobDriver mirror_job_driver = {
         .free                   = block_job_free,
         .user_resume            = block_job_user_resume,
         .drain                  = block_job_drain,
-        .start                  = mirror_run,
+        .run                    = mirror_run,
         .pause                  = mirror_pause,
         .complete               = mirror_complete,
     },
@@ -1154,7 +1156,7 @@ static const BlockJobDriver commit_active_job_driver = {
         .free                   = block_job_free,
         .user_resume            = block_job_user_resume,
         .drain                  = block_job_drain,
-        .start                  = mirror_run,
+        .run                    = mirror_run,
         .pause                  = mirror_pause,
         .complete               = mirror_complete,
     },
diff --git a/block/stream.c b/block/stream.c
index 9264b68a1e..b4b987df7e 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -97,9 +97,9 @@ out:
     g_free(data);
 }
 
-static void coroutine_fn stream_run(void *opaque)
+static int coroutine_fn stream_run(Job *job, Error **errp)
 {
-    StreamBlockJob *s = opaque;
+    StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
     StreamCompleteData *data;
     BlockBackend *blk = s->common.blk;
     BlockDriverState *bs = blk_bs(blk);
@@ -206,6 +206,7 @@ out:
     data = g_malloc(sizeof(*data));
     data->ret = ret;
     job_defer_to_main_loop(&s->common.job, stream_complete, data);
+    return ret;
 }
 
 static const BlockJobDriver stream_job_driver = {
@@ -213,7 +214,7 @@ static const BlockJobDriver stream_job_driver = {
         .instance_size = sizeof(StreamBlockJob),
         .job_type      = JOB_TYPE_STREAM,
         .free          = block_job_free,
-        .start         = stream_run,
+        .run           = stream_run,
         .user_resume   = block_job_user_resume,
         .drain         = block_job_drain,
     },
diff --git a/include/qemu/job.h b/include/qemu/job.h
index 18c9223e31..9cf463d228 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -169,7 +169,7 @@ struct JobDriver {
     JobType job_type;
 
     /** Mandatory: Entrypoint for the Coroutine. */
-    CoroutineEntry *start;
+    int coroutine_fn (*run)(Job *job, Error **errp);
 
     /**
      * If the callback is not NULL, it will be invoked when the job transitions
diff --git a/job.c b/job.c
index e36ebaafd8..76988f6678 100644
--- a/job.c
+++ b/job.c
@@ -544,16 +544,16 @@ static void coroutine_fn job_co_entry(void *opaque)
 {
     Job *job = opaque;
 
-    assert(job && job->driver && job->driver->start);
+    assert(job && job->driver && job->driver->run);
     job_pause_point(job);
-    job->driver->start(job);
+    job->ret = job->driver->run(job, NULL);
 }
 
 
 void job_start(Job *job)
 {
     assert(job && !job_started(job) && job->paused &&
-           job->driver && job->driver->start);
+           job->driver && job->driver->run);
     job->co = qemu_coroutine_create(job_co_entry, job);
     job->pause_count--;
     job->busy = true;
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 17bb8508ae..a7533861f6 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -757,9 +757,9 @@ static void test_job_completed(Job *job, void *opaque)
     job_completed(job, 0, NULL);
 }
 
-static void coroutine_fn test_job_start(void *opaque)
+static int coroutine_fn test_job_run(Job *job, Error **errp)
 {
-    TestBlockJob *s = opaque;
+    TestBlockJob *s = container_of(job, TestBlockJob, common.job);
 
     job_transition_to_ready(&s->common.job);
     while (!s->should_complete) {
@@ -771,6 +771,7 @@ static void coroutine_fn test_job_start(void *opaque)
     }
 
     job_defer_to_main_loop(&s->common.job, test_job_completed, NULL);
+    return 0;
 }
 
 static void test_job_complete(Job *job, Error **errp)
@@ -785,7 +786,7 @@ BlockJobDriver test_job_driver = {
         .free           = block_job_free,
         .user_resume    = block_job_user_resume,
         .drain          = block_job_drain,
-        .start          = test_job_start,
+        .run            = test_job_run,
         .complete       = test_job_complete,
     },
 };
diff --git a/tests/test-blockjob-txn.c b/tests/test-blockjob-txn.c
index 58d9b87fb2..3194924071 100644
--- a/tests/test-blockjob-txn.c
+++ b/tests/test-blockjob-txn.c
@@ -38,25 +38,25 @@ static void test_block_job_complete(Job *job, void *opaque)
     bdrv_unref(bs);
 }
 
-static void coroutine_fn test_block_job_run(void *opaque)
+static int coroutine_fn test_block_job_run(Job *job, Error **errp)
 {
-    TestBlockJob *s = opaque;
-    BlockJob *job = &s->common;
+    TestBlockJob *s = container_of(job, TestBlockJob, common.job);
 
     while (s->iterations--) {
         if (s->use_timer) {
-            job_sleep_ns(&job->job, 0);
+            job_sleep_ns(job, 0);
         } else {
-            job_yield(&job->job);
+            job_yield(job);
         }
 
-        if (job_is_cancelled(&job->job)) {
+        if (job_is_cancelled(job)) {
             break;
         }
     }
 
-    job_defer_to_main_loop(&job->job, test_block_job_complete,
+    job_defer_to_main_loop(job, test_block_job_complete,
                            (void *)(intptr_t)s->rc);
+    return s->rc;
 }
 
 typedef struct {
@@ -80,7 +80,7 @@ static const BlockJobDriver test_block_job_driver = {
         .free          = block_job_free,
         .user_resume   = block_job_user_resume,
         .drain         = block_job_drain,
-        .start         = test_block_job_run,
+        .run           = test_block_job_run,
     },
 };
 
diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c
index cb42f06e61..b0462bfdec 100644
--- a/tests/test-blockjob.c
+++ b/tests/test-blockjob.c
@@ -176,9 +176,9 @@ static void cancel_job_complete(Job *job, Error **errp)
     s->should_complete = true;
 }
 
-static void coroutine_fn cancel_job_start(void *opaque)
+static int coroutine_fn cancel_job_run(Job *job, Error **errp)
 {
-    CancelJob *s = opaque;
+    CancelJob *s = container_of(job, CancelJob, common.job);
 
     while (!s->should_complete) {
         if (job_is_cancelled(&s->common.job)) {
@@ -194,6 +194,7 @@ static void coroutine_fn cancel_job_start(void *opaque)
 
  defer:
     job_defer_to_main_loop(&s->common.job, cancel_job_completed, s);
+    return 0;
 }
 
 static const BlockJobDriver test_cancel_driver = {
@@ -202,7 +203,7 @@ static const BlockJobDriver test_cancel_driver = {
         .free          = block_job_free,
         .user_resume   = block_job_user_resume,
         .drain         = block_job_drain,
-        .start         = cancel_job_start,
+        .run           = cancel_job_run,
         .complete      = cancel_job_complete,
     },
 };
-- 
2.14.4


* [Qemu-devel] [PATCH v2 2/9] jobs: canonize Error object
  2018-08-23 22:08 [Qemu-devel] [PATCH v2 0/9] jobs: Job Exit Refactoring Pt 1 John Snow
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 1/9] jobs: change start callback to run callback John Snow
@ 2018-08-23 22:08 ` John Snow
  2018-08-27  9:41   ` Max Reitz
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 3/9] jobs: add exit shim John Snow
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: John Snow @ 2018-08-23 22:08 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody, Max Reitz, John Snow

Jobs presently use both an Error object in the case of the create job,
and char strings in the case of generic errors elsewhere.

Unify the two paths as just j->err, and remove the extra argument from
job_completed. The integer error code for job_completed is kept for now,
to be removed shortly in a separate patch.
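
A minimal sketch of the unified error path, assuming simplified stand-in types
(DemoError, DemoJob) rather than QEMU's real Error API: the job carries a
single error object, and a failing job that never set one receives a default
message derived from its errno-style return code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object: it owns a message string. */
typedef struct DemoError {
    char *msg;
} DemoError;

/* Hypothetical job carrying one canonical error object next to its ret code. */
typedef struct DemoJob {
    int ret;
    DemoError *err;
} DemoJob;

/* Allocate an error that holds its own copy of msg. */
static DemoError *demo_error_new(const char *msg)
{
    DemoError *e = malloc(sizeof(*e));
    size_t n = strlen(msg) + 1;

    assert(e);
    e->msg = malloc(n);
    assert(e->msg);
    memcpy(e->msg, msg, n);
    return e;
}

static void demo_error_free(DemoError *e)
{
    if (e) {
        free(e->msg);
        free(e);
    }
}

/* A failing job without an error gets one built from its return code. */
static void demo_job_update_rc(DemoJob *job)
{
    if (job->ret && !job->err) {
        job->err = demo_error_new(strerror(-job->ret));
    }
}
```

With a single object there is only one thing to free when the job is
destroyed, and a caller that wants a string can derive it from the object on
demand.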

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/backup.c            |  2 +-
 block/commit.c            |  2 +-
 block/create.c            |  5 ++---
 block/mirror.c            |  2 +-
 block/stream.c            |  2 +-
 include/qemu/job.h        | 10 ++++------
 job-qmp.c                 |  5 +++--
 job.c                     | 18 ++++++------------
 tests/test-bdrv-drain.c   |  2 +-
 tests/test-blockjob-txn.c |  2 +-
 tests/test-blockjob.c     |  2 +-
 11 files changed, 22 insertions(+), 30 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 5d47781840..1e965d54e5 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -388,7 +388,7 @@ static void backup_complete(Job *job, void *opaque)
 {
     BackupCompleteData *data = opaque;
 
-    job_completed(job, data->ret, NULL);
+    job_completed(job, data->ret);
     g_free(data);
 }
 
diff --git a/block/commit.c b/block/commit.c
index a0ea86ff64..4a17bb73ec 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -117,7 +117,7 @@ static void commit_complete(Job *job, void *opaque)
      * bdrv_set_backing_hd() to fail. */
     block_job_remove_all_bdrv(bjob);
 
-    job_completed(job, ret, NULL);
+    job_completed(job, ret);
     g_free(data);
 
     /* If bdrv_drop_intermediate() didn't already do that, remove the commit
diff --git a/block/create.c b/block/create.c
index 04733c3618..26a385c6c7 100644
--- a/block/create.c
+++ b/block/create.c
@@ -35,14 +35,13 @@ typedef struct BlockdevCreateJob {
     BlockDriver *drv;
     BlockdevCreateOptions *opts;
     int ret;
-    Error *err;
 } BlockdevCreateJob;
 
 static void blockdev_create_complete(Job *job, void *opaque)
 {
     BlockdevCreateJob *s = container_of(job, BlockdevCreateJob, common);
 
-    job_completed(job, s->ret, s->err);
+    job_completed(job, s->ret);
 }
 
 static int coroutine_fn blockdev_create_run(Job *job, Error **errp)
@@ -50,7 +49,7 @@ static int coroutine_fn blockdev_create_run(Job *job, Error **errp)
     BlockdevCreateJob *s = container_of(job, BlockdevCreateJob, common);
 
     job_progress_set_remaining(&s->common, 1);
-    s->ret = s->drv->bdrv_co_create(s->opts, &s->err);
+    s->ret = s->drv->bdrv_co_create(s->opts, errp);
     job_progress_update(&s->common, 1);
 
     qapi_free_BlockdevCreateOptions(s->opts);
diff --git a/block/mirror.c b/block/mirror.c
index 691763db41..be5dc6b7b0 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -710,7 +710,7 @@ static void mirror_exit(Job *job, void *opaque)
     blk_insert_bs(bjob->blk, mirror_top_bs, &error_abort);
 
     bs_opaque->job = NULL;
-    job_completed(job, data->ret, NULL);
+    job_completed(job, data->ret);
 
     g_free(data);
     bdrv_drained_end(src);
diff --git a/block/stream.c b/block/stream.c
index b4b987df7e..26a775386b 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -93,7 +93,7 @@ out:
     }
 
     g_free(s->backing_file_str);
-    job_completed(job, data->ret, NULL);
+    job_completed(job, data->ret);
     g_free(data);
 }
 
diff --git a/include/qemu/job.h b/include/qemu/job.h
index 9cf463d228..5c92c53ef0 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -124,12 +124,12 @@ typedef struct Job {
     /** Estimated progress_current value at the completion of the job */
     int64_t progress_total;
 
-    /** Error string for a failed job (NULL if, and only if, job->ret == 0) */
-    char *error;
-
     /** ret code passed to job_completed. */
     int ret;
 
+    /** Error object for a failed job **/
+    Error *err;
+
     /** The completion function that will be called when the job completes.  */
     BlockCompletionFunc *cb;
 
@@ -484,15 +484,13 @@ void job_transition_to_ready(Job *job);
 /**
  * @job: The job being completed.
  * @ret: The status code.
- * @error: The error message for a failing job (only with @ret < 0). If @ret is
- *         negative, but NULL is given for @error, strerror() is used.
  *
  * Marks @job as completed. If @ret is non-zero, the job transaction it is part
  * of is aborted. If @ret is zero, the job moves into the WAITING state. If it
  * is the last job to complete in its transaction, all jobs in the transaction
  * move from WAITING to PENDING.
  */
-void job_completed(Job *job, int ret, Error *error);
+void job_completed(Job *job, int ret);
 
 /** Asynchronously complete the specified @job. */
 void job_complete(Job *job, Error **errp);
diff --git a/job-qmp.c b/job-qmp.c
index 410775df61..a969b2bbf0 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -146,8 +146,9 @@ static JobInfo *job_query_single(Job *job, Error **errp)
         .status             = job->status,
         .current_progress   = job->progress_current,
         .total_progress     = job->progress_total,
-        .has_error          = !!job->error,
-        .error              = g_strdup(job->error),
+        .has_error          = !!job->err,
+        .error              = job->err ? \
+                              g_strdup(error_get_pretty(job->err)) : NULL,
     };
 
     return info;
diff --git a/job.c b/job.c
index 76988f6678..bc1d970df4 100644
--- a/job.c
+++ b/job.c
@@ -369,7 +369,7 @@ void job_unref(Job *job)
 
         QLIST_REMOVE(job, job_list);
 
-        g_free(job->error);
+        error_free(job->err);
         g_free(job->id);
         g_free(job);
     }
@@ -546,7 +546,7 @@ static void coroutine_fn job_co_entry(void *opaque)
 
     assert(job && job->driver && job->driver->run);
     job_pause_point(job);
-    job->ret = job->driver->run(job, NULL);
+    job->ret = job->driver->run(job, &job->err);
 }
 
 
@@ -666,8 +666,8 @@ static void job_update_rc(Job *job)
         job->ret = -ECANCELED;
     }
     if (job->ret) {
-        if (!job->error) {
-            job->error = g_strdup(strerror(-job->ret));
+        if (!job->err) {
+            error_setg(&job->err, "%s", strerror(-job->ret));
         }
         job_state_transition(job, JOB_STATUS_ABORTING);
     }
@@ -865,17 +865,11 @@ static void job_completed_txn_success(Job *job)
     }
 }
 
-void job_completed(Job *job, int ret, Error *error)
+void job_completed(Job *job, int ret)
 {
     assert(job && job->txn && !job_is_completed(job));
 
     job->ret = ret;
-    if (error) {
-        assert(job->ret < 0);
-        job->error = g_strdup(error_get_pretty(error));
-        error_free(error);
-    }
-
     job_update_rc(job);
     trace_job_completed(job, ret, job->ret);
     if (job->ret) {
@@ -893,7 +887,7 @@ void job_cancel(Job *job, bool force)
     }
     job_cancel_async(job, force);
     if (!job_started(job)) {
-        job_completed(job, -ECANCELED, NULL);
+        job_completed(job, -ECANCELED);
     } else if (job->deferred_to_main_loop) {
         job_completed_txn_abort(job);
     } else {
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index a7533861f6..00604dfc0c 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -754,7 +754,7 @@ typedef struct TestBlockJob {
 
 static void test_job_completed(Job *job, void *opaque)
 {
-    job_completed(job, 0, NULL);
+    job_completed(job, 0);
 }
 
 static int coroutine_fn test_job_run(Job *job, Error **errp)
diff --git a/tests/test-blockjob-txn.c b/tests/test-blockjob-txn.c
index 3194924071..82cedee78b 100644
--- a/tests/test-blockjob-txn.c
+++ b/tests/test-blockjob-txn.c
@@ -34,7 +34,7 @@ static void test_block_job_complete(Job *job, void *opaque)
         rc = -ECANCELED;
     }
 
-    job_completed(job, rc, NULL);
+    job_completed(job, rc);
     bdrv_unref(bs);
 }
 
diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c
index b0462bfdec..408a226939 100644
--- a/tests/test-blockjob.c
+++ b/tests/test-blockjob.c
@@ -167,7 +167,7 @@ static void cancel_job_completed(Job *job, void *opaque)
 {
     CancelJob *s = opaque;
     s->completed = true;
-    job_completed(job, 0, NULL);
+    job_completed(job, 0);
 }
 
 static void cancel_job_complete(Job *job, Error **errp)
-- 
2.14.4


* [Qemu-devel] [PATCH v2 3/9] jobs: add exit shim
  2018-08-23 22:08 [Qemu-devel] [PATCH v2 0/9] jobs: Job Exit Refactoring Pt 1 John Snow
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 1/9] jobs: change start callback to run callback John Snow
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 2/9] jobs: canonize Error object John Snow
@ 2018-08-23 22:08 ` John Snow
  2018-08-27 10:00   ` Max Reitz
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 4/9] block/commit: utilize job_exit shim John Snow
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: John Snow @ 2018-08-23 22:08 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody, Max Reitz, John Snow

All jobs do the same thing when they leave their running loop:
- Store the return code in a structure
- Wait to receive this structure in the main thread
- Signal job completion via job_completed

Few jobs do anything beyond exactly this. Consolidate this exit
logic for a net reduction in SLOC.

More seriously, when we utilize job_defer_to_main_loop_bh to call
a function that calls job_completed, job_finalize_single will run
in a context where it has recursively taken the aio_context lock,
which can cause hangs if it puts down a reference that causes a flush.

You can observe this in practice by looking at mirror_exit's careful
placement of job_completed and bdrv_unref calls.

If we centralize job exiting, we can signal job completion from outside
of the aio_context, which should allow for job cleanup code to run with
only one lock, which makes cleanup callbacks less tricky to write.
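
The intended control flow can be sketched as follows. This is a simplified
model, not the patch's code: DemoJob, DemoDriver and the lock_depth counter are
hypothetical stand-ins for Job, JobDriver and the aio_context lock, and the
exit step is called directly where the real code schedules a bottom half in
the main loop. The point is only that the optional exit callback runs with the
lock held once, while completion is signalled after the lock is dropped.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoJob DemoJob;

/* Hypothetical driver table: a mandatory run and an optional exit callback. */
typedef struct DemoDriver {
    int (*run)(DemoJob *job);
    void (*exit)(DemoJob *job);
} DemoDriver;

struct DemoJob {
    const DemoDriver *driver;
    int ret;
    int lock_depth;          /* stand-in for how often the lock is held */
    int depth_at_completion; /* lock depth observed when completion fired */
    int completed;
};

/* Stand-in for job_completed(): record the lock depth it was called with. */
static void demo_job_completed(DemoJob *job)
{
    job->depth_at_completion = job->lock_depth;
    job->completed = 1;
}

/* Centralized exit: the lock is held around the callback only. */
static void demo_job_exit(DemoJob *job)
{
    if (job->driver->exit) {
        job->lock_depth++;          /* "acquire" */
        job->driver->exit(job);
        job->lock_depth--;          /* "release" */
    }
    demo_job_completed(job);        /* signalled outside the lock */
}

/* Entry point: run the job, keep its return code, then run the exit step. */
static void demo_job_co_entry(DemoJob *job)
{
    assert(job && job->driver && job->driver->run);
    job->ret = job->driver->run(job);
    demo_job_exit(job); /* the real code defers this to the main loop */
}

/* Example callbacks: a failing run and an exit that checks the lock depth. */
static int demo_run_fail(DemoJob *job)
{
    (void)job;
    return -5;
}

static void demo_exit_cb(DemoJob *job)
{
    assert(job->lock_depth == 1);
}
```

Because every job funnels through the same exit step, individual jobs no
longer need their own completion structure or their own deferral call.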

Signed-off-by: John Snow <jsnow@redhat.com>
---
 include/qemu/job.h | 11 +++++++++++
 job.c              | 18 ++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 5c92c53ef0..c67f6a647e 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -204,6 +204,17 @@ struct JobDriver {
      */
     void (*drain)(Job *job);
 
+    /**
+     * If the callback is not NULL, exit will be invoked from the main thread
+     * when the job's coroutine has finished, but before transactional
+     * convergence; before @prepare or @abort.
+     *
+     * FIXME TODO: This callback is only temporary to transition remaining jobs
+     * to prepare/commit/abort/clean callbacks and will be removed before 3.1.
+     * is released.
+     */
+    void (*exit)(Job *job);
+
     /**
      * If the callback is not NULL, prepare will be invoked when all the jobs
      * belonging to the same transaction complete; or upon this job's completion
diff --git a/job.c b/job.c
index bc1d970df4..bc8dad4e71 100644
--- a/job.c
+++ b/job.c
@@ -535,6 +535,18 @@ void job_drain(Job *job)
     }
 }
 
+static void job_exit(void *opaque)
+{
+    Job *job = (Job *)opaque;
+    AioContext *aio_context = job->aio_context;
+
+    if (job->driver->exit) {
+        aio_context_acquire(aio_context);
+        job->driver->exit(job);
+        aio_context_release(aio_context);
+    }
+    job_completed(job, job->ret);
+}
 
 /**
  * All jobs must allow a pause point before entering their job proper. This
@@ -547,6 +559,12 @@ static void coroutine_fn job_co_entry(void *opaque)
     assert(job && job->driver && job->driver->run);
     job_pause_point(job);
     job->ret = job->driver->run(job, &job->err);
+    if (!job->deferred_to_main_loop) {
+        job->deferred_to_main_loop = true;
+        aio_bh_schedule_oneshot(qemu_get_aio_context(),
+                                job_exit,
+                                job);
+    }
 }
 
 
-- 
2.14.4


* [Qemu-devel] [PATCH v2 4/9] block/commit: utilize job_exit shim
  2018-08-23 22:08 [Qemu-devel] [PATCH v2 0/9] jobs: Job Exit Refactoring Pt 1 John Snow
                   ` (2 preceding siblings ...)
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 3/9] jobs: add exit shim John Snow
@ 2018-08-23 22:08 ` John Snow
  2018-08-27 10:28   ` Max Reitz
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 5/9] block/mirror: " John Snow
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: John Snow @ 2018-08-23 22:08 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody, Max Reitz, John Snow

Change the manual deferment to commit_complete into the implicit
callback to job_exit, renaming commit_complete to commit_exit.

This conversion does change the timing of when job_completed is
called to after the bdrv_replace_node and bdrv_unref calls, which
could have implications for bjob->blk, which will now be put down
after this cleanup.

Kevin highlights that we did not take any permissions for that backend
at job creation time, so it is safe to reorder these operations.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/commit.c | 22 +++++-----------------
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 4a17bb73ec..da69165de3 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -68,19 +68,13 @@ static int coroutine_fn commit_populate(BlockBackend *bs, BlockBackend *base,
     return 0;
 }
 
-typedef struct {
-    int ret;
-} CommitCompleteData;
-
-static void commit_complete(Job *job, void *opaque)
+static void commit_exit(Job *job)
 {
     CommitBlockJob *s = container_of(job, CommitBlockJob, common.job);
     BlockJob *bjob = &s->common;
-    CommitCompleteData *data = opaque;
     BlockDriverState *top = blk_bs(s->top);
     BlockDriverState *base = blk_bs(s->base);
     BlockDriverState *commit_top_bs = s->commit_top_bs;
-    int ret = data->ret;
     bool remove_commit_top_bs = false;
 
     /* Make sure commit_top_bs and top stay around until bdrv_replace_node() */
@@ -91,10 +85,10 @@ static void commit_complete(Job *job, void *opaque)
      * the normal backing chain can be restored. */
     blk_unref(s->base);
 
-    if (!job_is_cancelled(job) && ret == 0) {
+    if (!job_is_cancelled(job) && job->ret == 0) {
         /* success */
-        ret = bdrv_drop_intermediate(s->commit_top_bs, base,
-                                     s->backing_file_str);
+        job->ret = bdrv_drop_intermediate(s->commit_top_bs, base,
+                                          s->backing_file_str);
     } else {
         /* XXX Can (or should) we somehow keep 'consistent read' blocked even
          * after the failed/cancelled commit job is gone? If we already wrote
@@ -117,9 +111,6 @@ static void commit_complete(Job *job, void *opaque)
      * bdrv_set_backing_hd() to fail. */
     block_job_remove_all_bdrv(bjob);
 
-    job_completed(job, ret);
-    g_free(data);
-
     /* If bdrv_drop_intermediate() didn't already do that, remove the commit
      * filter driver from the backing chain. Do this as the final step so that
      * the 'consistent read' permission can be granted.  */
@@ -137,7 +128,6 @@ static void commit_complete(Job *job, void *opaque)
 static int coroutine_fn commit_run(Job *job, Error **errp)
 {
     CommitBlockJob *s = container_of(job, CommitBlockJob, common.job);
-    CommitCompleteData *data;
     int64_t offset;
     uint64_t delay_ns = 0;
     int ret = 0;
@@ -210,9 +200,6 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
 out:
     qemu_vfree(buf);
 
-    data = g_malloc(sizeof(*data));
-    data->ret = ret;
-    job_defer_to_main_loop(&s->common.job, commit_complete, data);
     return ret;
 }
 
@@ -224,6 +211,7 @@ static const BlockJobDriver commit_job_driver = {
         .user_resume   = block_job_user_resume,
         .drain         = block_job_drain,
         .run           = commit_run,
+        .exit          = commit_exit,
     },
 };
 
-- 
2.14.4

* [Qemu-devel] [PATCH v2 5/9] block/mirror: utilize job_exit shim
  2018-08-23 22:08 [Qemu-devel] [PATCH v2 0/9] jobs: Job Exit Refactoring Pt 1 John Snow
                   ` (3 preceding siblings ...)
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 4/9] block/commit: utilize job_exit shim John Snow
@ 2018-08-23 22:08 ` John Snow
  2018-08-27 10:30   ` Max Reitz
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 6/9] jobs: " John Snow
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: John Snow @ 2018-08-23 22:08 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody, Max Reitz, John Snow

Replace the manual deferral to mirror_exit with the implicit job_exit
shim, which now invokes mirror_exit through the .exit callback.

This does change the order of some bdrv_unref calls relative to
job_completed, but thanks to the new context in which we call .exit, it
is safe to defer the possible flushing of any nodes to the
job_finalize_single cleanup stage.
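
As a rough illustration of the return-code handling this conversion
relies on (not code from this patch; ToyJob, toy_cleanup_step and
toy_exit are hypothetical stand-ins), an .exit callback can read the
result of .run from job->ret, refine it in a local variable during
cleanup, and write it back once at the end:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for the real Job; only the field this sketch needs. */
typedef struct ToyJob {
    int ret;    /* result of .run on entry, final result on return */
} ToyJob;

/* Hypothetical cleanup step that may fail; not a QEMU function. */
static int toy_cleanup_step(bool fail)
{
    return fail ? -EPERM : 0;
}

/* Shape of an .exit callback: read job->ret, refine it, write it back. */
static void toy_exit(ToyJob *job, bool cleanup_fails)
{
    int ret;

    assert(job);
    ret = job->ret;

    /* Only attempt the success path if .run itself succeeded. */
    if (ret == 0 && toy_cleanup_step(cleanup_fails) < 0) {
        ret = -EPERM;
    }

    /* ... release resources here ... */

    /* Publish the final result for the completion code to pick up. */
    job->ret = ret;
}
```

An error from .run is preserved unchanged, while a cleanup failure on
an otherwise successful job is reported as -EPERM.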

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/mirror.c | 25 +++++++++----------------
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index be5dc6b7b0..57b4ac97d8 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -607,21 +607,17 @@ static void mirror_wait_for_all_io(MirrorBlockJob *s)
     }
 }
 
-typedef struct {
-    int ret;
-} MirrorExitData;
-
-static void mirror_exit(Job *job, void *opaque)
+static void mirror_exit(Job *job)
 {
     MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
     BlockJob *bjob = &s->common;
-    MirrorExitData *data = opaque;
     MirrorBDSOpaque *bs_opaque = s->mirror_top_bs->opaque;
     AioContext *replace_aio_context = NULL;
     BlockDriverState *src = s->mirror_top_bs->backing->bs;
     BlockDriverState *target_bs = blk_bs(s->target);
     BlockDriverState *mirror_top_bs = s->mirror_top_bs;
     Error *local_err = NULL;
+    int ret = job->ret;
 
     bdrv_release_dirty_bitmap(src, s->dirty_bitmap);
 
@@ -652,7 +648,7 @@ static void mirror_exit(Job *job, void *opaque)
             bdrv_set_backing_hd(target_bs, backing, &local_err);
             if (local_err) {
                 error_report_err(local_err);
-                data->ret = -EPERM;
+                ret = -EPERM;
             }
         }
     }
@@ -662,7 +658,7 @@ static void mirror_exit(Job *job, void *opaque)
         aio_context_acquire(replace_aio_context);
     }
 
-    if (s->should_complete && data->ret == 0) {
+    if (s->should_complete && ret == 0) {
         BlockDriverState *to_replace = src;
         if (s->to_replace) {
             to_replace = s->to_replace;
@@ -679,7 +675,7 @@ static void mirror_exit(Job *job, void *opaque)
         bdrv_drained_end(target_bs);
         if (local_err) {
             error_report_err(local_err);
-            data->ret = -EPERM;
+            ret = -EPERM;
         }
     }
     if (s->to_replace) {
@@ -710,12 +706,12 @@ static void mirror_exit(Job *job, void *opaque)
     blk_insert_bs(bjob->blk, mirror_top_bs, &error_abort);
 
     bs_opaque->job = NULL;
-    job_completed(job, data->ret);
 
-    g_free(data);
     bdrv_drained_end(src);
     bdrv_unref(mirror_top_bs);
     bdrv_unref(src);
+
+    job->ret = ret;
 }
 
 static void mirror_throttle(MirrorBlockJob *s)
@@ -815,7 +811,6 @@ static int mirror_flush(MirrorBlockJob *s)
 static int coroutine_fn mirror_run(Job *job, Error **errp)
 {
     MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
-    MirrorExitData *data;
     BlockDriverState *bs = s->mirror_top_bs->backing->bs;
     BlockDriverState *target_bs = blk_bs(s->target);
     bool need_drain = true;
@@ -1035,14 +1030,10 @@ immediate_exit:
     g_free(s->in_flight_bitmap);
     bdrv_dirty_iter_free(s->dbi);
 
-    data = g_malloc(sizeof(*data));
-    data->ret = ret;
-
     if (need_drain) {
         bdrv_drained_begin(bs);
     }
 
-    job_defer_to_main_loop(&s->common.job, mirror_exit, data);
     return ret;
 }
 
@@ -1141,6 +1132,7 @@ static const BlockJobDriver mirror_job_driver = {
         .user_resume            = block_job_user_resume,
         .drain                  = block_job_drain,
         .run                    = mirror_run,
+        .exit                   = mirror_exit,
         .pause                  = mirror_pause,
         .complete               = mirror_complete,
     },
@@ -1157,6 +1149,7 @@ static const BlockJobDriver commit_active_job_driver = {
         .user_resume            = block_job_user_resume,
         .drain                  = block_job_drain,
         .run                    = mirror_run,
+        .exit                   = mirror_exit,
         .pause                  = mirror_pause,
         .complete               = mirror_complete,
     },
-- 
2.14.4

* [Qemu-devel] [PATCH v2 6/9] jobs: utilize job_exit shim
  2018-08-23 22:08 [Qemu-devel] [PATCH v2 0/9] jobs: Job Exit Refactoring Pt 1 John Snow
                   ` (4 preceding siblings ...)
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 5/9] block/mirror: " John Snow
@ 2018-08-23 22:08 ` John Snow
  2018-08-27 10:37   ` Max Reitz
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 7/9] block/backup: make function variables consistently named John Snow
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: John Snow @ 2018-08-23 22:08 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody, Max Reitz, John Snow

Utilize the job_exit shim by not calling job_defer_to_main_loop, and
where applicable, converting the deferred callback into the job_exit
callback.

This converts backup, stream, create, and the unit tests all at once.
Most of these jobs do not see any changes to the order in which they
clean up their resources, except the test-blockjob-txn test, which
now puts down its bs before job_completed is called.

This is safe for the same reason the reordering in the mirror job is
safe: job_completed no longer runs under two locks, so the unref is
safe even if it causes a flush.
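
The shape of the conversion can be sketched with a toy model (all
Toy* and toy_* names are hypothetical; the real job_exit runs as a
bottom half in the main loop and takes the job's AioContext around
.exit, which this sketch leaves out). The .run callback only returns
its result, and the infrastructure calls the optional .exit callback
before completing the job:

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyJob ToyJob;

typedef struct ToyJobDriver {
    int (*run)(ToyJob *job);    /* mandatory: returns 0 or -errno */
    void (*exit)(ToyJob *job);  /* optional: cleanup after .run */
} ToyJobDriver;

struct ToyJob {
    const ToyJobDriver *driver;
    int ret;
    int completed;
    int cleaned_up;
};

static void toy_job_completed(ToyJob *job)
{
    job->completed = 1;
}

/* Models the job_exit shim: run .exit if present, then complete. */
static void toy_job_exit(ToyJob *job)
{
    if (job->driver->exit) {
        job->driver->exit(job);
    }
    toy_job_completed(job);
}

/* Models the coroutine entry: store the result, hand over to the shim.
 * Called directly here instead of being scheduled in a main loop. */
static void toy_job_co_entry(ToyJob *job)
{
    assert(job && job->driver && job->driver->run);
    job->ret = job->driver->run(job);
    toy_job_exit(job);
}

/* A converted job: .run no longer defers anything itself. */
static int toy_run(ToyJob *job)
{
    (void)job;
    return 0;
}

static void toy_cleanup(ToyJob *job)
{
    /* Cleanup happens before the job is marked completed. */
    assert(!job->completed);
    job->cleaned_up = 1;
}

static const ToyJobDriver toy_driver_with_exit = {
    .run  = toy_run,
    .exit = toy_cleanup,
};

static const ToyJobDriver toy_driver_no_exit = {
    .run = toy_run,
};
```

A job with no cleanup, such as backup or create after this patch,
corresponds to toy_driver_no_exit: it simply leaves .exit unset.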

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/backup.c            | 16 ----------------
 block/create.c            | 14 +++-----------
 block/stream.c            | 22 +++++++---------------
 tests/test-bdrv-drain.c   |  6 ------
 tests/test-blockjob-txn.c | 11 ++---------
 tests/test-blockjob.c     | 10 ++++------
 6 files changed, 16 insertions(+), 63 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 1e965d54e5..a67b7fa96b 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -380,18 +380,6 @@ static BlockErrorAction backup_error_action(BackupBlockJob *job,
     }
 }
 
-typedef struct {
-    int ret;
-} BackupCompleteData;
-
-static void backup_complete(Job *job, void *opaque)
-{
-    BackupCompleteData *data = opaque;
-
-    job_completed(job, data->ret);
-    g_free(data);
-}
-
 static bool coroutine_fn yield_and_check(BackupBlockJob *job)
 {
     uint64_t delay_ns;
@@ -483,7 +471,6 @@ static void backup_incremental_init_copy_bitmap(BackupBlockJob *job)
 static int coroutine_fn backup_run(Job *opaque_job, Error **errp)
 {
     BackupBlockJob *job = container_of(opaque_job, BackupBlockJob, common.job);
-    BackupCompleteData *data;
     BlockDriverState *bs = blk_bs(job->common.blk);
     int64_t offset, nb_clusters;
     int ret = 0;
@@ -584,9 +571,6 @@ static int coroutine_fn backup_run(Job *opaque_job, Error **errp)
     qemu_co_rwlock_unlock(&job->flush_rwlock);
     hbitmap_free(job->copy_bitmap);
 
-    data = g_malloc(sizeof(*data));
-    data->ret = ret;
-    job_defer_to_main_loop(&job->common.job, backup_complete, data);
     return ret;
 }
 
diff --git a/block/create.c b/block/create.c
index 26a385c6c7..95341219ef 100644
--- a/block/create.c
+++ b/block/create.c
@@ -34,28 +34,20 @@ typedef struct BlockdevCreateJob {
     Job common;
     BlockDriver *drv;
     BlockdevCreateOptions *opts;
-    int ret;
 } BlockdevCreateJob;
 
-static void blockdev_create_complete(Job *job, void *opaque)
-{
-    BlockdevCreateJob *s = container_of(job, BlockdevCreateJob, common);
-
-    job_completed(job, s->ret);
-}
-
 static int coroutine_fn blockdev_create_run(Job *job, Error **errp)
 {
     BlockdevCreateJob *s = container_of(job, BlockdevCreateJob, common);
+    int ret;
 
     job_progress_set_remaining(&s->common, 1);
-    s->ret = s->drv->bdrv_co_create(s->opts, errp);
+    ret = s->drv->bdrv_co_create(s->opts, errp);
     job_progress_update(&s->common, 1);
 
     qapi_free_BlockdevCreateOptions(s->opts);
-    job_defer_to_main_loop(&s->common, blockdev_create_complete, NULL);
 
-    return s->ret;
+    return ret;
 }
 
 static const JobDriver blockdev_create_job_driver = {
diff --git a/block/stream.c b/block/stream.c
index 26a775386b..67e1e72e23 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -54,20 +54,16 @@ static int coroutine_fn stream_populate(BlockBackend *blk,
     return blk_co_preadv(blk, offset, qiov.size, &qiov, BDRV_REQ_COPY_ON_READ);
 }
 
-typedef struct {
-    int ret;
-} StreamCompleteData;
-
-static void stream_complete(Job *job, void *opaque)
+static void stream_exit(Job *job)
 {
     StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
     BlockJob *bjob = &s->common;
-    StreamCompleteData *data = opaque;
     BlockDriverState *bs = blk_bs(bjob->blk);
     BlockDriverState *base = s->base;
     Error *local_err = NULL;
+    int ret = job->ret;
 
-    if (!job_is_cancelled(job) && bs->backing && data->ret == 0) {
+    if (!job_is_cancelled(job) && bs->backing && ret == 0) {
         const char *base_id = NULL, *base_fmt = NULL;
         if (base) {
             base_id = s->backing_file_str;
@@ -75,11 +71,11 @@ static void stream_complete(Job *job, void *opaque)
                 base_fmt = base->drv->format_name;
             }
         }
-        data->ret = bdrv_change_backing_file(bs, base_id, base_fmt);
+        ret = bdrv_change_backing_file(bs, base_id, base_fmt);
         bdrv_set_backing_hd(bs, base, &local_err);
         if (local_err) {
             error_report_err(local_err);
-            data->ret = -EPERM;
+            ret = -EPERM;
             goto out;
         }
     }
@@ -93,14 +89,12 @@ out:
     }
 
     g_free(s->backing_file_str);
-    job_completed(job, data->ret);
-    g_free(data);
+    job->ret = ret;
 }
 
 static int coroutine_fn stream_run(Job *job, Error **errp)
 {
     StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
-    StreamCompleteData *data;
     BlockBackend *blk = s->common.blk;
     BlockDriverState *bs = blk_bs(blk);
     BlockDriverState *base = s->base;
@@ -203,9 +197,6 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 
 out:
     /* Modify backing chain and close BDSes in main loop */
-    data = g_malloc(sizeof(*data));
-    data->ret = ret;
-    job_defer_to_main_loop(&s->common.job, stream_complete, data);
     return ret;
 }
 
@@ -215,6 +206,7 @@ static const BlockJobDriver stream_job_driver = {
         .job_type      = JOB_TYPE_STREAM,
         .free          = block_job_free,
         .run           = stream_run,
+        .exit          = stream_exit,
         .user_resume   = block_job_user_resume,
         .drain         = block_job_drain,
     },
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 00604dfc0c..9bcb3c72af 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -752,11 +752,6 @@ typedef struct TestBlockJob {
     bool should_complete;
 } TestBlockJob;
 
-static void test_job_completed(Job *job, void *opaque)
-{
-    job_completed(job, 0);
-}
-
 static int coroutine_fn test_job_run(Job *job, Error **errp)
 {
     TestBlockJob *s = container_of(job, TestBlockJob, common.job);
@@ -770,7 +765,6 @@ static int coroutine_fn test_job_run(Job *job, Error **errp)
         job_pause_point(&s->common.job);
     }
 
-    job_defer_to_main_loop(&s->common.job, test_job_completed, NULL);
     return 0;
 }
 
diff --git a/tests/test-blockjob-txn.c b/tests/test-blockjob-txn.c
index 82cedee78b..ef29f35e44 100644
--- a/tests/test-blockjob-txn.c
+++ b/tests/test-blockjob-txn.c
@@ -24,17 +24,11 @@ typedef struct {
     int *result;
 } TestBlockJob;
 
-static void test_block_job_complete(Job *job, void *opaque)
+static void test_block_job_exit(Job *job)
 {
     BlockJob *bjob = container_of(job, BlockJob, job);
     BlockDriverState *bs = blk_bs(bjob->blk);
-    int rc = (intptr_t)opaque;
 
-    if (job_is_cancelled(job)) {
-        rc = -ECANCELED;
-    }
-
-    job_completed(job, rc);
     bdrv_unref(bs);
 }
 
@@ -54,8 +48,6 @@ static int coroutine_fn test_block_job_run(Job *job, Error **errp)
         }
     }
 
-    job_defer_to_main_loop(job, test_block_job_complete,
-                           (void *)(intptr_t)s->rc);
     return s->rc;
 }
 
@@ -81,6 +73,7 @@ static const BlockJobDriver test_block_job_driver = {
         .user_resume   = block_job_user_resume,
         .drain         = block_job_drain,
         .run           = test_block_job_run,
+        .exit          = test_block_job_exit,
     },
 };
 
diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c
index 408a226939..ad4a65bc78 100644
--- a/tests/test-blockjob.c
+++ b/tests/test-blockjob.c
@@ -163,11 +163,10 @@ typedef struct CancelJob {
     bool completed;
 } CancelJob;
 
-static void cancel_job_completed(Job *job, void *opaque)
+static void cancel_job_exit(Job *job)
 {
-    CancelJob *s = opaque;
+    CancelJob *s = container_of(job, CancelJob, common.job);
     s->completed = true;
-    job_completed(job, 0);
 }
 
 static void cancel_job_complete(Job *job, Error **errp)
@@ -182,7 +181,7 @@ static int coroutine_fn cancel_job_run(Job *job, Error **errp)
 
     while (!s->should_complete) {
         if (job_is_cancelled(&s->common.job)) {
-            goto defer;
+            return 0;
         }
 
         if (!job_is_ready(&s->common.job) && s->should_converge) {
@@ -192,8 +191,6 @@ static int coroutine_fn cancel_job_run(Job *job, Error **errp)
         job_sleep_ns(&s->common.job, 100000);
     }
 
- defer:
-    job_defer_to_main_loop(&s->common.job, cancel_job_completed, s);
     return 0;
 }
 
@@ -204,6 +201,7 @@ static const BlockJobDriver test_cancel_driver = {
         .user_resume   = block_job_user_resume,
         .drain         = block_job_drain,
         .run           = cancel_job_run,
+        .exit          = cancel_job_exit,
         .complete      = cancel_job_complete,
     },
 };
-- 
2.14.4

* [Qemu-devel] [PATCH v2 7/9] block/backup: make function variables consistently named
  2018-08-23 22:08 [Qemu-devel] [PATCH v2 0/9] jobs: Job Exit Refactoring Pt 1 John Snow
                   ` (5 preceding siblings ...)
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 6/9] jobs: " John Snow
@ 2018-08-23 22:08 ` John Snow
  2018-08-27 10:41   ` Max Reitz
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 8/9] jobs: remove ret argument to job_completed; privatize it John Snow
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 9/9] jobs: remove job_defer_to_main_loop John Snow
  8 siblings, 1 reply; 23+ messages in thread
From: John Snow @ 2018-08-23 22:08 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody, Max Reitz, John Snow

Rename opaque_job to job to be consistent with other job implementations.
Rename 'job', the BackupBlockJob object, to 's' to also be consistent.

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/backup.c | 62 +++++++++++++++++++++++++++++-----------------------------
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index a67b7fa96b..4d084f6ca6 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -468,59 +468,59 @@ static void backup_incremental_init_copy_bitmap(BackupBlockJob *job)
     bdrv_dirty_iter_free(dbi);
 }
 
-static int coroutine_fn backup_run(Job *opaque_job, Error **errp)
+static int coroutine_fn backup_run(Job *job, Error **errp)
 {
-    BackupBlockJob *job = container_of(opaque_job, BackupBlockJob, common.job);
-    BlockDriverState *bs = blk_bs(job->common.blk);
+    BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
+    BlockDriverState *bs = blk_bs(s->common.blk);
     int64_t offset, nb_clusters;
     int ret = 0;
 
-    QLIST_INIT(&job->inflight_reqs);
-    qemu_co_rwlock_init(&job->flush_rwlock);
+    QLIST_INIT(&s->inflight_reqs);
+    qemu_co_rwlock_init(&s->flush_rwlock);
 
-    nb_clusters = DIV_ROUND_UP(job->len, job->cluster_size);
-    job_progress_set_remaining(&job->common.job, job->len);
+    nb_clusters = DIV_ROUND_UP(s->len, s->cluster_size);
+    job_progress_set_remaining(job, s->len);
 
-    job->copy_bitmap = hbitmap_alloc(nb_clusters, 0);
-    if (job->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
-        backup_incremental_init_copy_bitmap(job);
+    s->copy_bitmap = hbitmap_alloc(nb_clusters, 0);
+    if (s->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
+        backup_incremental_init_copy_bitmap(s);
     } else {
-        hbitmap_set(job->copy_bitmap, 0, nb_clusters);
+        hbitmap_set(s->copy_bitmap, 0, nb_clusters);
     }
 
 
-    job->before_write.notify = backup_before_write_notify;
-    bdrv_add_before_write_notifier(bs, &job->before_write);
+    s->before_write.notify = backup_before_write_notify;
+    bdrv_add_before_write_notifier(bs, &s->before_write);
 
-    if (job->sync_mode == MIRROR_SYNC_MODE_NONE) {
+    if (s->sync_mode == MIRROR_SYNC_MODE_NONE) {
         /* All bits are set in copy_bitmap to allow any cluster to be copied.
          * This does not actually require them to be copied. */
-        while (!job_is_cancelled(&job->common.job)) {
+        while (!job_is_cancelled(job)) {
             /* Yield until the job is cancelled.  We just let our before_write
              * notify callback service CoW requests. */
-            job_yield(&job->common.job);
+            job_yield(job);
         }
-    } else if (job->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
-        ret = backup_run_incremental(job);
+    } else if (s->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
+        ret = backup_run_incremental(s);
     } else {
         /* Both FULL and TOP SYNC_MODE's require copying.. */
-        for (offset = 0; offset < job->len;
-             offset += job->cluster_size) {
+        for (offset = 0; offset < s->len;
+             offset += s->cluster_size) {
             bool error_is_read;
             int alloced = 0;
 
-            if (yield_and_check(job)) {
+            if (yield_and_check(s)) {
                 break;
             }
 
-            if (job->sync_mode == MIRROR_SYNC_MODE_TOP) {
+            if (s->sync_mode == MIRROR_SYNC_MODE_TOP) {
                 int i;
                 int64_t n;
 
                 /* Check to see if these blocks are already in the
                  * backing file. */
 
-                for (i = 0; i < job->cluster_size;) {
+                for (i = 0; i < s->cluster_size;) {
                     /* bdrv_is_allocated() only returns true/false based
                      * on the first set of sectors it comes across that
                      * are are all in the same state.
@@ -529,7 +529,7 @@ static int coroutine_fn backup_run(Job *opaque_job, Error **errp)
                      * needed but at some point that is always the case. */
                     alloced =
                         bdrv_is_allocated(bs, offset + i,
-                                          job->cluster_size - i, &n);
+                                          s->cluster_size - i, &n);
                     i += n;
 
                     if (alloced || n == 0) {
@@ -547,29 +547,29 @@ static int coroutine_fn backup_run(Job *opaque_job, Error **errp)
             if (alloced < 0) {
                 ret = alloced;
             } else {
-                ret = backup_do_cow(job, offset, job->cluster_size,
+                ret = backup_do_cow(s, offset, s->cluster_size,
                                     &error_is_read, false);
             }
             if (ret < 0) {
                 /* Depending on error action, fail now or retry cluster */
                 BlockErrorAction action =
-                    backup_error_action(job, error_is_read, -ret);
+                    backup_error_action(s, error_is_read, -ret);
                 if (action == BLOCK_ERROR_ACTION_REPORT) {
                     break;
                 } else {
-                    offset -= job->cluster_size;
+                    offset -= s->cluster_size;
                     continue;
                 }
             }
         }
     }
 
-    notifier_with_return_remove(&job->before_write);
+    notifier_with_return_remove(&s->before_write);
 
     /* wait until pending backup_do_cow() calls have completed */
-    qemu_co_rwlock_wrlock(&job->flush_rwlock);
-    qemu_co_rwlock_unlock(&job->flush_rwlock);
-    hbitmap_free(job->copy_bitmap);
+    qemu_co_rwlock_wrlock(&s->flush_rwlock);
+    qemu_co_rwlock_unlock(&s->flush_rwlock);
+    hbitmap_free(s->copy_bitmap);
 
     return ret;
 }
-- 
2.14.4

* [Qemu-devel] [PATCH v2 8/9] jobs: remove ret argument to job_completed; privatize it
  2018-08-23 22:08 [Qemu-devel] [PATCH v2 0/9] jobs: Job Exit Refactoring Pt 1 John Snow
                   ` (6 preceding siblings ...)
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 7/9] block/backup: make function variables consistently named John Snow
@ 2018-08-23 22:08 ` John Snow
  2018-08-27 10:52   ` Max Reitz
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 9/9] jobs: remove job_defer_to_main_loop John Snow
  8 siblings, 1 reply; 23+ messages in thread
From: John Snow @ 2018-08-23 22:08 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody, Max Reitz, John Snow

Jobs are now expected to return their return code directly from the
.run callback, so job_completed no longer needs its ret argument.

job_cancel does not need to set -ECANCELED because job_completed will
update the return code itself if the job was canceled.

While we're here, make job_completed static to job.c and remove it from
job.h; move the documentation of return code to the .run() callback and
to the job->ret property, accordingly.
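
A minimal sketch of why job_cancel can drop the explicit -ECANCELED,
using hypothetical Toy* names and assuming the return code correction
only overrides a successful (zero) result on a cancelled job:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

typedef struct ToyJob {
    int ret;
    bool cancelled;
    bool aborted;
} ToyJob;

/* Assumed behaviour of the return code update: a cancelled job that
 * would otherwise report success reports -ECANCELED instead. */
static void toy_update_rc(ToyJob *job)
{
    if (job->ret == 0 && job->cancelled) {
        job->ret = -ECANCELED;
    }
}

/* Completion no longer takes a ret argument; it reads job->ret. */
static void toy_job_completed(ToyJob *job)
{
    assert(job);
    toy_update_rc(job);
    job->aborted = (job->ret != 0);
}

/* Cancelling a job that never started: mark it cancelled and complete
 * it, without passing -ECANCELED explicitly. */
static void toy_job_cancel_unstarted(ToyJob *job)
{
    job->cancelled = true;
    toy_job_completed(job);
}
```

In this model a pre-existing error such as -EIO is kept as-is even if
the job was also cancelled.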

Signed-off-by: John Snow <jsnow@redhat.com>
---
 include/qemu/job.h | 24 +++++++++++-------------
 job.c              | 11 ++++++-----
 trace-events       |  2 +-
 3 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index c67f6a647e..2990f28edc 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -124,7 +124,7 @@ typedef struct Job {
     /** Estimated progress_current value at the completion of the job */
     int64_t progress_total;
 
-    /** ret code passed to job_completed. */
+    /** Return code from @run callback; 0 on success and -errno on failure. */
     int ret;
 
     /** Error object for a failed job **/
@@ -168,7 +168,16 @@ struct JobDriver {
     /** Enum describing the operation */
     JobType job_type;
 
-    /** Mandatory: Entrypoint for the Coroutine. */
+    /**
+     * Mandatory: Entrypoint for the Coroutine.
+     *
+     * This callback will be invoked when moving from CREATED to RUNNING.
+     *
+     * If this callback returns nonzero, the job transaction it is part of is
+     * aborted. If it returns zero, the job moves into the WAITING state. If it
+     * is the last job to complete in its transaction, all jobs in the
+     * transaction move from WAITING to PENDING.
+     */
     int coroutine_fn (*run)(Job *job, Error **errp);
 
     /**
@@ -492,17 +501,6 @@ void job_early_fail(Job *job);
 /** Moves the @job from RUNNING to READY */
 void job_transition_to_ready(Job *job);
 
-/**
- * @job: The job being completed.
- * @ret: The status code.
- *
- * Marks @job as completed. If @ret is non-zero, the job transaction it is part
- * of is aborted. If @ret is zero, the job moves into the WAITING state. If it
- * is the last job to complete in its transaction, all jobs in the transaction
- * move from WAITING to PENDING.
- */
-void job_completed(Job *job, int ret);
-
 /** Asynchronously complete the specified @job. */
 void job_complete(Job *job, Error **errp);
 
diff --git a/job.c b/job.c
index bc8dad4e71..213042b762 100644
--- a/job.c
+++ b/job.c
@@ -535,6 +535,8 @@ void job_drain(Job *job)
     }
 }
 
+static void job_completed(Job *job);
+
 static void job_exit(void *opaque)
 {
     Job *job = (Job *)opaque;
@@ -545,7 +547,7 @@ static void job_exit(void *opaque)
         job->driver->exit(job);
         aio_context_release(aio_context);
     }
-    job_completed(job, job->ret);
+    job_completed(job);
 }
 
 /**
@@ -883,13 +885,12 @@ static void job_completed_txn_success(Job *job)
     }
 }
 
-void job_completed(Job *job, int ret)
+static void job_completed(Job *job)
 {
     assert(job && job->txn && !job_is_completed(job));
 
-    job->ret = ret;
     job_update_rc(job);
-    trace_job_completed(job, ret, job->ret);
+    trace_job_completed(job, job->ret);
     if (job->ret) {
         job_completed_txn_abort(job);
     } else {
@@ -905,7 +906,7 @@ void job_cancel(Job *job, bool force)
     }
     job_cancel_async(job, force);
     if (!job_started(job)) {
-        job_completed(job, -ECANCELED);
+        job_completed(job);
     } else if (job->deferred_to_main_loop) {
         job_completed_txn_abort(job);
     } else {
diff --git a/trace-events b/trace-events
index c445f54773..4fd2cb4b97 100644
--- a/trace-events
+++ b/trace-events
@@ -107,7 +107,7 @@ gdbstub_err_checksum_incorrect(uint8_t expected, uint8_t got) "got command packe
 # job.c
 job_state_transition(void *job,  int ret, const char *legal, const char *s0, const char *s1) "job %p (ret: %d) attempting %s transition (%s-->%s)"
 job_apply_verb(void *job, const char *state, const char *verb, const char *legal) "job %p in state %s; applying verb %s (%s)"
-job_completed(void *job, int ret, int jret) "job %p ret %d corrected ret %d"
+job_completed(void *job, int ret) "job %p ret %d"
 
 # job-qmp.c
 qmp_job_cancel(void *job) "job %p"
-- 
2.14.4

* [Qemu-devel] [PATCH v2 9/9] jobs: remove job_defer_to_main_loop
  2018-08-23 22:08 [Qemu-devel] [PATCH v2 0/9] jobs: Job Exit Refactoring Pt 1 John Snow
                   ` (7 preceding siblings ...)
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 8/9] jobs: remove ret argument to job_completed; privatize it John Snow
@ 2018-08-23 22:08 ` John Snow
  2018-08-27 10:56   ` Max Reitz
  8 siblings, 1 reply; 23+ messages in thread
From: John Snow @ 2018-08-23 22:08 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody, Max Reitz, John Snow

Now that the job infrastructure is handling the job_completed call for
all implemented jobs, we can remove the interface that allowed jobs to
schedule their own completion.
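
The resulting control flow can be modelled with a toy one-shot
callback queue (everything named Toy* or toy_* is hypothetical and
merely stands in for the main loop's bottom-half mechanism): the
coroutine entry point stores the result of .run and unconditionally
schedules the exit handler, which only runs once the main loop polls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef void ToyBHFunc(void *opaque);

typedef struct ToyBH {
    ToyBHFunc *fn;
    void *opaque;
} ToyBH;

#define TOY_MAX_BH 8
static ToyBH toy_queue[TOY_MAX_BH];
static size_t toy_queue_len;

/* Queue fn(opaque) to run exactly once on the next poll. */
static void toy_bh_schedule_oneshot(ToyBHFunc *fn, void *opaque)
{
    assert(toy_queue_len < TOY_MAX_BH);
    toy_queue[toy_queue_len++] = (ToyBH){ fn, opaque };
}

/* Run every pending callback once; returns how many were run. */
static size_t toy_main_loop_poll(void)
{
    ToyBH pending[TOY_MAX_BH];
    size_t n = toy_queue_len;

    /* Snapshot first so callbacks may schedule new work safely. */
    memcpy(pending, toy_queue, n * sizeof(ToyBH));
    toy_queue_len = 0;
    for (size_t i = 0; i < n; i++) {
        pending[i].fn(pending[i].opaque);
    }
    return n;
}

typedef struct ToyJob {
    int ret;
    bool deferred_to_main_loop;
    bool completed;
} ToyJob;

static void toy_job_exit(void *opaque)
{
    ToyJob *job = opaque;
    job->completed = true;
}

/* run_result stands in for the value returned by the .run callback. */
static void toy_job_co_entry(ToyJob *job, int run_result)
{
    job->ret = run_result;
    job->deferred_to_main_loop = true;
    toy_bh_schedule_oneshot(toy_job_exit, job);
}
```

The job is not completed when the entry point returns; completion
happens on the next poll, and the callback does not fire a second time.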

Signed-off-by: John Snow <jsnow@redhat.com>
---
 include/qemu/job.h | 17 -----------------
 job.c              | 40 ++--------------------------------------
 2 files changed, 2 insertions(+), 55 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 2990f28edc..858cc2b37a 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -560,23 +560,6 @@ void job_finalize(Job *job, Error **errp);
  */
 void job_dismiss(Job **job, Error **errp);
 
-typedef void JobDeferToMainLoopFn(Job *job, void *opaque);
-
-/**
- * @job: The job
- * @fn: The function to run in the main loop
- * @opaque: The opaque value that is passed to @fn
- *
- * This function must be called by the main job coroutine just before it
- * returns.  @fn is executed in the main loop with the job AioContext acquired.
- *
- * Block jobs must call bdrv_unref(), bdrv_close(), and anything that uses
- * bdrv_drain_all() in the main loop.
- *
- * The @job AioContext is held while @fn executes.
- */
-void job_defer_to_main_loop(Job *job, JobDeferToMainLoopFn *fn, void *opaque);
-
 /**
  * Synchronously finishes the given @job. If @finish is given, it is called to
  * trigger completion or cancellation of the job.
diff --git a/job.c b/job.c
index 213042b762..01dd97fee3 100644
--- a/job.c
+++ b/job.c
@@ -561,12 +561,8 @@ static void coroutine_fn job_co_entry(void *opaque)
     assert(job && job->driver && job->driver->run);
     job_pause_point(job);
     job->ret = job->driver->run(job, &job->err);
-    if (!job->deferred_to_main_loop) {
-        job->deferred_to_main_loop = true;
-        aio_bh_schedule_oneshot(qemu_get_aio_context(),
-                                job_exit,
-                                job);
-    }
+    job->deferred_to_main_loop = true;
+    aio_bh_schedule_oneshot(qemu_get_aio_context(), job_exit, job);
 }
 
 
@@ -969,38 +965,6 @@ void job_complete(Job *job, Error **errp)
     job->driver->complete(job, errp);
 }
 
-
-typedef struct {
-    Job *job;
-    JobDeferToMainLoopFn *fn;
-    void *opaque;
-} JobDeferToMainLoopData;
-
-static void job_defer_to_main_loop_bh(void *opaque)
-{
-    JobDeferToMainLoopData *data = opaque;
-    Job *job = data->job;
-    AioContext *aio_context = job->aio_context;
-
-    aio_context_acquire(aio_context);
-    data->fn(data->job, data->opaque);
-    aio_context_release(aio_context);
-
-    g_free(data);
-}
-
-void job_defer_to_main_loop(Job *job, JobDeferToMainLoopFn *fn, void *opaque)
-{
-    JobDeferToMainLoopData *data = g_malloc(sizeof(*data));
-    data->job = job;
-    data->fn = fn;
-    data->opaque = opaque;
-    job->deferred_to_main_loop = true;
-
-    aio_bh_schedule_oneshot(qemu_get_aio_context(),
-                            job_defer_to_main_loop_bh, data);
-}
-
 int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
 {
     Error *local_err = NULL;
-- 
2.14.4

* Re: [Qemu-devel] [PATCH v2 1/9] jobs: change start callback to run callback
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 1/9] jobs: change start callback to run callback John Snow
@ 2018-08-27  9:30   ` Max Reitz
  2018-08-30  0:06     ` John Snow
  0 siblings, 1 reply; 23+ messages in thread
From: Max Reitz @ 2018-08-27  9:30 UTC (permalink / raw)
  To: John Snow, qemu-block, qemu-devel; +Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody

On 2018-08-24 00:08, John Snow wrote:
> Presently we codify the entry point for a job as the "start" callback,
> but a more apt name would be "run" to clarify the idea that when this
> function returns we consider the job to have "finished," except for
> any cleanup which occurs in separate callbacks later.
> 
> As part of this clarification, change the signature to include an error
> object and a return code. The error ptr is not yet used, and the return
> code while captured, will be overwritten by actions in the job_completed
> function.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  block/backup.c            |  7 ++++---
>  block/commit.c            |  7 ++++---
>  block/create.c            |  8 +++++---
>  block/mirror.c            | 10 ++++++----
>  block/stream.c            |  7 ++++---
>  include/qemu/job.h        |  2 +-
>  job.c                     |  6 +++---
>  tests/test-bdrv-drain.c   |  7 ++++---
>  tests/test-blockjob-txn.c | 16 ++++++++--------
>  tests/test-blockjob.c     |  7 ++++---
>  10 files changed, 43 insertions(+), 34 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>

But I see a discrepancy in the upcoming s->ret <=> s->err relationship
now.  And that is if .run() doesn't return an Error *...

That could be remedied immediately in job_co_entry(), though, either by
calling job_update_rc(), or by inlining its "if (!job->err)" part.

Max


* Re: [Qemu-devel] [PATCH v2 2/9] jobs: canonize Error object
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 2/9] jobs: canonize Error object John Snow
@ 2018-08-27  9:41   ` Max Reitz
  2018-08-27 10:43     ` Max Reitz
  0 siblings, 1 reply; 23+ messages in thread
From: Max Reitz @ 2018-08-27  9:41 UTC (permalink / raw)
  To: John Snow, qemu-block, qemu-devel; +Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody

On 2018-08-24 00:08, John Snow wrote:
> Jobs presently use both an Error object in the case of the create job,
> and char strings in the case of generic errors elsewhere.
> 
> Unify the two paths as just j->err, and remove the extra argument from
> job_completed. The integer error code for job_completed is kept for now,
> to be removed shortly in a separate patch.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  block/backup.c            |  2 +-
>  block/commit.c            |  2 +-
>  block/create.c            |  5 ++---
>  block/mirror.c            |  2 +-
>  block/stream.c            |  2 +-
>  include/qemu/job.h        | 10 ++++------
>  job-qmp.c                 |  5 +++--
>  job.c                     | 18 ++++++------------
>  tests/test-bdrv-drain.c   |  2 +-
>  tests/test-blockjob-txn.c |  2 +-
>  tests/test-blockjob.c     |  2 +-
>  11 files changed, 22 insertions(+), 30 deletions(-)

[...]

> diff --git a/include/qemu/job.h b/include/qemu/job.h
> index 9cf463d228..5c92c53ef0 100644
> --- a/include/qemu/job.h
> +++ b/include/qemu/job.h
> @@ -124,12 +124,12 @@ typedef struct Job {
>      /** Estimated progress_current value at the completion of the job */
>      int64_t progress_total;
>  
> -    /** Error string for a failed job (NULL if, and only if, job->ret == 0) */
> -    char *error;
> -
>      /** ret code passed to job_completed. */
>      int ret;
>  
> +    /** Error object for a failed job **/
> +    Error *err;
> +

My question remains why you don't keep the iff condition here...

>      /** The completion function that will be called when the job completes.  */
>      BlockCompletionFunc *cb;
>  

[...]

> diff --git a/job.c b/job.c
> index 76988f6678..bc1d970df4 100644
> --- a/job.c
> +++ b/job.c

[...]

> @@ -546,7 +546,7 @@ static void coroutine_fn job_co_entry(void *opaque)
>  
>      assert(job && job->driver && job->driver->run);
>      job_pause_point(job);
> -    job->ret = job->driver->run(job, NULL);
> +    job->ret = job->driver->run(job, &job->err);

...by e.g. calling job_update_rc() here.

(Which seems reasonable since this did update the return code.)

Rest looks good, although I'm missing a "jobs: remove @ret from
job_completed" patch in one of the two series...

Max


* Re: [Qemu-devel] [PATCH v2 3/9] jobs: add exit shim
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 3/9] jobs: add exit shim John Snow
@ 2018-08-27 10:00   ` Max Reitz
  0 siblings, 0 replies; 23+ messages in thread
From: Max Reitz @ 2018-08-27 10:00 UTC (permalink / raw)
  To: John Snow, qemu-block, qemu-devel; +Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody

On 2018-08-24 00:08, John Snow wrote:
> All jobs do the same thing when they leave their running loop:
> - Store the return code in a structure
> - wait to receive this structure in the main thread
> - signal job completion via job_completed
> 
> Few jobs do anything beyond exactly this. Consolidate this exit
> logic for a net reduction in SLOC.
> 
> More seriously, when we utilize job_defer_to_main_loop_bh to call
> a function that calls job_completed, job_finalize_single will run
> in a context where it has recursively taken the aio_context lock,
> which can cause hangs if it puts down a reference that causes a flush.
> 
> You can observe this in practice by looking at mirror_exit's careful
> placement of job_completed and bdrv_unref calls.
> 
> If we centralize job exiting, we can signal job completion from outside
> of the aio_context, which should allow for job cleanup code to run with
> only one lock, which makes cleanup callbacks less tricky to write.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  include/qemu/job.h | 11 +++++++++++
>  job.c              | 18 ++++++++++++++++++
>  2 files changed, 29 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>


* Re: [Qemu-devel] [PATCH v2 4/9] block/commit: utilize job_exit shim
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 4/9] block/commit: utilize job_exit shim John Snow
@ 2018-08-27 10:28   ` Max Reitz
  0 siblings, 0 replies; 23+ messages in thread
From: Max Reitz @ 2018-08-27 10:28 UTC (permalink / raw)
  To: John Snow, qemu-block, qemu-devel; +Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody

On 2018-08-24 00:08, John Snow wrote:
> Change the manual deferment to commit_complete into the implicit
> callback to job_exit, renaming commit_complete to commit_exit.
> 
> This conversion does change the timing of when job_completed is
> called to after the bdrv_replace_node and bdrv_unref calls, which
> could have implications for bjob->blk which will now be put down
> after this cleanup.
> 
> Kevin highlights that we did not take any permissions for that backend
> at job creation time, so it is safe to reorder these operations.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  block/commit.c | 22 +++++-----------------
>  1 file changed, 5 insertions(+), 17 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


* Re: [Qemu-devel] [PATCH v2 5/9] block/mirror: utilize job_exit shim
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 5/9] block/mirror: " John Snow
@ 2018-08-27 10:30   ` Max Reitz
  0 siblings, 0 replies; 23+ messages in thread
From: Max Reitz @ 2018-08-27 10:30 UTC (permalink / raw)
  To: John Snow, qemu-block, qemu-devel; +Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody

On 2018-08-24 00:08, John Snow wrote:
> Change the manual deferment to mirror_exit into the implicit
> callback to job_exit and the mirror_exit callback.
> 
> This does change the order of some bdrv_unref calls and job_completed,
> but thanks to the new context in which we call .exit, this is safe to
> defer the possible flushing of any nodes to the job_finalize_single
> cleanup stage.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  block/mirror.c | 25 +++++++++----------------
>  1 file changed, 9 insertions(+), 16 deletions(-)

Looks good, but the comment about why @src is bdrv_ref()'ed needs to be
updated.

Max


* Re: [Qemu-devel] [PATCH v2 6/9] jobs: utilize job_exit shim
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 6/9] jobs: " John Snow
@ 2018-08-27 10:37   ` Max Reitz
  0 siblings, 0 replies; 23+ messages in thread
From: Max Reitz @ 2018-08-27 10:37 UTC (permalink / raw)
  To: John Snow, qemu-block, qemu-devel; +Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody

On 2018-08-24 00:08, John Snow wrote:
> Utilize the job_exit shim by not calling job_defer_to_main_loop, and
> where applicable, converting the deferred callback into the job_exit
> callback.
> 
> This converts backup, stream, create, and the unit tests all at once.
> Most of these jobs do not see any changes to the order in which they
> clean up their resources, except the test-blockjob-txn test, which
> now puts down its bs before job_completed is called.
> 
> This is safe for the same reason the reordering in the mirror job is
> safe, because job_completed no longer runs under two locks, making
> the unref safe even if it causes a flush.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  block/backup.c            | 16 ----------------
>  block/create.c            | 14 +++-----------
>  block/stream.c            | 22 +++++++---------------
>  tests/test-bdrv-drain.c   |  6 ------
>  tests/test-blockjob-txn.c | 11 ++---------
>  tests/test-blockjob.c     | 10 ++++------
>  6 files changed, 16 insertions(+), 63 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


* Re: [Qemu-devel] [PATCH v2 7/9] block/backup: make function variables consistently named
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 7/9] block/backup: make function variables consistently named John Snow
@ 2018-08-27 10:41   ` Max Reitz
  0 siblings, 0 replies; 23+ messages in thread
From: Max Reitz @ 2018-08-27 10:41 UTC (permalink / raw)
  To: John Snow, qemu-block, qemu-devel; +Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody

On 2018-08-24 00:08, John Snow wrote:
> Rename opaque_job to job to be consistent with other job implementations.
> Rename 'job', the BackupBlockJob object, to 's' to also be consistent.
> 
> Suggested-by: Eric Blake <eblake@redhat.com>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  block/backup.c | 62 +++++++++++++++++++++++++++++-----------------------------
>  1 file changed, 31 insertions(+), 31 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


* Re: [Qemu-devel] [PATCH v2 2/9] jobs: canonize Error object
  2018-08-27  9:41   ` Max Reitz
@ 2018-08-27 10:43     ` Max Reitz
  0 siblings, 0 replies; 23+ messages in thread
From: Max Reitz @ 2018-08-27 10:43 UTC (permalink / raw)
  To: John Snow, qemu-block, qemu-devel; +Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody

On 2018-08-27 11:41, Max Reitz wrote:
> On 2018-08-24 00:08, John Snow wrote:
>> Jobs presently use both an Error object in the case of the create job,
>> and char strings in the case of generic errors elsewhere.
>>
>> Unify the two paths as just j->err, and remove the extra argument from
>> job_completed. The integer error code for job_completed is kept for now,
>> to be removed shortly in a separate patch.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>  block/backup.c            |  2 +-
>>  block/commit.c            |  2 +-
>>  block/create.c            |  5 ++---
>>  block/mirror.c            |  2 +-
>>  block/stream.c            |  2 +-
>>  include/qemu/job.h        | 10 ++++------
>>  job-qmp.c                 |  5 +++--
>>  job.c                     | 18 ++++++------------
>>  tests/test-bdrv-drain.c   |  2 +-
>>  tests/test-blockjob-txn.c |  2 +-
>>  tests/test-blockjob.c     |  2 +-
>>  11 files changed, 22 insertions(+), 30 deletions(-)
> 
> [...]
> 
>> diff --git a/include/qemu/job.h b/include/qemu/job.h
>> index 9cf463d228..5c92c53ef0 100644
>> --- a/include/qemu/job.h
>> +++ b/include/qemu/job.h
>> @@ -124,12 +124,12 @@ typedef struct Job {
>>      /** Estimated progress_current value at the completion of the job */
>>      int64_t progress_total;
>>  
>> -    /** Error string for a failed job (NULL if, and only if, job->ret == 0) */
>> -    char *error;
>> -
>>      /** ret code passed to job_completed. */
>>      int ret;
>>  
>> +    /** Error object for a failed job **/
>> +    Error *err;
>> +
> 
> My question remains why you don't keep the iff condition here...
> 
>>      /** The completion function that will be called when the job completes.  */
>>      BlockCompletionFunc *cb;
>>  
> 
> [...]
> 
>> diff --git a/job.c b/job.c
>> index 76988f6678..bc1d970df4 100644
>> --- a/job.c
>> +++ b/job.c
> 
> [...]
> 
>> @@ -546,7 +546,7 @@ static void coroutine_fn job_co_entry(void *opaque)
>>  
>>      assert(job && job->driver && job->driver->run);
>>      job_pause_point(job);
>> -    job->ret = job->driver->run(job, NULL);
>> +    job->ret = job->driver->run(job, &job->err);
> 
> ...by e.g. calling job_update_rc() here.
> 
> (Which seems reasonable since this did update the return code.)
> 
> Rest looks good, although I'm missing a "jobs: remove @ret from
> job_completed" patch in one of the two series...

"Max can't read" confirmed


* Re: [Qemu-devel] [PATCH v2 8/9] jobs: remove ret argument to job_completed; privatize it
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 8/9] jobs: remove ret argument to job_completed; privatize it John Snow
@ 2018-08-27 10:52   ` Max Reitz
  2018-08-27 18:43     ` John Snow
  0 siblings, 1 reply; 23+ messages in thread
From: Max Reitz @ 2018-08-27 10:52 UTC (permalink / raw)
  To: John Snow, qemu-block, qemu-devel; +Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody

On 2018-08-24 00:08, John Snow wrote:
> Jobs are now expected to return their retcode on the stack, from the
> .run callback, so we can remove that argument.
> 
> job_cancel does not need to set -ECANCELED because job_completed will
> update the return code itself if the job was canceled.
> 
> While we're here, make job_completed static to job.c and remove it from
> job.h; move the documentation of return code to the .run() callback and
> to the job->ret property, accordingly.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  include/qemu/job.h | 24 +++++++++++-------------
>  job.c              | 11 ++++++-----
>  trace-events       |  2 +-
>  3 files changed, 18 insertions(+), 19 deletions(-)

Er, yeah.  Sorry for not being able to read.  Again.

> diff --git a/include/qemu/job.h b/include/qemu/job.h
> index c67f6a647e..2990f28edc 100644
> --- a/include/qemu/job.h
> +++ b/include/qemu/job.h
> @@ -124,7 +124,7 @@ typedef struct Job {
>      /** Estimated progress_current value at the completion of the job */
>      int64_t progress_total;
>  
> -    /** ret code passed to job_completed. */
> +    /** Return code from @run callback; 0 on success and -errno on failure. */

Hm.  Not really, it's the general status of the whole job, isn't it?
Besides being the return value from .run(), it's also set by .exit() (so
it's presumably going to be the return value from .prepare() after part
2) and by job_update_rc() when the job has been cancelled.

>      int ret;
>  
>      /** Error object for a failed job **/
> @@ -168,7 +168,16 @@ struct JobDriver {
>      /** Enum describing the operation */
>      JobType job_type;
>  
> -    /** Mandatory: Entrypoint for the Coroutine. */
> +    /**
> +     * Mandatory: Entrypoint for the Coroutine.
> +     *
> +     * This callback will be invoked when moving from CREATED to RUNNING.
> +     *
> +     * If this callback returns nonzero, the job transaction it is part of is
> +     * aborted. If it returns zero, the job moves into the WAITING state. If it
> +     * is the last job to complete in its transaction, all jobs in the
> +     * transaction move from WAITING to PENDING.
> +     */

Moving this description from job_completed() seems to imply we do want
to call job_update_rc() right after invoking .run().

>      int coroutine_fn (*run)(Job *job, Error **errp);
>  
>      /**
> @@ -492,17 +501,6 @@ void job_early_fail(Job *job);
>  /** Moves the @job from RUNNING to READY */
>  void job_transition_to_ready(Job *job);
>  
> -/**
> - * @job: The job being completed.
> - * @ret: The status code.
> - *
> - * Marks @job as completed. If @ret is non-zero, the job transaction it is part
> - * of is aborted. If @ret is zero, the job moves into the WAITING state. If it
> - * is the last job to complete in its transaction, all jobs in the transaction
> - * move from WAITING to PENDING.
> - */
> -void job_completed(Job *job, int ret);
> -
>  /** Asynchronously complete the specified @job. */
>  void job_complete(Job *job, Error **errp);
>  
> diff --git a/job.c b/job.c
> index bc8dad4e71..213042b762 100644
> --- a/job.c
> +++ b/job.c
> @@ -535,6 +535,8 @@ void job_drain(Job *job)
>      }
>  }
>  
> +static void job_completed(Job *job);
> +
>  static void job_exit(void *opaque)
>  {
>      Job *job = (Job *)opaque;
> @@ -545,7 +547,7 @@ static void job_exit(void *opaque)
>          job->driver->exit(job);
>          aio_context_release(aio_context);
>      }
> -    job_completed(job, job->ret);
> +    job_completed(job);
>  }
>  
>  /**
> @@ -883,13 +885,12 @@ static void job_completed_txn_success(Job *job)
>      }
>  }
>  
> -void job_completed(Job *job, int ret)
> +static void job_completed(Job *job)
>  {
>      assert(job && job->txn && !job_is_completed(job));
>  
> -    job->ret = ret;
>      job_update_rc(job);

I think we want to remove the job_update_rc() from here.  It should be
called after job->ret is updated, i.e. immediately after .run() and
.exit() have been invoked.  (Or presumably .prepare() in part 2.)
Oh, and in job_cancel() before it invokes job_completed()?

But then again, maybe it would be easiest to keep it here...  It just
doesn't feel quite right to me.

Max

> -    trace_job_completed(job, ret, job->ret);
> +    trace_job_completed(job, job->ret);
>      if (job->ret) {
>          job_completed_txn_abort(job);
>      } else {
> @@ -905,7 +906,7 @@ void job_cancel(Job *job, bool force)
>      }
>      job_cancel_async(job, force);
>      if (!job_started(job)) {
> -        job_completed(job, -ECANCELED);
> +        job_completed(job);
>      } else if (job->deferred_to_main_loop) {
>          job_completed_txn_abort(job);
>      } else {
> diff --git a/trace-events b/trace-events
> index c445f54773..4fd2cb4b97 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -107,7 +107,7 @@ gdbstub_err_checksum_incorrect(uint8_t expected, uint8_t got) "got command packe
>  # job.c
>  job_state_transition(void *job,  int ret, const char *legal, const char *s0, const char *s1) "job %p (ret: %d) attempting %s transition (%s-->%s)"
>  job_apply_verb(void *job, const char *state, const char *verb, const char *legal) "job %p in state %s; applying verb %s (%s)"
> -job_completed(void *job, int ret, int jret) "job %p ret %d corrected ret %d"
> +job_completed(void *job, int ret) "job %p ret %d"
>  
>  # job-qmp.c
>  qmp_job_cancel(void *job) "job %p"
> 



* Re: [Qemu-devel] [PATCH v2 9/9] jobs: remove job_defer_to_main_loop
  2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 9/9] jobs: remove job_defer_to_main_loop John Snow
@ 2018-08-27 10:56   ` Max Reitz
  0 siblings, 0 replies; 23+ messages in thread
From: Max Reitz @ 2018-08-27 10:56 UTC (permalink / raw)
  To: John Snow, qemu-block, qemu-devel; +Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody

On 2018-08-24 00:08, John Snow wrote:
> Now that the job infrastructure is handling the job_completed call for
> all implemented jobs, we can remove the interface that allowed jobs to
> schedule their own completion.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  include/qemu/job.h | 17 -----------------
>  job.c              | 40 ++--------------------------------------
>  2 files changed, 2 insertions(+), 55 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


* Re: [Qemu-devel] [PATCH v2 8/9] jobs: remove ret argument to job_completed; privatize it
  2018-08-27 10:52   ` Max Reitz
@ 2018-08-27 18:43     ` John Snow
  0 siblings, 0 replies; 23+ messages in thread
From: John Snow @ 2018-08-27 18:43 UTC (permalink / raw)
  To: Max Reitz, qemu-block, qemu-devel; +Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody



On 08/27/2018 06:52 AM, Max Reitz wrote:
> On 2018-08-24 00:08, John Snow wrote:
>> Jobs are now expected to return their retcode on the stack, from the
>> .run callback, so we can remove that argument.
>>
>> job_cancel does not need to set -ECANCELED because job_completed will
>> update the return code itself if the job was canceled.
>>
>> While we're here, make job_completed static to job.c and remove it from
>> job.h; move the documentation of return code to the .run() callback and
>> to the job->ret property, accordingly.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>  include/qemu/job.h | 24 +++++++++++-------------
>>  job.c              | 11 ++++++-----
>>  trace-events       |  2 +-
>>  3 files changed, 18 insertions(+), 19 deletions(-)
> 
> Er, yeah.  Sorry for not being able to read.  Again.
> 
>> diff --git a/include/qemu/job.h b/include/qemu/job.h
>> index c67f6a647e..2990f28edc 100644
>> --- a/include/qemu/job.h
>> +++ b/include/qemu/job.h
>> @@ -124,7 +124,7 @@ typedef struct Job {
>>      /** Estimated progress_current value at the completion of the job */
>>      int64_t progress_total;
>>  
>> -    /** ret code passed to job_completed. */
>> +    /** Return code from @run callback; 0 on success and -errno on failure. */
> 
> Hm.  Not really, it's the general status of the whole job, isn't it?
> Besides being the return value from .run(), it's also set by .exit() (so
> it's presumably going to be the return value from .prepare() after part
> 2) and by job_update_rc() when the job has been cancelled.
> 

You're right. I was trying to emphasize where it gets set in the
normative case. I'll rephrase.

What I want to say is effectively: "This is the return code for the job,
which is what gets returned by the .run and/or .prepare callbacks, or
gets set to -ECANCELED if the job is canceled and the job itself
neglects to set a nonzero code."
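
To make that wording concrete, here is a rough sketch of the rule as I
would phrase it in code. This is not the actual job.c implementation:
ToyJob and toy_job_final_ret are made-up names, and the struct only
carries the two fields the rule needs.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for Job; only the fields this rule looks at. */
typedef struct ToyJob {
    int ret;        /* 0 on success, -errno on failure */
    bool cancelled; /* true once a cancel has been requested */
} ToyJob;

/*
 * Settle the job's return code: a nonzero code set by the callbacks is
 * kept as-is; -ECANCELED is only filled in for a cancelled job that
 * left ret at zero.
 */
static int toy_job_final_ret(ToyJob *job)
{
    if (job->ret == 0 && job->cancelled) {
        job->ret = -ECANCELED;
    }
    return job->ret;
}
```

So a cancelled job that already failed with, say, -EIO keeps -EIO, and
only a cancelled job with ret == 0 ends up with -ECANCELED.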

>>      int ret;
>>  
>>      /** Error object for a failed job **/
>> @@ -168,7 +168,16 @@ struct JobDriver {
>>      /** Enum describing the operation */
>>      JobType job_type;
>>  
>> -    /** Mandatory: Entrypoint for the Coroutine. */
>> +    /**
>> +     * Mandatory: Entrypoint for the Coroutine.
>> +     *
>> +     * This callback will be invoked when moving from CREATED to RUNNING.
>> +     *
>> +     * If this callback returns nonzero, the job transaction it is part of is
>> +     * aborted. If it returns zero, the job moves into the WAITING state. If it
>> +     * is the last job to complete in its transaction, all jobs in the
>> +     * transaction move from WAITING to PENDING.
>> +     */
> 
> Moving this description from job_completed() seems to imply we do want
> to call job_update_rc() right after invoking .run().
> 

Sure, I'll take a look at that.

>>      int coroutine_fn (*run)(Job *job, Error **errp);
>>  
>>      /**
>> @@ -492,17 +501,6 @@ void job_early_fail(Job *job);
>>  /** Moves the @job from RUNNING to READY */
>>  void job_transition_to_ready(Job *job);
>>  
>> -/**
>> - * @job: The job being completed.
>> - * @ret: The status code.
>> - *
>> - * Marks @job as completed. If @ret is non-zero, the job transaction it is part
>> - * of is aborted. If @ret is zero, the job moves into the WAITING state. If it
>> - * is the last job to complete in its transaction, all jobs in the transaction
>> - * move from WAITING to PENDING.
>> - */
>> -void job_completed(Job *job, int ret);
>> -
>>  /** Asynchronously complete the specified @job. */
>>  void job_complete(Job *job, Error **errp);
>>  
>> diff --git a/job.c b/job.c
>> index bc8dad4e71..213042b762 100644
>> --- a/job.c
>> +++ b/job.c
>> @@ -535,6 +535,8 @@ void job_drain(Job *job)
>>      }
>>  }
>>  
>> +static void job_completed(Job *job);
>> +
>>  static void job_exit(void *opaque)
>>  {
>>      Job *job = (Job *)opaque;
>> @@ -545,7 +547,7 @@ static void job_exit(void *opaque)
>>          job->driver->exit(job);
>>          aio_context_release(aio_context);
>>      }
>> -    job_completed(job, job->ret);
>> +    job_completed(job);
>>  }
>>  
>>  /**
>> @@ -883,13 +885,12 @@ static void job_completed_txn_success(Job *job)
>>      }
>>  }
>>  
>> -void job_completed(Job *job, int ret)
>> +static void job_completed(Job *job)
>>  {
>>      assert(job && job->txn && !job_is_completed(job));
>>  
>> -    job->ret = ret;
>>      job_update_rc(job);
> 
> I think we want to remove the job_update_rc() from here.  It should be
> called after job->ret is updated, i.e. immediately after .run() and
> .exit() have been invoked.  (Or presumably .prepare() in part 2.)
> Oh, and in job_cancel() before it invokes job_completed()?
> 
> But then again, maybe it would be easiest to keep it here...  It just
> doesn't feel quite right to me.
> 
> Max
> 

It does feel slightly strange now. I'll see if I can find something that
feels better.

>> -    trace_job_completed(job, ret, job->ret);
>> +    trace_job_completed(job, job->ret);
>>      if (job->ret) {
>>          job_completed_txn_abort(job);
>>      } else {
>> @@ -905,7 +906,7 @@ void job_cancel(Job *job, bool force)
>>      }
>>      job_cancel_async(job, force);
>>      if (!job_started(job)) {
>> -        job_completed(job, -ECANCELED);
>> +        job_completed(job);
>>      } else if (job->deferred_to_main_loop) {
>>          job_completed_txn_abort(job);
>>      } else {
>> diff --git a/trace-events b/trace-events
>> index c445f54773..4fd2cb4b97 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -107,7 +107,7 @@ gdbstub_err_checksum_incorrect(uint8_t expected, uint8_t got) "got command packe
>>  # job.c
>>  job_state_transition(void *job,  int ret, const char *legal, const char *s0, const char *s1) "job %p (ret: %d) attempting %s transition (%s-->%s)"
>>  job_apply_verb(void *job, const char *state, const char *verb, const char *legal) "job %p in state %s; applying verb %s (%s)"
>> -job_completed(void *job, int ret, int jret) "job %p ret %d corrected ret %d"
>> +job_completed(void *job, int ret) "job %p ret %d"
>>  
>>  # job-qmp.c
>>  qmp_job_cancel(void *job) "job %p"
>>
> 
> 

* Re: [Qemu-devel] [PATCH v2 1/9] jobs: change start callback to run callback
  2018-08-27  9:30   ` Max Reitz
@ 2018-08-30  0:06     ` John Snow
  2018-08-31  9:06       ` Max Reitz
  0 siblings, 1 reply; 23+ messages in thread
From: John Snow @ 2018-08-30  0:06 UTC (permalink / raw)
  To: Max Reitz, qemu-block, qemu-devel; +Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody



On 08/27/2018 05:30 AM, Max Reitz wrote:
> On 2018-08-24 00:08, John Snow wrote:
>> Presently we codify the entry point for a job as the "start" callback,
>> but a more apt name would be "run" to clarify the idea that when this
>> function returns we consider the job to have "finished," except for
>> any cleanup which occurs in separate callbacks later.
>>
>> As part of this clarification, change the signature to include an error
>> object and a return code. The error ptr is not yet used, and the return
>> code while captured, will be overwritten by actions in the job_completed
>> function.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>  block/backup.c            |  7 ++++---
>>  block/commit.c            |  7 ++++---
>>  block/create.c            |  8 +++++---
>>  block/mirror.c            | 10 ++++++----
>>  block/stream.c            |  7 ++++---
>>  include/qemu/job.h        |  2 +-
>>  job.c                     |  6 +++---
>>  tests/test-bdrv-drain.c   |  7 ++++---
>>  tests/test-blockjob-txn.c | 16 ++++++++--------
>>  tests/test-blockjob.c     |  7 ++++---
>>  10 files changed, 43 insertions(+), 34 deletions(-)
> 
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> 
> But I see a discrepancy in the upcoming s->ret <=> s->err relationship
> now.  And that is if .run() doesn't return an Error *...
> 
> That could be remedied immediately in job_co_entry(), though, either by
> calling job_update_rc(), or by inlining its "if (!job->err)" part.
> 
> Max
> 

Jobs currently exist in ... five-ish phases.

Phase 0: Not started. (Always UNDEFINED or CREATED.)
Phase 1: In the coroutine. (RUNNING, READY, STANDBY, PAUSED.)
Phase 2: Deferred to main, but job_completed not yet called. [Not
dignified with a formal status, but job->deferred_to_main_loop set.]
Phase 3: job_completed has been called. (ABORTING, WAITING, PENDING)
Phase 4: job_finalize_single has been called. (CONCLUDED, NULL)

Broadly, though, we separate these out into two main clusters:

(A): job_is_completed == FALSE; Phases 0, 1 and 2 above.
(B): job_is_completed == TRUE; Phases 3 and 4 above.
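
As a rough illustration of that grouping (a toy model only, not code
from job.c): the status names mirror the ones listed above, but the
ToyStatus enum, toy_job_phase and toy_job_is_completed are hypothetical.
It assumes phase 2 is recognised purely by the deferred_to_main_loop
flag while the job still carries one of its phase 1 statuses.

```c
#include <stdbool.h>

/* Illustrative enum; names follow the statuses listed in the phases. */
typedef enum ToyStatus {
    TOY_UNDEFINED, TOY_CREATED,                      /* phase 0 */
    TOY_RUNNING, TOY_READY, TOY_STANDBY, TOY_PAUSED, /* phase 1 (or 2) */
    TOY_ABORTING, TOY_WAITING, TOY_PENDING,          /* phase 3 */
    TOY_CONCLUDED, TOY_NULL                          /* phase 4 */
} ToyStatus;

/* Map a status plus the deferred flag onto phases 0 through 4. */
static int toy_job_phase(ToyStatus status, bool deferred_to_main_loop)
{
    switch (status) {
    case TOY_UNDEFINED:
    case TOY_CREATED:
        return 0;
    case TOY_RUNNING:
    case TOY_READY:
    case TOY_STANDBY:
    case TOY_PAUSED:
        /* Phase 2 has no status of its own; only the flag marks it. */
        return deferred_to_main_loop ? 2 : 1;
    case TOY_ABORTING:
    case TOY_WAITING:
    case TOY_PENDING:
        return 3;
    case TOY_CONCLUDED:
    case TOY_NULL:
        return 4;
    }
    return -1; /* not reached for valid statuses */
}

/* Cluster (B) is phases 3 and 4; cluster (A) is everything before. */
static bool toy_job_is_completed(ToyStatus status, bool deferred_to_main_loop)
{
    return toy_job_phase(status, deferred_to_main_loop) >= 3;
}
```

In this model, letting ABORTING appear while the job is still in phase 2
would break the simple "status decides the cluster" mapping, which is
the kind of audit burden I'm worried about.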

The ABORTING status as it exists now is a phase 3 status. It never gets
set before this call, so it is a reliable indicator of being in phase 3.

If I adjust the usage of job_update_rc like you asked in several
reviews, ABORTING becomes a status that can exist in either Phase
2 *or* 3. That complicates the code a bit, as it requires auditing
every caller of job_is_completed and replacing the call with something
more appropriate. Worse, we have no way to identify phase 2 anymore
without adding a new status or a new boolean.

I think this is a change worth making, but I must beg to defer it to
a later patchset for the time being, and leave the job_update_rc calls
alone in the present patchset so I can focus on more pressing matters.

It might be simplest to say that at CONCLUDED time, the "err iff ret"
relationship will be true but potentially not before then. I think this
is reasonable as the error code cannot be held to be final until, well,
the job has finished.
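
For what it's worth, the invariant I have in mind could be sketched
like this. It is only a toy model under my own assumptions: ToyErrJob
and both helpers are made-up names, and a plain string stands in for
the real Error object.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: a string plays the role of Error *err. */
typedef struct ToyErrJob {
    int ret;         /* 0 on success, -errno on failure */
    const char *err; /* NULL when no error has been recorded */
} ToyErrJob;

/* If the job failed without recording an error, derive one from ret. */
static void toy_job_canonize_error(ToyErrJob *job)
{
    if (job->ret != 0 && job->err == NULL) {
        job->err = strerror(-job->ret);
    }
}

/* "err iff ret": an error is recorded exactly when ret is nonzero. */
static bool toy_job_err_iff_ret(const ToyErrJob *job)
{
    return (job->ret != 0) == (job->err != NULL);
}
```

The check is only expected to hold after the canonizing step has run,
which in this sketch would be done once, right before the job is
declared concluded. It deliberately does not handle the odd case of an
error being set while ret is still zero.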

Thanks,
--js

* Re: [Qemu-devel] [PATCH v2 1/9] jobs: change start callback to run callback
  2018-08-30  0:06     ` John Snow
@ 2018-08-31  9:06       ` Max Reitz
  0 siblings, 0 replies; 23+ messages in thread
From: Max Reitz @ 2018-08-31  9:06 UTC (permalink / raw)
  To: John Snow, qemu-block, qemu-devel; +Cc: kwolf, Stefan Hajnoczi, jtc, Jeff Cody

On 2018-08-30 02:06, John Snow wrote:
> 
> 
> On 08/27/2018 05:30 AM, Max Reitz wrote:
>> On 2018-08-24 00:08, John Snow wrote:
>>> Presently we codify the entry point for a job as the "start" callback,
>>> but a more apt name would be "run" to clarify the idea that when this
>>> function returns we consider the job to have "finished," except for
>>> any cleanup which occurs in separate callbacks later.
>>>
>>> As part of this clarification, change the signature to include an error
>>> object and a return code. The error ptr is not yet used, and the return
>>> code while captured, will be overwritten by actions in the job_completed
>>> function.
>>>
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>> ---
>>>  block/backup.c            |  7 ++++---
>>>  block/commit.c            |  7 ++++---
>>>  block/create.c            |  8 +++++---
>>>  block/mirror.c            | 10 ++++++----
>>>  block/stream.c            |  7 ++++---
>>>  include/qemu/job.h        |  2 +-
>>>  job.c                     |  6 +++---
>>>  tests/test-bdrv-drain.c   |  7 ++++---
>>>  tests/test-blockjob-txn.c | 16 ++++++++--------
>>>  tests/test-blockjob.c     |  7 ++++---
>>>  10 files changed, 43 insertions(+), 34 deletions(-)
>>
>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>
>> But I see a discrepancy in the upcoming s->ret <=> s->err relationship
>> now.  And that is if .run() doesn't return an Error *...
>>
>> That could be remedied immediately in job_co_entry(), though, either by
>> calling job_update_rc(), or by inlining its "if (!job->err)" part.
>>
>> Max
>>
> 
> Jobs currently exist in ... five-ish phases.
> 
> Phase 0: Not started. (Always UNDEFINED or CREATED.)
> Phase 1: In the coroutine. (RUNNING, READY, STANDBY, PAUSED.)
> Phase 2: Deferred to main, but job_completed not yet called. [Not
> dignified with a formal status, but job->deferred_to_main_loop set.]
> Phase 3: job_completed has been called. (ABORTING, WAITING, PENDING)
> Phase 4: job_finalize_single has been called. (CONCLUDED, NULL)
> 
> Broadly, though, we separate these out into two main clusters:
> 
> (A): job_is_completed == FALSE; Phases 0, 1 and 2 above.
> (B): job_is_completed == TRUE; Phases 3 and 4 above.
> 
> The ABORTING status as it exists now is a phase 3 status. It never gets
> set before job_completed is called, so it is a reliable indicator of
> being in phase 3.
> 
> If I adjust the usage of job_update_rc like you asked in several
> reviews, ABORTING becomes a status that can exist in either Phase 2
> *or* 3, which complicates the code a bit: it requires auditing every
> caller of job_is_completed and replacing the call with something more
> appropriate. Worse, we have no way to identify phase 2 anymore without
> adding a new status or a new boolean.
> 
> I think this is a change worth making, but I must beg to defer it to a
> later patchset for the time being, and leave the job_update_rc calls
> alone in the present patchset so I can focus on more pressing matters.
> 
> It might be simplest to say that at CONCLUDED time, the "err iff ret"
> relationship will be true but potentially not before then. I think this
> is reasonable as the error code cannot be held to be final until, well,
> the job has finished.

It's OK to make the change at a later point.

Maybe it would make more sense to pull out the
"set @err if @ret && !@err" part from job_update_rc() into its own
function, and then call that every time @ret has been updated.  (And
call the rest of the function, which does a possible state transition,
only where that makes sense.)
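
One possible reading of that split, as a self-contained C sketch. The
SkJob struct, the SkStatus enum and the sk_* function names are
hypothetical, err is a plain string standing in for an Error object, and
the cancellation handling is an assumption about what the rest of the
function does; none of this is the actual job.c code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { SK_RUNNING, SK_ABORTING } SkStatus;

/* Hypothetical, reduced job object for illustration only. */
typedef struct {
    int ret;          /* 0 on success, negative errno on failure */
    const char *err;  /* stand-in for an Error object; NULL if unset */
    bool cancelled;
    SkStatus status;
} SkJob;

/* The "set @err if @ret && !@err" part, with no state transition. */
void sk_job_sync_err(SkJob *job)
{
    if (job->ret && !job->err) {
        job->err = strerror(-job->ret);
    }
}

/* The rest: fold in cancellation, then possibly move to ABORTING. */
void sk_job_update_rc(SkJob *job)
{
    if (!job->ret && job->cancelled) {
        job->ret = -ECANCELED;
    }
    sk_job_sync_err(job);
    if (job->ret) {
        job->status = SK_ABORTING;
    }
}
```

With this shape, a caller that has just stored a new ret can call
sk_job_sync_err() to restore the ret/err pairing without touching the
status, and sk_job_update_rc() stays reserved for the points where a
transition to ABORTING is acceptable.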

Max



end of thread, newest message ~2018-08-31  9:07 UTC

Thread overview: 23+ messages
2018-08-23 22:08 [Qemu-devel] [PATCH v2 0/9] jobs: Job Exit Refactoring Pt 1 John Snow
2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 1/9] jobs: change start callback to run callback John Snow
2018-08-27  9:30   ` Max Reitz
2018-08-30  0:06     ` John Snow
2018-08-31  9:06       ` Max Reitz
2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 2/9] jobs: canonize Error object John Snow
2018-08-27  9:41   ` Max Reitz
2018-08-27 10:43     ` Max Reitz
2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 3/9] jobs: add exit shim John Snow
2018-08-27 10:00   ` Max Reitz
2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 4/9] block/commit: utilize job_exit shim John Snow
2018-08-27 10:28   ` Max Reitz
2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 5/9] block/mirror: " John Snow
2018-08-27 10:30   ` Max Reitz
2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 6/9] jobs: " John Snow
2018-08-27 10:37   ` Max Reitz
2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 7/9] block/backup: make function variables consistently named John Snow
2018-08-27 10:41   ` Max Reitz
2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 8/9] jobs: remove ret argument to job_completed; privatize it John Snow
2018-08-27 10:52   ` Max Reitz
2018-08-27 18:43     ` John Snow
2018-08-23 22:08 ` [Qemu-devel] [PATCH v2 9/9] jobs: remove job_defer_to_main_loop John Snow
2018-08-27 10:56   ` Max Reitz
