All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 0/9] jobs: Job Exit Refactoring Pt 1
@ 2018-08-30  1:57 John Snow
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 1/9] jobs: change start callback to run callback John Snow
                   ` (9 more replies)
  0 siblings, 10 replies; 24+ messages in thread
From: John Snow @ 2018-08-30  1:57 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: kwolf, Jeff Cody, Stefan Hajnoczi, Max Reitz, jtc, John Snow

This is part one of a two part series that refactors the exit logic
of jobs.

Part one removes job_defer_to_main_loop.
Part two removes the job->exit() callback introduced in part one.

It's redundant to have each job manage deferring to the main loop
itself. Unifying this makes sense from an API standpoint.
Doing so also allows us to remove a tricky case where the completion
code is called under an aio_context lock, which then calls the
finalization code which is itself executed under a second aio_context
lock, leading to deadlocks.

Removing this recursive lock acquisition is necessary for converting
mirror to only modify its graph post-finalization, but it's also just
safer and will bite us less in the future.

This series introduces a job->exit callback, but after jobs are
fully transitioned to using the .commit/.abort callbacks in Pt 2,
this new completion callback will be removed again. It's only here
as a crutch to let us investigate the completion refactoring in Pt 2
more carefully.

====
V3:
====

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/9:[----] [--] 'jobs: change start callback to run callback'
002/9:[0006] [FC] 'jobs: canonize Error object'
003/9:[----] [--] 'jobs: add exit shim'
004/9:[----] [--] 'block/commit: utilize job_exit shim'
005/9:[0004] [FC] 'block/mirror: utilize job_exit shim'
006/9:[----] [--] 'jobs: utilize job_exit shim'
007/9:[----] [--] 'block/backup: make function variables consistently named'
008/9:[0006] [FC] 'jobs: remove ret argument to job_completed; privatize it'
009/9:[----] [--] 'jobs: remove job_defer_to_main_loop'

002: Adjusted Error *err comment to clarify when it is expected to be set.
     (I could not implement all of Max's suggestions for when job_update_rc
     should be called, in his responses to patches 002 and 008.)

005: Confirmed the reason why we need to bdrv_ref(src),
     and updated comment accordingly.

008: Clarified the meaning of the job->ret property.

001, 003-004, 006-007, 009: Added R-B (Max)

====
V2:
====

002: Update commit message
     Remove errant space (Eric, Max)
     Update error message setting (Kevin)
003: Add comment clarifying that .exit is temporary/transitional (Max, Eric)
004: change reference from `ret` to `job->ret`
     (Note: most of these references go away in Pt 2 of the series,
            except for those in mirror_exit.) (for Max.)
007: Added, at Eric's suggestion.
008: Moved forward from Pt 2 of the series. (Max.)

John Snow (9):
  jobs: change start callback to run callback
  jobs: canonize Error object
  jobs: add exit shim
  block/commit: utilize job_exit shim
  block/mirror: utilize job_exit shim
  jobs: utilize job_exit shim
  block/backup: make function variables consistently named
  jobs: remove ret argument to job_completed; privatize it
  jobs: remove job_defer_to_main_loop

 block/backup.c            | 81 +++++++++++++++++++----------------------------
 block/commit.c            | 29 ++++++-----------
 block/create.c            | 19 ++++-------
 block/mirror.c            | 39 ++++++++++-------------
 block/stream.c            | 29 +++++++----------
 include/qemu/job.h        | 70 ++++++++++++++++++++--------------------
 job-qmp.c                 |  5 +--
 job.c                     | 73 +++++++++++++++---------------------------
 tests/test-bdrv-drain.c   | 13 +++-----
 tests/test-blockjob-txn.c | 25 ++++++---------
 tests/test-blockjob.c     | 17 +++++-----
 trace-events              |  2 +-
 12 files changed, 160 insertions(+), 242 deletions(-)

-- 
2.14.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v3 1/9] jobs: change start callback to run callback
  2018-08-30  1:57 [Qemu-devel] [PATCH v3 0/9] jobs: Job Exit Refactoring Pt 1 John Snow
@ 2018-08-30  1:57 ` John Snow
  2018-08-31 13:27   ` Jeff Cody
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 2/9] jobs: canonize Error object John Snow
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: John Snow @ 2018-08-30  1:57 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: kwolf, Jeff Cody, Stefan Hajnoczi, Max Reitz, jtc, John Snow

Presently we codify the entry point for a job as the "start" callback,
but a more apt name would be "run" to clarify the idea that when this
function returns we consider the job to have "finished," except for
any cleanup which occurs in separate callbacks later.

As part of this clarification, change the signature to include an error
object and a return code. The error ptr is not yet used, and the return
code while captured, will be overwritten by actions in the job_completed
function.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/backup.c            |  7 ++++---
 block/commit.c            |  7 ++++---
 block/create.c            |  8 +++++---
 block/mirror.c            | 10 ++++++----
 block/stream.c            |  7 ++++---
 include/qemu/job.h        |  2 +-
 job.c                     |  6 +++---
 tests/test-bdrv-drain.c   |  7 ++++---
 tests/test-blockjob-txn.c | 16 ++++++++--------
 tests/test-blockjob.c     |  7 ++++---
 10 files changed, 43 insertions(+), 34 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 8630d32926..5d47781840 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -480,9 +480,9 @@ static void backup_incremental_init_copy_bitmap(BackupBlockJob *job)
     bdrv_dirty_iter_free(dbi);
 }
 
-static void coroutine_fn backup_run(void *opaque)
+static int coroutine_fn backup_run(Job *opaque_job, Error **errp)
 {
-    BackupBlockJob *job = opaque;
+    BackupBlockJob *job = container_of(opaque_job, BackupBlockJob, common.job);
     BackupCompleteData *data;
     BlockDriverState *bs = blk_bs(job->common.blk);
     int64_t offset, nb_clusters;
@@ -587,6 +587,7 @@ static void coroutine_fn backup_run(void *opaque)
     data = g_malloc(sizeof(*data));
     data->ret = ret;
     job_defer_to_main_loop(&job->common.job, backup_complete, data);
+    return ret;
 }
 
 static const BlockJobDriver backup_job_driver = {
@@ -596,7 +597,7 @@ static const BlockJobDriver backup_job_driver = {
         .free                   = block_job_free,
         .user_resume            = block_job_user_resume,
         .drain                  = block_job_drain,
-        .start                  = backup_run,
+        .run                    = backup_run,
         .commit                 = backup_commit,
         .abort                  = backup_abort,
         .clean                  = backup_clean,
diff --git a/block/commit.c b/block/commit.c
index eb414579bd..a0ea86ff64 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -134,9 +134,9 @@ static void commit_complete(Job *job, void *opaque)
     bdrv_unref(top);
 }
 
-static void coroutine_fn commit_run(void *opaque)
+static int coroutine_fn commit_run(Job *job, Error **errp)
 {
-    CommitBlockJob *s = opaque;
+    CommitBlockJob *s = container_of(job, CommitBlockJob, common.job);
     CommitCompleteData *data;
     int64_t offset;
     uint64_t delay_ns = 0;
@@ -213,6 +213,7 @@ out:
     data = g_malloc(sizeof(*data));
     data->ret = ret;
     job_defer_to_main_loop(&s->common.job, commit_complete, data);
+    return ret;
 }
 
 static const BlockJobDriver commit_job_driver = {
@@ -222,7 +223,7 @@ static const BlockJobDriver commit_job_driver = {
         .free          = block_job_free,
         .user_resume   = block_job_user_resume,
         .drain         = block_job_drain,
-        .start         = commit_run,
+        .run           = commit_run,
     },
 };
 
diff --git a/block/create.c b/block/create.c
index 915cd41bcc..04733c3618 100644
--- a/block/create.c
+++ b/block/create.c
@@ -45,9 +45,9 @@ static void blockdev_create_complete(Job *job, void *opaque)
     job_completed(job, s->ret, s->err);
 }
 
-static void coroutine_fn blockdev_create_run(void *opaque)
+static int coroutine_fn blockdev_create_run(Job *job, Error **errp)
 {
-    BlockdevCreateJob *s = opaque;
+    BlockdevCreateJob *s = container_of(job, BlockdevCreateJob, common);
 
     job_progress_set_remaining(&s->common, 1);
     s->ret = s->drv->bdrv_co_create(s->opts, &s->err);
@@ -55,12 +55,14 @@ static void coroutine_fn blockdev_create_run(void *opaque)
 
     qapi_free_BlockdevCreateOptions(s->opts);
     job_defer_to_main_loop(&s->common, blockdev_create_complete, NULL);
+
+    return s->ret;
 }
 
 static const JobDriver blockdev_create_job_driver = {
     .instance_size = sizeof(BlockdevCreateJob),
     .job_type      = JOB_TYPE_CREATE,
-    .start         = blockdev_create_run,
+    .run           = blockdev_create_run,
 };
 
 void qmp_blockdev_create(const char *job_id, BlockdevCreateOptions *options,
diff --git a/block/mirror.c b/block/mirror.c
index 6cc10df5c9..691763db41 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -812,9 +812,9 @@ static int mirror_flush(MirrorBlockJob *s)
     return ret;
 }
 
-static void coroutine_fn mirror_run(void *opaque)
+static int coroutine_fn mirror_run(Job *job, Error **errp)
 {
-    MirrorBlockJob *s = opaque;
+    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
     MirrorExitData *data;
     BlockDriverState *bs = s->mirror_top_bs->backing->bs;
     BlockDriverState *target_bs = blk_bs(s->target);
@@ -1041,7 +1041,9 @@ immediate_exit:
     if (need_drain) {
         bdrv_drained_begin(bs);
     }
+
     job_defer_to_main_loop(&s->common.job, mirror_exit, data);
+    return ret;
 }
 
 static void mirror_complete(Job *job, Error **errp)
@@ -1138,7 +1140,7 @@ static const BlockJobDriver mirror_job_driver = {
         .free                   = block_job_free,
         .user_resume            = block_job_user_resume,
         .drain                  = block_job_drain,
-        .start                  = mirror_run,
+        .run                    = mirror_run,
         .pause                  = mirror_pause,
         .complete               = mirror_complete,
     },
@@ -1154,7 +1156,7 @@ static const BlockJobDriver commit_active_job_driver = {
         .free                   = block_job_free,
         .user_resume            = block_job_user_resume,
         .drain                  = block_job_drain,
-        .start                  = mirror_run,
+        .run                    = mirror_run,
         .pause                  = mirror_pause,
         .complete               = mirror_complete,
     },
diff --git a/block/stream.c b/block/stream.c
index 9264b68a1e..b4b987df7e 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -97,9 +97,9 @@ out:
     g_free(data);
 }
 
-static void coroutine_fn stream_run(void *opaque)
+static int coroutine_fn stream_run(Job *job, Error **errp)
 {
-    StreamBlockJob *s = opaque;
+    StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
     StreamCompleteData *data;
     BlockBackend *blk = s->common.blk;
     BlockDriverState *bs = blk_bs(blk);
@@ -206,6 +206,7 @@ out:
     data = g_malloc(sizeof(*data));
     data->ret = ret;
     job_defer_to_main_loop(&s->common.job, stream_complete, data);
+    return ret;
 }
 
 static const BlockJobDriver stream_job_driver = {
@@ -213,7 +214,7 @@ static const BlockJobDriver stream_job_driver = {
         .instance_size = sizeof(StreamBlockJob),
         .job_type      = JOB_TYPE_STREAM,
         .free          = block_job_free,
-        .start         = stream_run,
+        .run           = stream_run,
         .user_resume   = block_job_user_resume,
         .drain         = block_job_drain,
     },
diff --git a/include/qemu/job.h b/include/qemu/job.h
index 18c9223e31..9cf463d228 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -169,7 +169,7 @@ struct JobDriver {
     JobType job_type;
 
     /** Mandatory: Entrypoint for the Coroutine. */
-    CoroutineEntry *start;
+    int coroutine_fn (*run)(Job *job, Error **errp);
 
     /**
      * If the callback is not NULL, it will be invoked when the job transitions
diff --git a/job.c b/job.c
index e36ebaafd8..76988f6678 100644
--- a/job.c
+++ b/job.c
@@ -544,16 +544,16 @@ static void coroutine_fn job_co_entry(void *opaque)
 {
     Job *job = opaque;
 
-    assert(job && job->driver && job->driver->start);
+    assert(job && job->driver && job->driver->run);
     job_pause_point(job);
-    job->driver->start(job);
+    job->ret = job->driver->run(job, NULL);
 }
 
 
 void job_start(Job *job)
 {
     assert(job && !job_started(job) && job->paused &&
-           job->driver && job->driver->start);
+           job->driver && job->driver->run);
     job->co = qemu_coroutine_create(job_co_entry, job);
     job->pause_count--;
     job->busy = true;
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 17bb8508ae..a7533861f6 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -757,9 +757,9 @@ static void test_job_completed(Job *job, void *opaque)
     job_completed(job, 0, NULL);
 }
 
-static void coroutine_fn test_job_start(void *opaque)
+static int coroutine_fn test_job_run(Job *job, Error **errp)
 {
-    TestBlockJob *s = opaque;
+    TestBlockJob *s = container_of(job, TestBlockJob, common.job);
 
     job_transition_to_ready(&s->common.job);
     while (!s->should_complete) {
@@ -771,6 +771,7 @@ static void coroutine_fn test_job_start(void *opaque)
     }
 
     job_defer_to_main_loop(&s->common.job, test_job_completed, NULL);
+    return 0;
 }
 
 static void test_job_complete(Job *job, Error **errp)
@@ -785,7 +786,7 @@ BlockJobDriver test_job_driver = {
         .free           = block_job_free,
         .user_resume    = block_job_user_resume,
         .drain          = block_job_drain,
-        .start          = test_job_start,
+        .run            = test_job_run,
         .complete       = test_job_complete,
     },
 };
diff --git a/tests/test-blockjob-txn.c b/tests/test-blockjob-txn.c
index 58d9b87fb2..3194924071 100644
--- a/tests/test-blockjob-txn.c
+++ b/tests/test-blockjob-txn.c
@@ -38,25 +38,25 @@ static void test_block_job_complete(Job *job, void *opaque)
     bdrv_unref(bs);
 }
 
-static void coroutine_fn test_block_job_run(void *opaque)
+static int coroutine_fn test_block_job_run(Job *job, Error **errp)
 {
-    TestBlockJob *s = opaque;
-    BlockJob *job = &s->common;
+    TestBlockJob *s = container_of(job, TestBlockJob, common.job);
 
     while (s->iterations--) {
         if (s->use_timer) {
-            job_sleep_ns(&job->job, 0);
+            job_sleep_ns(job, 0);
         } else {
-            job_yield(&job->job);
+            job_yield(job);
         }
 
-        if (job_is_cancelled(&job->job)) {
+        if (job_is_cancelled(job)) {
             break;
         }
     }
 
-    job_defer_to_main_loop(&job->job, test_block_job_complete,
+    job_defer_to_main_loop(job, test_block_job_complete,
                            (void *)(intptr_t)s->rc);
+    return s->rc;
 }
 
 typedef struct {
@@ -80,7 +80,7 @@ static const BlockJobDriver test_block_job_driver = {
         .free          = block_job_free,
         .user_resume   = block_job_user_resume,
         .drain         = block_job_drain,
-        .start         = test_block_job_run,
+        .run           = test_block_job_run,
     },
 };
 
diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c
index cb42f06e61..b0462bfdec 100644
--- a/tests/test-blockjob.c
+++ b/tests/test-blockjob.c
@@ -176,9 +176,9 @@ static void cancel_job_complete(Job *job, Error **errp)
     s->should_complete = true;
 }
 
-static void coroutine_fn cancel_job_start(void *opaque)
+static int coroutine_fn cancel_job_run(Job *job, Error **errp)
 {
-    CancelJob *s = opaque;
+    CancelJob *s = container_of(job, CancelJob, common.job);
 
     while (!s->should_complete) {
         if (job_is_cancelled(&s->common.job)) {
@@ -194,6 +194,7 @@ static void coroutine_fn cancel_job_start(void *opaque)
 
  defer:
     job_defer_to_main_loop(&s->common.job, cancel_job_completed, s);
+    return 0;
 }
 
 static const BlockJobDriver test_cancel_driver = {
@@ -202,7 +203,7 @@ static const BlockJobDriver test_cancel_driver = {
         .free          = block_job_free,
         .user_resume   = block_job_user_resume,
         .drain         = block_job_drain,
-        .start         = cancel_job_start,
+        .run           = cancel_job_run,
         .complete      = cancel_job_complete,
     },
 };
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v3 2/9] jobs: canonize Error object
  2018-08-30  1:57 [Qemu-devel] [PATCH v3 0/9] jobs: Job Exit Refactoring Pt 1 John Snow
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 1/9] jobs: change start callback to run callback John Snow
@ 2018-08-30  1:57 ` John Snow
  2018-08-30 19:58   ` Eric Blake
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 3/9] jobs: add exit shim John Snow
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: John Snow @ 2018-08-30  1:57 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: kwolf, Jeff Cody, Stefan Hajnoczi, Max Reitz, jtc, John Snow

Jobs presently use both an Error object in the case of the create job,
and char strings in the case of generic errors elsewhere.

Unify the two paths as just j->err, and remove the extra argument from
job_completed. The integer error code for job_completed is kept for now,
to be removed shortly in a separate patch.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/backup.c            |  2 +-
 block/commit.c            |  2 +-
 block/create.c            |  5 ++---
 block/mirror.c            |  2 +-
 block/stream.c            |  2 +-
 include/qemu/job.h        | 14 ++++++++------
 job-qmp.c                 |  5 +++--
 job.c                     | 18 ++++++------------
 tests/test-bdrv-drain.c   |  2 +-
 tests/test-blockjob-txn.c |  2 +-
 tests/test-blockjob.c     |  2 +-
 11 files changed, 26 insertions(+), 30 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 5d47781840..1e965d54e5 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -388,7 +388,7 @@ static void backup_complete(Job *job, void *opaque)
 {
     BackupCompleteData *data = opaque;
 
-    job_completed(job, data->ret, NULL);
+    job_completed(job, data->ret);
     g_free(data);
 }
 
diff --git a/block/commit.c b/block/commit.c
index a0ea86ff64..4a17bb73ec 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -117,7 +117,7 @@ static void commit_complete(Job *job, void *opaque)
      * bdrv_set_backing_hd() to fail. */
     block_job_remove_all_bdrv(bjob);
 
-    job_completed(job, ret, NULL);
+    job_completed(job, ret);
     g_free(data);
 
     /* If bdrv_drop_intermediate() didn't already do that, remove the commit
diff --git a/block/create.c b/block/create.c
index 04733c3618..26a385c6c7 100644
--- a/block/create.c
+++ b/block/create.c
@@ -35,14 +35,13 @@ typedef struct BlockdevCreateJob {
     BlockDriver *drv;
     BlockdevCreateOptions *opts;
     int ret;
-    Error *err;
 } BlockdevCreateJob;
 
 static void blockdev_create_complete(Job *job, void *opaque)
 {
     BlockdevCreateJob *s = container_of(job, BlockdevCreateJob, common);
 
-    job_completed(job, s->ret, s->err);
+    job_completed(job, s->ret);
 }
 
 static int coroutine_fn blockdev_create_run(Job *job, Error **errp)
@@ -50,7 +49,7 @@ static int coroutine_fn blockdev_create_run(Job *job, Error **errp)
     BlockdevCreateJob *s = container_of(job, BlockdevCreateJob, common);
 
     job_progress_set_remaining(&s->common, 1);
-    s->ret = s->drv->bdrv_co_create(s->opts, &s->err);
+    s->ret = s->drv->bdrv_co_create(s->opts, errp);
     job_progress_update(&s->common, 1);
 
     qapi_free_BlockdevCreateOptions(s->opts);
diff --git a/block/mirror.c b/block/mirror.c
index 691763db41..be5dc6b7b0 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -710,7 +710,7 @@ static void mirror_exit(Job *job, void *opaque)
     blk_insert_bs(bjob->blk, mirror_top_bs, &error_abort);
 
     bs_opaque->job = NULL;
-    job_completed(job, data->ret, NULL);
+    job_completed(job, data->ret);
 
     g_free(data);
     bdrv_drained_end(src);
diff --git a/block/stream.c b/block/stream.c
index b4b987df7e..26a775386b 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -93,7 +93,7 @@ out:
     }
 
     g_free(s->backing_file_str);
-    job_completed(job, data->ret, NULL);
+    job_completed(job, data->ret);
     g_free(data);
 }
 
diff --git a/include/qemu/job.h b/include/qemu/job.h
index 9cf463d228..e0e99870a1 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -124,12 +124,16 @@ typedef struct Job {
     /** Estimated progress_current value at the completion of the job */
     int64_t progress_total;
 
-    /** Error string for a failed job (NULL if, and only if, job->ret == 0) */
-    char *error;
-
     /** ret code passed to job_completed. */
     int ret;
 
+    /**
+     * Error object for a failed job.
+     * If job->ret is nonzero and an error object was not set, it will be set
+     * to strerror(-job->ret) during job_completed.
+     */
+    Error *err;
+
     /** The completion function that will be called when the job completes.  */
     BlockCompletionFunc *cb;
 
@@ -484,15 +488,13 @@ void job_transition_to_ready(Job *job);
 /**
  * @job: The job being completed.
  * @ret: The status code.
- * @error: The error message for a failing job (only with @ret < 0). If @ret is
- *         negative, but NULL is given for @error, strerror() is used.
  *
  * Marks @job as completed. If @ret is non-zero, the job transaction it is part
  * of is aborted. If @ret is zero, the job moves into the WAITING state. If it
  * is the last job to complete in its transaction, all jobs in the transaction
  * move from WAITING to PENDING.
  */
-void job_completed(Job *job, int ret, Error *error);
+void job_completed(Job *job, int ret);
 
 /** Asynchronously complete the specified @job. */
 void job_complete(Job *job, Error **errp);
diff --git a/job-qmp.c b/job-qmp.c
index 410775df61..a969b2bbf0 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -146,8 +146,9 @@ static JobInfo *job_query_single(Job *job, Error **errp)
         .status             = job->status,
         .current_progress   = job->progress_current,
         .total_progress     = job->progress_total,
-        .has_error          = !!job->error,
-        .error              = g_strdup(job->error),
+        .has_error          = !!job->err,
+        .error              = job->err ? \
+                              g_strdup(error_get_pretty(job->err)) : NULL,
     };
 
     return info;
diff --git a/job.c b/job.c
index 76988f6678..bc1d970df4 100644
--- a/job.c
+++ b/job.c
@@ -369,7 +369,7 @@ void job_unref(Job *job)
 
         QLIST_REMOVE(job, job_list);
 
-        g_free(job->error);
+        error_free(job->err);
         g_free(job->id);
         g_free(job);
     }
@@ -546,7 +546,7 @@ static void coroutine_fn job_co_entry(void *opaque)
 
     assert(job && job->driver && job->driver->run);
     job_pause_point(job);
-    job->ret = job->driver->run(job, NULL);
+    job->ret = job->driver->run(job, &job->err);
 }
 
 
@@ -666,8 +666,8 @@ static void job_update_rc(Job *job)
         job->ret = -ECANCELED;
     }
     if (job->ret) {
-        if (!job->error) {
-            job->error = g_strdup(strerror(-job->ret));
+        if (!job->err) {
+            error_setg(&job->err, "%s", g_strdup(strerror(-job->ret)));
         }
         job_state_transition(job, JOB_STATUS_ABORTING);
     }
@@ -865,17 +865,11 @@ static void job_completed_txn_success(Job *job)
     }
 }
 
-void job_completed(Job *job, int ret, Error *error)
+void job_completed(Job *job, int ret)
 {
     assert(job && job->txn && !job_is_completed(job));
 
     job->ret = ret;
-    if (error) {
-        assert(job->ret < 0);
-        job->error = g_strdup(error_get_pretty(error));
-        error_free(error);
-    }
-
     job_update_rc(job);
     trace_job_completed(job, ret, job->ret);
     if (job->ret) {
@@ -893,7 +887,7 @@ void job_cancel(Job *job, bool force)
     }
     job_cancel_async(job, force);
     if (!job_started(job)) {
-        job_completed(job, -ECANCELED, NULL);
+        job_completed(job, -ECANCELED);
     } else if (job->deferred_to_main_loop) {
         job_completed_txn_abort(job);
     } else {
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index a7533861f6..00604dfc0c 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -754,7 +754,7 @@ typedef struct TestBlockJob {
 
 static void test_job_completed(Job *job, void *opaque)
 {
-    job_completed(job, 0, NULL);
+    job_completed(job, 0);
 }
 
 static int coroutine_fn test_job_run(Job *job, Error **errp)
diff --git a/tests/test-blockjob-txn.c b/tests/test-blockjob-txn.c
index 3194924071..82cedee78b 100644
--- a/tests/test-blockjob-txn.c
+++ b/tests/test-blockjob-txn.c
@@ -34,7 +34,7 @@ static void test_block_job_complete(Job *job, void *opaque)
         rc = -ECANCELED;
     }
 
-    job_completed(job, rc, NULL);
+    job_completed(job, rc);
     bdrv_unref(bs);
 }
 
diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c
index b0462bfdec..408a226939 100644
--- a/tests/test-blockjob.c
+++ b/tests/test-blockjob.c
@@ -167,7 +167,7 @@ static void cancel_job_completed(Job *job, void *opaque)
 {
     CancelJob *s = opaque;
     s->completed = true;
-    job_completed(job, 0, NULL);
+    job_completed(job, 0);
 }
 
 static void cancel_job_complete(Job *job, Error **errp)
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v3 3/9] jobs: add exit shim
  2018-08-30  1:57 [Qemu-devel] [PATCH v3 0/9] jobs: Job Exit Refactoring Pt 1 John Snow
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 1/9] jobs: change start callback to run callback John Snow
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 2/9] jobs: canonize Error object John Snow
@ 2018-08-30  1:57 ` John Snow
  2018-08-31 13:48   ` Jeff Cody
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 4/9] block/commit: utilize job_exit shim John Snow
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: John Snow @ 2018-08-30  1:57 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: kwolf, Jeff Cody, Stefan Hajnoczi, Max Reitz, jtc, John Snow

All jobs do the same thing when they leave their running loop:
- Store the return code in a structure
- wait to receive this structure in the main thread
- signal job completion via job_completed

Few jobs do anything beyond exactly this. Consolidate this exit
logic for a net reduction in SLOC.

More seriously, when we utilize job_defer_to_main_loop_bh to call
a function that calls job_completed, job_finalize_single will run
in a context where it has recursively taken the aio_context lock,
which can cause hangs if it puts down a reference that causes a flush.

You can observe this in practice by looking at mirror_exit's careful
placement of job_completed and bdrv_unref calls.

If we centralize job exiting, we can signal job completion from outside
of the aio_context, which should allow for job cleanup code to run with
only one lock, which makes cleanup callbacks less tricky to write.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 include/qemu/job.h | 11 +++++++++++
 job.c              | 18 ++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index e0e99870a1..1144d671a1 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -208,6 +208,17 @@ struct JobDriver {
      */
     void (*drain)(Job *job);
 
+    /**
+     * If the callback is not NULL, exit will be invoked from the main thread
+     * when the job's coroutine has finished, but before transactional
+     * convergence; before @prepare or @abort.
+     *
+     * FIXME TODO: This callback is only temporary to transition remaining jobs
+     * to prepare/commit/abort/clean callbacks and will be removed before 3.1.
+     * is released.
+     */
+    void (*exit)(Job *job);
+
     /**
      * If the callback is not NULL, prepare will be invoked when all the jobs
      * belonging to the same transaction complete; or upon this job's completion
diff --git a/job.c b/job.c
index bc1d970df4..bc8dad4e71 100644
--- a/job.c
+++ b/job.c
@@ -535,6 +535,18 @@ void job_drain(Job *job)
     }
 }
 
+static void job_exit(void *opaque)
+{
+    Job *job = (Job *)opaque;
+    AioContext *aio_context = job->aio_context;
+
+    if (job->driver->exit) {
+        aio_context_acquire(aio_context);
+        job->driver->exit(job);
+        aio_context_release(aio_context);
+    }
+    job_completed(job, job->ret);
+}
 
 /**
  * All jobs must allow a pause point before entering their job proper. This
@@ -547,6 +559,12 @@ static void coroutine_fn job_co_entry(void *opaque)
     assert(job && job->driver && job->driver->run);
     job_pause_point(job);
     job->ret = job->driver->run(job, &job->err);
+    if (!job->deferred_to_main_loop) {
+        job->deferred_to_main_loop = true;
+        aio_bh_schedule_oneshot(qemu_get_aio_context(),
+                                job_exit,
+                                job);
+    }
 }
 
 
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v3 4/9] block/commit: utilize job_exit shim
  2018-08-30  1:57 [Qemu-devel] [PATCH v3 0/9] jobs: Job Exit Refactoring Pt 1 John Snow
                   ` (2 preceding siblings ...)
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 3/9] jobs: add exit shim John Snow
@ 2018-08-30  1:57 ` John Snow
  2018-08-31 13:58   ` Jeff Cody
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 5/9] block/mirror: " John Snow
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: John Snow @ 2018-08-30  1:57 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: kwolf, Jeff Cody, Stefan Hajnoczi, Max Reitz, jtc, John Snow

Change the manual deferment to commit_complete into the implicit
callback to job_exit, renaming commit_complete to commit_exit.

This conversion does change the timing of when job_completed is
called to after the bdrv_replace_node and bdrv_unref calls, which
could have implications for bjob->blk which will now be put down
after this cleanup.

Kevin highlights that we did not take any permissions for that backend
at job creation time, so it is safe to reorder these operations.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/commit.c | 22 +++++-----------------
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 4a17bb73ec..da69165de3 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -68,19 +68,13 @@ static int coroutine_fn commit_populate(BlockBackend *bs, BlockBackend *base,
     return 0;
 }
 
-typedef struct {
-    int ret;
-} CommitCompleteData;
-
-static void commit_complete(Job *job, void *opaque)
+static void commit_exit(Job *job)
 {
     CommitBlockJob *s = container_of(job, CommitBlockJob, common.job);
     BlockJob *bjob = &s->common;
-    CommitCompleteData *data = opaque;
     BlockDriverState *top = blk_bs(s->top);
     BlockDriverState *base = blk_bs(s->base);
     BlockDriverState *commit_top_bs = s->commit_top_bs;
-    int ret = data->ret;
     bool remove_commit_top_bs = false;
 
     /* Make sure commit_top_bs and top stay around until bdrv_replace_node() */
@@ -91,10 +85,10 @@ static void commit_complete(Job *job, void *opaque)
      * the normal backing chain can be restored. */
     blk_unref(s->base);
 
-    if (!job_is_cancelled(job) && ret == 0) {
+    if (!job_is_cancelled(job) && job->ret == 0) {
         /* success */
-        ret = bdrv_drop_intermediate(s->commit_top_bs, base,
-                                     s->backing_file_str);
+        job->ret = bdrv_drop_intermediate(s->commit_top_bs, base,
+                                          s->backing_file_str);
     } else {
         /* XXX Can (or should) we somehow keep 'consistent read' blocked even
          * after the failed/cancelled commit job is gone? If we already wrote
@@ -117,9 +111,6 @@ static void commit_complete(Job *job, void *opaque)
      * bdrv_set_backing_hd() to fail. */
     block_job_remove_all_bdrv(bjob);
 
-    job_completed(job, ret);
-    g_free(data);
-
     /* If bdrv_drop_intermediate() didn't already do that, remove the commit
      * filter driver from the backing chain. Do this as the final step so that
      * the 'consistent read' permission can be granted.  */
@@ -137,7 +128,6 @@ static void commit_complete(Job *job, void *opaque)
 static int coroutine_fn commit_run(Job *job, Error **errp)
 {
     CommitBlockJob *s = container_of(job, CommitBlockJob, common.job);
-    CommitCompleteData *data;
     int64_t offset;
     uint64_t delay_ns = 0;
     int ret = 0;
@@ -210,9 +200,6 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
 out:
     qemu_vfree(buf);
 
-    data = g_malloc(sizeof(*data));
-    data->ret = ret;
-    job_defer_to_main_loop(&s->common.job, commit_complete, data);
     return ret;
 }
 
@@ -224,6 +211,7 @@ static const BlockJobDriver commit_job_driver = {
         .user_resume   = block_job_user_resume,
         .drain         = block_job_drain,
         .run           = commit_run,
+        .exit          = commit_exit,
     },
 };
 
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v3 5/9] block/mirror: utilize job_exit shim
  2018-08-30  1:57 [Qemu-devel] [PATCH v3 0/9] jobs: Job Exit Refactoring Pt 1 John Snow
                   ` (3 preceding siblings ...)
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 4/9] block/commit: utilize job_exit shim John Snow
@ 2018-08-30  1:57 ` John Snow
  2018-08-31 13:23   ` Max Reitz
  2018-08-31 14:09   ` Jeff Cody
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 6/9] jobs: " John Snow
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 24+ messages in thread
From: John Snow @ 2018-08-30  1:57 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: kwolf, Jeff Cody, Stefan Hajnoczi, Max Reitz, jtc, John Snow

Change the manual deferment to mirror_exit into the implicit
callback to job_exit and the mirror_exit callback.

This does change the order of some bdrv_unref calls and job_completed,
but thanks to the new context in which we call .exit, this is safe to
defer the possible flushing of any nodes to the job_finalize_single
cleanup stage.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/mirror.c | 29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index be5dc6b7b0..b8941db6c1 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -607,26 +607,22 @@ static void mirror_wait_for_all_io(MirrorBlockJob *s)
     }
 }
 
-typedef struct {
-    int ret;
-} MirrorExitData;
-
-static void mirror_exit(Job *job, void *opaque)
+static void mirror_exit(Job *job)
 {
     MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
     BlockJob *bjob = &s->common;
-    MirrorExitData *data = opaque;
     MirrorBDSOpaque *bs_opaque = s->mirror_top_bs->opaque;
     AioContext *replace_aio_context = NULL;
     BlockDriverState *src = s->mirror_top_bs->backing->bs;
     BlockDriverState *target_bs = blk_bs(s->target);
     BlockDriverState *mirror_top_bs = s->mirror_top_bs;
     Error *local_err = NULL;
+    int ret = job->ret;
 
     bdrv_release_dirty_bitmap(src, s->dirty_bitmap);
 
-    /* Make sure that the source BDS doesn't go away before we called
-     * job_completed(). */
+    /* Make sure that the source BDS doesn't go away during bdrv_replace_node,
+     * before we can call bdrv_drained_end */
     bdrv_ref(src);
     bdrv_ref(mirror_top_bs);
     bdrv_ref(target_bs);
@@ -652,7 +648,7 @@ static void mirror_exit(Job *job, void *opaque)
             bdrv_set_backing_hd(target_bs, backing, &local_err);
             if (local_err) {
                 error_report_err(local_err);
-                data->ret = -EPERM;
+                ret = -EPERM;
             }
         }
     }
@@ -662,7 +658,7 @@ static void mirror_exit(Job *job, void *opaque)
         aio_context_acquire(replace_aio_context);
     }
 
-    if (s->should_complete && data->ret == 0) {
+    if (s->should_complete && ret == 0) {
         BlockDriverState *to_replace = src;
         if (s->to_replace) {
             to_replace = s->to_replace;
@@ -679,7 +675,7 @@ static void mirror_exit(Job *job, void *opaque)
         bdrv_drained_end(target_bs);
         if (local_err) {
             error_report_err(local_err);
-            data->ret = -EPERM;
+            ret = -EPERM;
         }
     }
     if (s->to_replace) {
@@ -710,12 +706,12 @@ static void mirror_exit(Job *job, void *opaque)
     blk_insert_bs(bjob->blk, mirror_top_bs, &error_abort);
 
     bs_opaque->job = NULL;
-    job_completed(job, data->ret);
 
-    g_free(data);
     bdrv_drained_end(src);
     bdrv_unref(mirror_top_bs);
     bdrv_unref(src);
+
+    job->ret = ret;
 }
 
 static void mirror_throttle(MirrorBlockJob *s)
@@ -815,7 +811,6 @@ static int mirror_flush(MirrorBlockJob *s)
 static int coroutine_fn mirror_run(Job *job, Error **errp)
 {
     MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
-    MirrorExitData *data;
     BlockDriverState *bs = s->mirror_top_bs->backing->bs;
     BlockDriverState *target_bs = blk_bs(s->target);
     bool need_drain = true;
@@ -1035,14 +1030,10 @@ immediate_exit:
     g_free(s->in_flight_bitmap);
     bdrv_dirty_iter_free(s->dbi);
 
-    data = g_malloc(sizeof(*data));
-    data->ret = ret;
-
     if (need_drain) {
         bdrv_drained_begin(bs);
     }
 
-    job_defer_to_main_loop(&s->common.job, mirror_exit, data);
     return ret;
 }
 
@@ -1141,6 +1132,7 @@ static const BlockJobDriver mirror_job_driver = {
         .user_resume            = block_job_user_resume,
         .drain                  = block_job_drain,
         .run                    = mirror_run,
+        .exit                   = mirror_exit,
         .pause                  = mirror_pause,
         .complete               = mirror_complete,
     },
@@ -1157,6 +1149,7 @@ static const BlockJobDriver commit_active_job_driver = {
         .user_resume            = block_job_user_resume,
         .drain                  = block_job_drain,
         .run                    = mirror_run,
+        .exit                   = mirror_exit,
         .pause                  = mirror_pause,
         .complete               = mirror_complete,
     },
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v3 6/9] jobs: utilize job_exit shim
  2018-08-30  1:57 [Qemu-devel] [PATCH v3 0/9] jobs: Job Exit Refactoring Pt 1 John Snow
                   ` (4 preceding siblings ...)
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 5/9] block/mirror: " John Snow
@ 2018-08-30  1:57 ` John Snow
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 7/9] block/backup: make function variables consistently named John Snow
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: John Snow @ 2018-08-30  1:57 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: kwolf, Jeff Cody, Stefan Hajnoczi, Max Reitz, jtc, John Snow

Utilize the job_exit shim by not calling job_defer_to_main_loop, and
where applicable, converting the deferred callback into the job_exit
callback.

This converts backup, stream, create, and the unit tests all at once.
Most of these jobs do not see any changes to the order in which they
clean up their resources, except the test-blockjob-txn test, which
now puts down its bs before job_completed is called.

This is safe for the same reason the reordering in the mirror job is
safe, because job_completed no longer runs under two locks, making
the unref safe even if it causes a flush.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/backup.c            | 16 ----------------
 block/create.c            | 14 +++-----------
 block/stream.c            | 22 +++++++---------------
 tests/test-bdrv-drain.c   |  6 ------
 tests/test-blockjob-txn.c | 11 ++---------
 tests/test-blockjob.c     | 10 ++++------
 6 files changed, 16 insertions(+), 63 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 1e965d54e5..a67b7fa96b 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -380,18 +380,6 @@ static BlockErrorAction backup_error_action(BackupBlockJob *job,
     }
 }
 
-typedef struct {
-    int ret;
-} BackupCompleteData;
-
-static void backup_complete(Job *job, void *opaque)
-{
-    BackupCompleteData *data = opaque;
-
-    job_completed(job, data->ret);
-    g_free(data);
-}
-
 static bool coroutine_fn yield_and_check(BackupBlockJob *job)
 {
     uint64_t delay_ns;
@@ -483,7 +471,6 @@ static void backup_incremental_init_copy_bitmap(BackupBlockJob *job)
 static int coroutine_fn backup_run(Job *opaque_job, Error **errp)
 {
     BackupBlockJob *job = container_of(opaque_job, BackupBlockJob, common.job);
-    BackupCompleteData *data;
     BlockDriverState *bs = blk_bs(job->common.blk);
     int64_t offset, nb_clusters;
     int ret = 0;
@@ -584,9 +571,6 @@ static int coroutine_fn backup_run(Job *opaque_job, Error **errp)
     qemu_co_rwlock_unlock(&job->flush_rwlock);
     hbitmap_free(job->copy_bitmap);
 
-    data = g_malloc(sizeof(*data));
-    data->ret = ret;
-    job_defer_to_main_loop(&job->common.job, backup_complete, data);
     return ret;
 }
 
diff --git a/block/create.c b/block/create.c
index 26a385c6c7..95341219ef 100644
--- a/block/create.c
+++ b/block/create.c
@@ -34,28 +34,20 @@ typedef struct BlockdevCreateJob {
     Job common;
     BlockDriver *drv;
     BlockdevCreateOptions *opts;
-    int ret;
 } BlockdevCreateJob;
 
-static void blockdev_create_complete(Job *job, void *opaque)
-{
-    BlockdevCreateJob *s = container_of(job, BlockdevCreateJob, common);
-
-    job_completed(job, s->ret);
-}
-
 static int coroutine_fn blockdev_create_run(Job *job, Error **errp)
 {
     BlockdevCreateJob *s = container_of(job, BlockdevCreateJob, common);
+    int ret;
 
     job_progress_set_remaining(&s->common, 1);
-    s->ret = s->drv->bdrv_co_create(s->opts, errp);
+    ret = s->drv->bdrv_co_create(s->opts, errp);
     job_progress_update(&s->common, 1);
 
     qapi_free_BlockdevCreateOptions(s->opts);
-    job_defer_to_main_loop(&s->common, blockdev_create_complete, NULL);
 
-    return s->ret;
+    return ret;
 }
 
 static const JobDriver blockdev_create_job_driver = {
diff --git a/block/stream.c b/block/stream.c
index 26a775386b..67e1e72e23 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -54,20 +54,16 @@ static int coroutine_fn stream_populate(BlockBackend *blk,
     return blk_co_preadv(blk, offset, qiov.size, &qiov, BDRV_REQ_COPY_ON_READ);
 }
 
-typedef struct {
-    int ret;
-} StreamCompleteData;
-
-static void stream_complete(Job *job, void *opaque)
+static void stream_exit(Job *job)
 {
     StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
     BlockJob *bjob = &s->common;
-    StreamCompleteData *data = opaque;
     BlockDriverState *bs = blk_bs(bjob->blk);
     BlockDriverState *base = s->base;
     Error *local_err = NULL;
+    int ret = job->ret;
 
-    if (!job_is_cancelled(job) && bs->backing && data->ret == 0) {
+    if (!job_is_cancelled(job) && bs->backing && ret == 0) {
         const char *base_id = NULL, *base_fmt = NULL;
         if (base) {
             base_id = s->backing_file_str;
@@ -75,11 +71,11 @@ static void stream_complete(Job *job, void *opaque)
                 base_fmt = base->drv->format_name;
             }
         }
-        data->ret = bdrv_change_backing_file(bs, base_id, base_fmt);
+        ret = bdrv_change_backing_file(bs, base_id, base_fmt);
         bdrv_set_backing_hd(bs, base, &local_err);
         if (local_err) {
             error_report_err(local_err);
-            data->ret = -EPERM;
+            ret = -EPERM;
             goto out;
         }
     }
@@ -93,14 +89,12 @@ out:
     }
 
     g_free(s->backing_file_str);
-    job_completed(job, data->ret);
-    g_free(data);
+    job->ret = ret;
 }
 
 static int coroutine_fn stream_run(Job *job, Error **errp)
 {
     StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
-    StreamCompleteData *data;
     BlockBackend *blk = s->common.blk;
     BlockDriverState *bs = blk_bs(blk);
     BlockDriverState *base = s->base;
@@ -203,9 +197,6 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 
 out:
     /* Modify backing chain and close BDSes in main loop */
-    data = g_malloc(sizeof(*data));
-    data->ret = ret;
-    job_defer_to_main_loop(&s->common.job, stream_complete, data);
     return ret;
 }
 
@@ -215,6 +206,7 @@ static const BlockJobDriver stream_job_driver = {
         .job_type      = JOB_TYPE_STREAM,
         .free          = block_job_free,
         .run           = stream_run,
+        .exit          = stream_exit,
         .user_resume   = block_job_user_resume,
         .drain         = block_job_drain,
     },
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 00604dfc0c..9bcb3c72af 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -752,11 +752,6 @@ typedef struct TestBlockJob {
     bool should_complete;
 } TestBlockJob;
 
-static void test_job_completed(Job *job, void *opaque)
-{
-    job_completed(job, 0);
-}
-
 static int coroutine_fn test_job_run(Job *job, Error **errp)
 {
     TestBlockJob *s = container_of(job, TestBlockJob, common.job);
@@ -770,7 +765,6 @@ static int coroutine_fn test_job_run(Job *job, Error **errp)
         job_pause_point(&s->common.job);
     }
 
-    job_defer_to_main_loop(&s->common.job, test_job_completed, NULL);
     return 0;
 }
 
diff --git a/tests/test-blockjob-txn.c b/tests/test-blockjob-txn.c
index 82cedee78b..ef29f35e44 100644
--- a/tests/test-blockjob-txn.c
+++ b/tests/test-blockjob-txn.c
@@ -24,17 +24,11 @@ typedef struct {
     int *result;
 } TestBlockJob;
 
-static void test_block_job_complete(Job *job, void *opaque)
+static void test_block_job_exit(Job *job)
 {
     BlockJob *bjob = container_of(job, BlockJob, job);
     BlockDriverState *bs = blk_bs(bjob->blk);
-    int rc = (intptr_t)opaque;
 
-    if (job_is_cancelled(job)) {
-        rc = -ECANCELED;
-    }
-
-    job_completed(job, rc);
     bdrv_unref(bs);
 }
 
@@ -54,8 +48,6 @@ static int coroutine_fn test_block_job_run(Job *job, Error **errp)
         }
     }
 
-    job_defer_to_main_loop(job, test_block_job_complete,
-                           (void *)(intptr_t)s->rc);
     return s->rc;
 }
 
@@ -81,6 +73,7 @@ static const BlockJobDriver test_block_job_driver = {
         .user_resume   = block_job_user_resume,
         .drain         = block_job_drain,
         .run           = test_block_job_run,
+        .exit          = test_block_job_exit,
     },
 };
 
diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c
index 408a226939..ad4a65bc78 100644
--- a/tests/test-blockjob.c
+++ b/tests/test-blockjob.c
@@ -163,11 +163,10 @@ typedef struct CancelJob {
     bool completed;
 } CancelJob;
 
-static void cancel_job_completed(Job *job, void *opaque)
+static void cancel_job_exit(Job *job)
 {
-    CancelJob *s = opaque;
+    CancelJob *s = container_of(job, CancelJob, common.job);
     s->completed = true;
-    job_completed(job, 0);
 }
 
 static void cancel_job_complete(Job *job, Error **errp)
@@ -182,7 +181,7 @@ static int coroutine_fn cancel_job_run(Job *job, Error **errp)
 
     while (!s->should_complete) {
         if (job_is_cancelled(&s->common.job)) {
-            goto defer;
+            return 0;
         }
 
         if (!job_is_ready(&s->common.job) && s->should_converge) {
@@ -192,8 +191,6 @@ static int coroutine_fn cancel_job_run(Job *job, Error **errp)
         job_sleep_ns(&s->common.job, 100000);
     }
 
- defer:
-    job_defer_to_main_loop(&s->common.job, cancel_job_completed, s);
     return 0;
 }
 
@@ -204,6 +201,7 @@ static const BlockJobDriver test_cancel_driver = {
         .user_resume   = block_job_user_resume,
         .drain         = block_job_drain,
         .run           = cancel_job_run,
+        .exit          = cancel_job_exit,
         .complete      = cancel_job_complete,
     },
 };
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v3 7/9] block/backup: make function variables consistently named
  2018-08-30  1:57 [Qemu-devel] [PATCH v3 0/9] jobs: Job Exit Refactoring Pt 1 John Snow
                   ` (5 preceding siblings ...)
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 6/9] jobs: " John Snow
@ 2018-08-30  1:57 ` John Snow
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 8/9] jobs: remove ret argument to job_completed; privatize it John Snow
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: John Snow @ 2018-08-30  1:57 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: kwolf, Jeff Cody, Stefan Hajnoczi, Max Reitz, jtc, John Snow

Rename opaque_job to job to be consistent with other job implementations.
Rename 'job', the BackupBlockJob object, to 's' to also be consistent.

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/backup.c | 62 +++++++++++++++++++++++++++++-----------------------------
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index a67b7fa96b..4d084f6ca6 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -468,59 +468,59 @@ static void backup_incremental_init_copy_bitmap(BackupBlockJob *job)
     bdrv_dirty_iter_free(dbi);
 }
 
-static int coroutine_fn backup_run(Job *opaque_job, Error **errp)
+static int coroutine_fn backup_run(Job *job, Error **errp)
 {
-    BackupBlockJob *job = container_of(opaque_job, BackupBlockJob, common.job);
-    BlockDriverState *bs = blk_bs(job->common.blk);
+    BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
+    BlockDriverState *bs = blk_bs(s->common.blk);
     int64_t offset, nb_clusters;
     int ret = 0;
 
-    QLIST_INIT(&job->inflight_reqs);
-    qemu_co_rwlock_init(&job->flush_rwlock);
+    QLIST_INIT(&s->inflight_reqs);
+    qemu_co_rwlock_init(&s->flush_rwlock);
 
-    nb_clusters = DIV_ROUND_UP(job->len, job->cluster_size);
-    job_progress_set_remaining(&job->common.job, job->len);
+    nb_clusters = DIV_ROUND_UP(s->len, s->cluster_size);
+    job_progress_set_remaining(job, s->len);
 
-    job->copy_bitmap = hbitmap_alloc(nb_clusters, 0);
-    if (job->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
-        backup_incremental_init_copy_bitmap(job);
+    s->copy_bitmap = hbitmap_alloc(nb_clusters, 0);
+    if (s->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
+        backup_incremental_init_copy_bitmap(s);
     } else {
-        hbitmap_set(job->copy_bitmap, 0, nb_clusters);
+        hbitmap_set(s->copy_bitmap, 0, nb_clusters);
     }
 
 
-    job->before_write.notify = backup_before_write_notify;
-    bdrv_add_before_write_notifier(bs, &job->before_write);
+    s->before_write.notify = backup_before_write_notify;
+    bdrv_add_before_write_notifier(bs, &s->before_write);
 
-    if (job->sync_mode == MIRROR_SYNC_MODE_NONE) {
+    if (s->sync_mode == MIRROR_SYNC_MODE_NONE) {
         /* All bits are set in copy_bitmap to allow any cluster to be copied.
          * This does not actually require them to be copied. */
-        while (!job_is_cancelled(&job->common.job)) {
+        while (!job_is_cancelled(job)) {
             /* Yield until the job is cancelled.  We just let our before_write
              * notify callback service CoW requests. */
-            job_yield(&job->common.job);
+            job_yield(job);
         }
-    } else if (job->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
-        ret = backup_run_incremental(job);
+    } else if (s->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
+        ret = backup_run_incremental(s);
     } else {
         /* Both FULL and TOP SYNC_MODE's require copying.. */
-        for (offset = 0; offset < job->len;
-             offset += job->cluster_size) {
+        for (offset = 0; offset < s->len;
+             offset += s->cluster_size) {
             bool error_is_read;
             int alloced = 0;
 
-            if (yield_and_check(job)) {
+            if (yield_and_check(s)) {
                 break;
             }
 
-            if (job->sync_mode == MIRROR_SYNC_MODE_TOP) {
+            if (s->sync_mode == MIRROR_SYNC_MODE_TOP) {
                 int i;
                 int64_t n;
 
                 /* Check to see if these blocks are already in the
                  * backing file. */
 
-                for (i = 0; i < job->cluster_size;) {
+                for (i = 0; i < s->cluster_size;) {
                     /* bdrv_is_allocated() only returns true/false based
                      * on the first set of sectors it comes across that
                      * are are all in the same state.
@@ -529,7 +529,7 @@ static int coroutine_fn backup_run(Job *opaque_job, Error **errp)
                      * needed but at some point that is always the case. */
                     alloced =
                         bdrv_is_allocated(bs, offset + i,
-                                          job->cluster_size - i, &n);
+                                          s->cluster_size - i, &n);
                     i += n;
 
                     if (alloced || n == 0) {
@@ -547,29 +547,29 @@ static int coroutine_fn backup_run(Job *opaque_job, Error **errp)
             if (alloced < 0) {
                 ret = alloced;
             } else {
-                ret = backup_do_cow(job, offset, job->cluster_size,
+                ret = backup_do_cow(s, offset, s->cluster_size,
                                     &error_is_read, false);
             }
             if (ret < 0) {
                 /* Depending on error action, fail now or retry cluster */
                 BlockErrorAction action =
-                    backup_error_action(job, error_is_read, -ret);
+                    backup_error_action(s, error_is_read, -ret);
                 if (action == BLOCK_ERROR_ACTION_REPORT) {
                     break;
                 } else {
-                    offset -= job->cluster_size;
+                    offset -= s->cluster_size;
                     continue;
                 }
             }
         }
     }
 
-    notifier_with_return_remove(&job->before_write);
+    notifier_with_return_remove(&s->before_write);
 
     /* wait until pending backup_do_cow() calls have completed */
-    qemu_co_rwlock_wrlock(&job->flush_rwlock);
-    qemu_co_rwlock_unlock(&job->flush_rwlock);
-    hbitmap_free(job->copy_bitmap);
+    qemu_co_rwlock_wrlock(&s->flush_rwlock);
+    qemu_co_rwlock_unlock(&s->flush_rwlock);
+    hbitmap_free(s->copy_bitmap);
 
     return ret;
 }
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v3 8/9] jobs: remove ret argument to job_completed; privatize it
  2018-08-30  1:57 [Qemu-devel] [PATCH v3 0/9] jobs: Job Exit Refactoring Pt 1 John Snow
                   ` (6 preceding siblings ...)
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 7/9] block/backup: make function variables consistently named John Snow
@ 2018-08-30  1:57 ` John Snow
  2018-08-31 13:25   ` Max Reitz
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 9/9] jobs: remove job_defer_to_main_loop John Snow
  2018-08-31 14:12 ` [Qemu-devel] [PATCH v3 0/9] jobs: Job Exit Refactoring Pt 1 Max Reitz
  9 siblings, 1 reply; 24+ messages in thread
From: John Snow @ 2018-08-30  1:57 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: kwolf, Jeff Cody, Stefan Hajnoczi, Max Reitz, jtc, John Snow

Jobs are now expected to return their retcode on the stack, from the
.run callback, so we can remove that argument.

job_cancel does not need to set -ECANCELED because job_completed will
update the return code itself if the job was canceled.

While we're here, make job_completed static to job.c and remove it from
job.h; move the documentation of return code to the .run() callback and
to the job->ret property, accordingly.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 include/qemu/job.h | 28 +++++++++++++++-------------
 job.c              | 11 ++++++-----
 trace-events       |  2 +-
 3 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 1144d671a1..23395c17fa 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -124,7 +124,11 @@ typedef struct Job {
     /** Estimated progress_current value at the completion of the job */
     int64_t progress_total;
 
-    /** ret code passed to job_completed. */
+    /**
+     * Return code from @run and/or @prepare callback(s).
+     * Not final until the job has reached the CONCLUDED status.
+     * 0 on success, -errno on failure.
+     */
     int ret;
 
     /**
@@ -172,7 +176,16 @@ struct JobDriver {
     /** Enum describing the operation */
     JobType job_type;
 
-    /** Mandatory: Entrypoint for the Coroutine. */
+    /**
+     * Mandatory: Entrypoint for the Coroutine.
+     *
+     * This callback will be invoked when moving from CREATED to RUNNING.
+     *
+     * If this callback returns nonzero, the job transaction it is part of is
+     * aborted. If it returns zero, the job moves into the WAITING state. If it
+     * is the last job to complete in its transaction, all jobs in the
+     * transaction move from WAITING to PENDING.
+     */
     int coroutine_fn (*run)(Job *job, Error **errp);
 
     /**
@@ -496,17 +509,6 @@ void job_early_fail(Job *job);
 /** Moves the @job from RUNNING to READY */
 void job_transition_to_ready(Job *job);
 
-/**
- * @job: The job being completed.
- * @ret: The status code.
- *
- * Marks @job as completed. If @ret is non-zero, the job transaction it is part
- * of is aborted. If @ret is zero, the job moves into the WAITING state. If it
- * is the last job to complete in its transaction, all jobs in the transaction
- * move from WAITING to PENDING.
- */
-void job_completed(Job *job, int ret);
-
 /** Asynchronously complete the specified @job. */
 void job_complete(Job *job, Error **errp);
 
diff --git a/job.c b/job.c
index bc8dad4e71..213042b762 100644
--- a/job.c
+++ b/job.c
@@ -535,6 +535,8 @@ void job_drain(Job *job)
     }
 }
 
+static void job_completed(Job *job);
+
 static void job_exit(void *opaque)
 {
     Job *job = (Job *)opaque;
@@ -545,7 +547,7 @@ static void job_exit(void *opaque)
         job->driver->exit(job);
         aio_context_release(aio_context);
     }
-    job_completed(job, job->ret);
+    job_completed(job);
 }
 
 /**
@@ -883,13 +885,12 @@ static void job_completed_txn_success(Job *job)
     }
 }
 
-void job_completed(Job *job, int ret)
+static void job_completed(Job *job)
 {
     assert(job && job->txn && !job_is_completed(job));
 
-    job->ret = ret;
     job_update_rc(job);
-    trace_job_completed(job, ret, job->ret);
+    trace_job_completed(job, job->ret);
     if (job->ret) {
         job_completed_txn_abort(job);
     } else {
@@ -905,7 +906,7 @@ void job_cancel(Job *job, bool force)
     }
     job_cancel_async(job, force);
     if (!job_started(job)) {
-        job_completed(job, -ECANCELED);
+        job_completed(job);
     } else if (job->deferred_to_main_loop) {
         job_completed_txn_abort(job);
     } else {
diff --git a/trace-events b/trace-events
index c445f54773..4fd2cb4b97 100644
--- a/trace-events
+++ b/trace-events
@@ -107,7 +107,7 @@ gdbstub_err_checksum_incorrect(uint8_t expected, uint8_t got) "got command packe
 # job.c
 job_state_transition(void *job,  int ret, const char *legal, const char *s0, const char *s1) "job %p (ret: %d) attempting %s transition (%s-->%s)"
 job_apply_verb(void *job, const char *state, const char *verb, const char *legal) "job %p in state %s; applying verb %s (%s)"
-job_completed(void *job, int ret, int jret) "job %p ret %d corrected ret %d"
+job_completed(void *job, int ret) "job %p ret %d"
 
 # job-qmp.c
 qmp_job_cancel(void *job) "job %p"
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v3 9/9] jobs: remove job_defer_to_main_loop
  2018-08-30  1:57 [Qemu-devel] [PATCH v3 0/9] jobs: Job Exit Refactoring Pt 1 John Snow
                   ` (7 preceding siblings ...)
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 8/9] jobs: remove ret argument to job_completed; privatize it John Snow
@ 2018-08-30  1:57 ` John Snow
  2018-08-31 14:12 ` [Qemu-devel] [PATCH v3 0/9] jobs: Job Exit Refactoring Pt 1 Max Reitz
  9 siblings, 0 replies; 24+ messages in thread
From: John Snow @ 2018-08-30  1:57 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: kwolf, Jeff Cody, Stefan Hajnoczi, Max Reitz, jtc, John Snow

Now that the job infrastructure is handling the job_completed call for
all implemented jobs, we can remove the interface that allowed jobs to
schedule their own completion.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 include/qemu/job.h | 17 -----------------
 job.c              | 40 ++--------------------------------------
 2 files changed, 2 insertions(+), 55 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 23395c17fa..e0cff702b7 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -568,23 +568,6 @@ void job_finalize(Job *job, Error **errp);
  */
 void job_dismiss(Job **job, Error **errp);
 
-typedef void JobDeferToMainLoopFn(Job *job, void *opaque);
-
-/**
- * @job: The job
- * @fn: The function to run in the main loop
- * @opaque: The opaque value that is passed to @fn
- *
- * This function must be called by the main job coroutine just before it
- * returns.  @fn is executed in the main loop with the job AioContext acquired.
- *
- * Block jobs must call bdrv_unref(), bdrv_close(), and anything that uses
- * bdrv_drain_all() in the main loop.
- *
- * The @job AioContext is held while @fn executes.
- */
-void job_defer_to_main_loop(Job *job, JobDeferToMainLoopFn *fn, void *opaque);
-
 /**
  * Synchronously finishes the given @job. If @finish is given, it is called to
  * trigger completion or cancellation of the job.
diff --git a/job.c b/job.c
index 213042b762..01dd97fee3 100644
--- a/job.c
+++ b/job.c
@@ -561,12 +561,8 @@ static void coroutine_fn job_co_entry(void *opaque)
     assert(job && job->driver && job->driver->run);
     job_pause_point(job);
     job->ret = job->driver->run(job, &job->err);
-    if (!job->deferred_to_main_loop) {
-        job->deferred_to_main_loop = true;
-        aio_bh_schedule_oneshot(qemu_get_aio_context(),
-                                job_exit,
-                                job);
-    }
+    job->deferred_to_main_loop = true;
+    aio_bh_schedule_oneshot(qemu_get_aio_context(), job_exit, job);
 }
 
 
@@ -969,38 +965,6 @@ void job_complete(Job *job, Error **errp)
     job->driver->complete(job, errp);
 }
 
-
-typedef struct {
-    Job *job;
-    JobDeferToMainLoopFn *fn;
-    void *opaque;
-} JobDeferToMainLoopData;
-
-static void job_defer_to_main_loop_bh(void *opaque)
-{
-    JobDeferToMainLoopData *data = opaque;
-    Job *job = data->job;
-    AioContext *aio_context = job->aio_context;
-
-    aio_context_acquire(aio_context);
-    data->fn(data->job, data->opaque);
-    aio_context_release(aio_context);
-
-    g_free(data);
-}
-
-void job_defer_to_main_loop(Job *job, JobDeferToMainLoopFn *fn, void *opaque)
-{
-    JobDeferToMainLoopData *data = g_malloc(sizeof(*data));
-    data->job = job;
-    data->fn = fn;
-    data->opaque = opaque;
-    job->deferred_to_main_loop = true;
-
-    aio_bh_schedule_oneshot(qemu_get_aio_context(),
-                            job_defer_to_main_loop_bh, data);
-}
-
 int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
 {
     Error *local_err = NULL;
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/9] jobs: canonize Error object
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 2/9] jobs: canonize Error object John Snow
@ 2018-08-30 19:58   ` Eric Blake
  2018-08-31  6:08     ` Markus Armbruster
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Blake @ 2018-08-30 19:58 UTC (permalink / raw)
  To: John Snow, qemu-block, qemu-devel
  Cc: kwolf, Jeff Cody, Max Reitz, jtc, Stefan Hajnoczi

On 08/29/2018 08:57 PM, John Snow wrote:
> Jobs presently use both an Error object in the case of the create job,
> and char strings in the case of generic errors elsewhere.
> 
> Unify the two paths as just j->err, and remove the extra argument from
> job_completed. The integer error code for job_completed is kept for now,
> to be removed shortly in a separate patch.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---

> +++ b/job.c

> @@ -666,8 +666,8 @@ static void job_update_rc(Job *job)
>           job->ret = -ECANCELED;
>       }
>       if (job->ret) {
> -        if (!job->error) {
> -            job->error = g_strdup(strerror(-job->ret));
> +        if (!job->err) {
> +            error_setg(&job->err, "%s", g_strdup(strerror(-job->ret)));

Memleak. Drop the g_strdup(), and just directly pass strerror() results 
to error_setg().  (I guess we can't quite use error_setg_errno() unless 
we add additional text beyond the strerror() results).

With that fixed,
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/9] jobs: canonize Error object
  2018-08-30 19:58   ` Eric Blake
@ 2018-08-31  6:08     ` Markus Armbruster
  2018-08-31 15:23       ` John Snow
  0 siblings, 1 reply; 24+ messages in thread
From: Markus Armbruster @ 2018-08-31  6:08 UTC (permalink / raw)
  To: Eric Blake
  Cc: John Snow, qemu-block, qemu-devel, kwolf, Jeff Cody, jtc,
	Stefan Hajnoczi, Max Reitz

Eric Blake <eblake@redhat.com> writes:

> On 08/29/2018 08:57 PM, John Snow wrote:
>> Jobs presently use both an Error object in the case of the create job,
>> and char strings in the case of generic errors elsewhere.
>>
>> Unify the two paths as just j->err, and remove the extra argument from
>> job_completed. The integer error code for job_completed is kept for now,
>> to be removed shortly in a separate patch.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>
>> +++ b/job.c
>
>> @@ -666,8 +666,8 @@ static void job_update_rc(Job *job)
>>           job->ret = -ECANCELED;
>>       }
>>       if (job->ret) {
>> -        if (!job->error) {
>> -            job->error = g_strdup(strerror(-job->ret));
>> +        if (!job->err) {
>> +            error_setg(&job->err, "%s", g_strdup(strerror(-job->ret)));
>
> Memleak. Drop the g_strdup(), and just directly pass strerror()
> results to error_setg().  (I guess we can't quite use
> error_setg_errno() unless we add additional text beyond the strerror()
> results).

Adding such text might well be an improvement.  I'm not telling you to
do so (not having looked at the context myself), just to think about it.

> With that fixed,
> Reviewed-by: Eric Blake <eblake@redhat.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/9] block/mirror: utilize job_exit shim
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 5/9] block/mirror: " John Snow
@ 2018-08-31 13:23   ` Max Reitz
  2018-08-31 14:09   ` Jeff Cody
  1 sibling, 0 replies; 24+ messages in thread
From: Max Reitz @ 2018-08-31 13:23 UTC (permalink / raw)
  To: John Snow, qemu-block, qemu-devel; +Cc: kwolf, Jeff Cody, Stefan Hajnoczi, jtc

[-- Attachment #1: Type: text/plain, Size: 611 bytes --]

On 2018-08-30 03:57, John Snow wrote:
> Change the manual deferment to mirror_exit into the implicit
> callback to job_exit and the mirror_exit callback.
> 
> This does change the order of some bdrv_unref calls and job_completed,
> but thanks to the new context in which we call .exit, this is safe to
> defer the possible flushing of any nodes to the job_finalize_single
> cleanup stage.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  block/mirror.c | 29 +++++++++++------------------
>  1 file changed, 11 insertions(+), 18 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 8/9] jobs: remove ret argument to job_completed; privatize it
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 8/9] jobs: remove ret argument to job_completed; privatize it John Snow
@ 2018-08-31 13:25   ` Max Reitz
  0 siblings, 0 replies; 24+ messages in thread
From: Max Reitz @ 2018-08-31 13:25 UTC (permalink / raw)
  To: John Snow, qemu-block, qemu-devel; +Cc: kwolf, Jeff Cody, Stefan Hajnoczi, jtc

[-- Attachment #1: Type: text/plain, Size: 781 bytes --]

On 2018-08-30 03:57, John Snow wrote:
> Jobs are now expected to return their retcode on the stack, from the
> .run callback, so we can remove that argument.
> 
> job_cancel does not need to set -ECANCELED because job_completed will
> update the return code itself if the job was canceled.
> 
> While we're here, make job_completed static to job.c and remove it from
> job.h; move the documentation of return code to the .run() callback and
> to the job->ret property, accordingly.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  include/qemu/job.h | 28 +++++++++++++++-------------
>  job.c              | 11 ++++++-----
>  trace-events       |  2 +-
>  3 files changed, 22 insertions(+), 19 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/9] jobs: change start callback to run callback
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 1/9] jobs: change start callback to run callback John Snow
@ 2018-08-31 13:27   ` Jeff Cody
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff Cody @ 2018-08-31 13:27 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, qemu-devel, kwolf, Stefan Hajnoczi, Max Reitz

On Wed, Aug 29, 2018 at 09:57:26PM -0400, John Snow wrote:
> Presently we codify the entry point for a job as the "start" callback,
> but a more apt name would be "run" to clarify the idea that when this
> function returns we consider the job to have "finished," except for
> any cleanup which occurs in separate callbacks later.
> 
> As part of this clarification, change the signature to include an error
> object and a return code. The error ptr is not yet used, and the return
> code while captured, will be overwritten by actions in the job_completed
> function.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>

Reviewed-by: Jeff Cody <jcody@redhat.com>

> ---
>  block/backup.c            |  7 ++++---
>  block/commit.c            |  7 ++++---
>  block/create.c            |  8 +++++---
>  block/mirror.c            | 10 ++++++----
>  block/stream.c            |  7 ++++---
>  include/qemu/job.h        |  2 +-
>  job.c                     |  6 +++---
>  tests/test-bdrv-drain.c   |  7 ++++---
>  tests/test-blockjob-txn.c | 16 ++++++++--------
>  tests/test-blockjob.c     |  7 ++++---
>  10 files changed, 43 insertions(+), 34 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index 8630d32926..5d47781840 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -480,9 +480,9 @@ static void backup_incremental_init_copy_bitmap(BackupBlockJob *job)
>      bdrv_dirty_iter_free(dbi);
>  }
>  
> -static void coroutine_fn backup_run(void *opaque)
> +static int coroutine_fn backup_run(Job *opaque_job, Error **errp)
>  {
> -    BackupBlockJob *job = opaque;
> +    BackupBlockJob *job = container_of(opaque_job, BackupBlockJob, common.job);
>      BackupCompleteData *data;
>      BlockDriverState *bs = blk_bs(job->common.blk);
>      int64_t offset, nb_clusters;
> @@ -587,6 +587,7 @@ static void coroutine_fn backup_run(void *opaque)
>      data = g_malloc(sizeof(*data));
>      data->ret = ret;
>      job_defer_to_main_loop(&job->common.job, backup_complete, data);
> +    return ret;
>  }
>  
>  static const BlockJobDriver backup_job_driver = {
> @@ -596,7 +597,7 @@ static const BlockJobDriver backup_job_driver = {
>          .free                   = block_job_free,
>          .user_resume            = block_job_user_resume,
>          .drain                  = block_job_drain,
> -        .start                  = backup_run,
> +        .run                    = backup_run,
>          .commit                 = backup_commit,
>          .abort                  = backup_abort,
>          .clean                  = backup_clean,
> diff --git a/block/commit.c b/block/commit.c
> index eb414579bd..a0ea86ff64 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -134,9 +134,9 @@ static void commit_complete(Job *job, void *opaque)
>      bdrv_unref(top);
>  }
>  
> -static void coroutine_fn commit_run(void *opaque)
> +static int coroutine_fn commit_run(Job *job, Error **errp)
>  {
> -    CommitBlockJob *s = opaque;
> +    CommitBlockJob *s = container_of(job, CommitBlockJob, common.job);
>      CommitCompleteData *data;
>      int64_t offset;
>      uint64_t delay_ns = 0;
> @@ -213,6 +213,7 @@ out:
>      data = g_malloc(sizeof(*data));
>      data->ret = ret;
>      job_defer_to_main_loop(&s->common.job, commit_complete, data);
> +    return ret;
>  }
>  
>  static const BlockJobDriver commit_job_driver = {
> @@ -222,7 +223,7 @@ static const BlockJobDriver commit_job_driver = {
>          .free          = block_job_free,
>          .user_resume   = block_job_user_resume,
>          .drain         = block_job_drain,
> -        .start         = commit_run,
> +        .run           = commit_run,
>      },
>  };
>  
> diff --git a/block/create.c b/block/create.c
> index 915cd41bcc..04733c3618 100644
> --- a/block/create.c
> +++ b/block/create.c
> @@ -45,9 +45,9 @@ static void blockdev_create_complete(Job *job, void *opaque)
>      job_completed(job, s->ret, s->err);
>  }
>  
> -static void coroutine_fn blockdev_create_run(void *opaque)
> +static int coroutine_fn blockdev_create_run(Job *job, Error **errp)
>  {
> -    BlockdevCreateJob *s = opaque;
> +    BlockdevCreateJob *s = container_of(job, BlockdevCreateJob, common);
>  
>      job_progress_set_remaining(&s->common, 1);
>      s->ret = s->drv->bdrv_co_create(s->opts, &s->err);
> @@ -55,12 +55,14 @@ static void coroutine_fn blockdev_create_run(void *opaque)
>  
>      qapi_free_BlockdevCreateOptions(s->opts);
>      job_defer_to_main_loop(&s->common, blockdev_create_complete, NULL);
> +
> +    return s->ret;
>  }
>  
>  static const JobDriver blockdev_create_job_driver = {
>      .instance_size = sizeof(BlockdevCreateJob),
>      .job_type      = JOB_TYPE_CREATE,
> -    .start         = blockdev_create_run,
> +    .run           = blockdev_create_run,
>  };
>  
>  void qmp_blockdev_create(const char *job_id, BlockdevCreateOptions *options,
> diff --git a/block/mirror.c b/block/mirror.c
> index 6cc10df5c9..691763db41 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -812,9 +812,9 @@ static int mirror_flush(MirrorBlockJob *s)
>      return ret;
>  }
>  
> -static void coroutine_fn mirror_run(void *opaque)
> +static int coroutine_fn mirror_run(Job *job, Error **errp)
>  {
> -    MirrorBlockJob *s = opaque;
> +    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
>      MirrorExitData *data;
>      BlockDriverState *bs = s->mirror_top_bs->backing->bs;
>      BlockDriverState *target_bs = blk_bs(s->target);
> @@ -1041,7 +1041,9 @@ immediate_exit:
>      if (need_drain) {
>          bdrv_drained_begin(bs);
>      }
> +
>      job_defer_to_main_loop(&s->common.job, mirror_exit, data);
> +    return ret;
>  }
>  
>  static void mirror_complete(Job *job, Error **errp)
> @@ -1138,7 +1140,7 @@ static const BlockJobDriver mirror_job_driver = {
>          .free                   = block_job_free,
>          .user_resume            = block_job_user_resume,
>          .drain                  = block_job_drain,
> -        .start                  = mirror_run,
> +        .run                    = mirror_run,
>          .pause                  = mirror_pause,
>          .complete               = mirror_complete,
>      },
> @@ -1154,7 +1156,7 @@ static const BlockJobDriver commit_active_job_driver = {
>          .free                   = block_job_free,
>          .user_resume            = block_job_user_resume,
>          .drain                  = block_job_drain,
> -        .start                  = mirror_run,
> +        .run                    = mirror_run,
>          .pause                  = mirror_pause,
>          .complete               = mirror_complete,
>      },
> diff --git a/block/stream.c b/block/stream.c
> index 9264b68a1e..b4b987df7e 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -97,9 +97,9 @@ out:
>      g_free(data);
>  }
>  
> -static void coroutine_fn stream_run(void *opaque)
> +static int coroutine_fn stream_run(Job *job, Error **errp)
>  {
> -    StreamBlockJob *s = opaque;
> +    StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>      StreamCompleteData *data;
>      BlockBackend *blk = s->common.blk;
>      BlockDriverState *bs = blk_bs(blk);
> @@ -206,6 +206,7 @@ out:
>      data = g_malloc(sizeof(*data));
>      data->ret = ret;
>      job_defer_to_main_loop(&s->common.job, stream_complete, data);
> +    return ret;
>  }
>  
>  static const BlockJobDriver stream_job_driver = {
> @@ -213,7 +214,7 @@ static const BlockJobDriver stream_job_driver = {
>          .instance_size = sizeof(StreamBlockJob),
>          .job_type      = JOB_TYPE_STREAM,
>          .free          = block_job_free,
> -        .start         = stream_run,
> +        .run           = stream_run,
>          .user_resume   = block_job_user_resume,
>          .drain         = block_job_drain,
>      },
> diff --git a/include/qemu/job.h b/include/qemu/job.h
> index 18c9223e31..9cf463d228 100644
> --- a/include/qemu/job.h
> +++ b/include/qemu/job.h
> @@ -169,7 +169,7 @@ struct JobDriver {
>      JobType job_type;
>  
>      /** Mandatory: Entrypoint for the Coroutine. */
> -    CoroutineEntry *start;
> +    int coroutine_fn (*run)(Job *job, Error **errp);
>  
>      /**
>       * If the callback is not NULL, it will be invoked when the job transitions
> diff --git a/job.c b/job.c
> index e36ebaafd8..76988f6678 100644
> --- a/job.c
> +++ b/job.c
> @@ -544,16 +544,16 @@ static void coroutine_fn job_co_entry(void *opaque)
>  {
>      Job *job = opaque;
>  
> -    assert(job && job->driver && job->driver->start);
> +    assert(job && job->driver && job->driver->run);
>      job_pause_point(job);
> -    job->driver->start(job);
> +    job->ret = job->driver->run(job, NULL);
>  }
>  
>  
>  void job_start(Job *job)
>  {
>      assert(job && !job_started(job) && job->paused &&
> -           job->driver && job->driver->start);
> +           job->driver && job->driver->run);
>      job->co = qemu_coroutine_create(job_co_entry, job);
>      job->pause_count--;
>      job->busy = true;
> diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
> index 17bb8508ae..a7533861f6 100644
> --- a/tests/test-bdrv-drain.c
> +++ b/tests/test-bdrv-drain.c
> @@ -757,9 +757,9 @@ static void test_job_completed(Job *job, void *opaque)
>      job_completed(job, 0, NULL);
>  }
>  
> -static void coroutine_fn test_job_start(void *opaque)
> +static int coroutine_fn test_job_run(Job *job, Error **errp)
>  {
> -    TestBlockJob *s = opaque;
> +    TestBlockJob *s = container_of(job, TestBlockJob, common.job);
>  
>      job_transition_to_ready(&s->common.job);
>      while (!s->should_complete) {
> @@ -771,6 +771,7 @@ static void coroutine_fn test_job_start(void *opaque)
>      }
>  
>      job_defer_to_main_loop(&s->common.job, test_job_completed, NULL);
> +    return 0;
>  }
>  
>  static void test_job_complete(Job *job, Error **errp)
> @@ -785,7 +786,7 @@ BlockJobDriver test_job_driver = {
>          .free           = block_job_free,
>          .user_resume    = block_job_user_resume,
>          .drain          = block_job_drain,
> -        .start          = test_job_start,
> +        .run            = test_job_run,
>          .complete       = test_job_complete,
>      },
>  };
> diff --git a/tests/test-blockjob-txn.c b/tests/test-blockjob-txn.c
> index 58d9b87fb2..3194924071 100644
> --- a/tests/test-blockjob-txn.c
> +++ b/tests/test-blockjob-txn.c
> @@ -38,25 +38,25 @@ static void test_block_job_complete(Job *job, void *opaque)
>      bdrv_unref(bs);
>  }
>  
> -static void coroutine_fn test_block_job_run(void *opaque)
> +static int coroutine_fn test_block_job_run(Job *job, Error **errp)
>  {
> -    TestBlockJob *s = opaque;
> -    BlockJob *job = &s->common;
> +    TestBlockJob *s = container_of(job, TestBlockJob, common.job);
>  
>      while (s->iterations--) {
>          if (s->use_timer) {
> -            job_sleep_ns(&job->job, 0);
> +            job_sleep_ns(job, 0);
>          } else {
> -            job_yield(&job->job);
> +            job_yield(job);
>          }
>  
> -        if (job_is_cancelled(&job->job)) {
> +        if (job_is_cancelled(job)) {
>              break;
>          }
>      }
>  
> -    job_defer_to_main_loop(&job->job, test_block_job_complete,
> +    job_defer_to_main_loop(job, test_block_job_complete,
>                             (void *)(intptr_t)s->rc);
> +    return s->rc;
>  }
>  
>  typedef struct {
> @@ -80,7 +80,7 @@ static const BlockJobDriver test_block_job_driver = {
>          .free          = block_job_free,
>          .user_resume   = block_job_user_resume,
>          .drain         = block_job_drain,
> -        .start         = test_block_job_run,
> +        .run           = test_block_job_run,
>      },
>  };
>  
> diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c
> index cb42f06e61..b0462bfdec 100644
> --- a/tests/test-blockjob.c
> +++ b/tests/test-blockjob.c
> @@ -176,9 +176,9 @@ static void cancel_job_complete(Job *job, Error **errp)
>      s->should_complete = true;
>  }
>  
> -static void coroutine_fn cancel_job_start(void *opaque)
> +static int coroutine_fn cancel_job_run(Job *job, Error **errp)
>  {
> -    CancelJob *s = opaque;
> +    CancelJob *s = container_of(job, CancelJob, common.job);
>  
>      while (!s->should_complete) {
>          if (job_is_cancelled(&s->common.job)) {
> @@ -194,6 +194,7 @@ static void coroutine_fn cancel_job_start(void *opaque)
>  
>   defer:
>      job_defer_to_main_loop(&s->common.job, cancel_job_completed, s);
> +    return 0;
>  }
>  
>  static const BlockJobDriver test_cancel_driver = {
> @@ -202,7 +203,7 @@ static const BlockJobDriver test_cancel_driver = {
>          .free          = block_job_free,
>          .user_resume   = block_job_user_resume,
>          .drain         = block_job_drain,
> -        .start         = cancel_job_start,
> +        .run           = cancel_job_run,
>          .complete      = cancel_job_complete,
>      },
>  };
> -- 
> 2.14.4
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/9] jobs: add exit shim
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 3/9] jobs: add exit shim John Snow
@ 2018-08-31 13:48   ` Jeff Cody
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff Cody @ 2018-08-31 13:48 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, qemu-devel, kwolf, Stefan Hajnoczi, Max Reitz

On Wed, Aug 29, 2018 at 09:57:28PM -0400, John Snow wrote:
> All jobs do the same thing when they leave their running loop:
> - Store the return code in a structure
> - wait to receive this structure in the main thread
> - signal job completion via job_completed
> 
> Few jobs do anything beyond exactly this. Consolidate this exit
> logic for a net reduction in SLOC.
> 
> More seriously, when we utilize job_defer_to_main_loop_bh to call
> a function that calls job_completed, job_finalize_single will run
> in a context where it has recursively taken the aio_context lock,
> which can cause hangs if it puts down a reference that causes a flush.
> 
> You can observe this in practice by looking at mirror_exit's careful
> placement of job_completed and bdrv_unref calls.
> 
> If we centralize job exiting, we can signal job completion from outside
> of the aio_context, which should allow for job cleanup code to run with
> only one lock, which makes cleanup callbacks less tricky to write.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>

Reviewed-by: Jeff Cody <jcody@redhat.com>

> ---
>  include/qemu/job.h | 11 +++++++++++
>  job.c              | 18 ++++++++++++++++++
>  2 files changed, 29 insertions(+)
> 
> diff --git a/include/qemu/job.h b/include/qemu/job.h
> index e0e99870a1..1144d671a1 100644
> --- a/include/qemu/job.h
> +++ b/include/qemu/job.h
> @@ -208,6 +208,17 @@ struct JobDriver {
>       */
>      void (*drain)(Job *job);
>  
> +    /**
> +     * If the callback is not NULL, exit will be invoked from the main thread
> +     * when the job's coroutine has finished, but before transactional
> +     * convergence; before @prepare or @abort.
> +     *
> +     * FIXME TODO: This callback is only temporary to transition remaining jobs
> +     * to prepare/commit/abort/clean callbacks and will be removed before 3.1.
> +     * is released.
> +     */
> +    void (*exit)(Job *job);
> +
>      /**
>       * If the callback is not NULL, prepare will be invoked when all the jobs
>       * belonging to the same transaction complete; or upon this job's completion
> diff --git a/job.c b/job.c
> index bc1d970df4..bc8dad4e71 100644
> --- a/job.c
> +++ b/job.c
> @@ -535,6 +535,18 @@ void job_drain(Job *job)
>      }
>  }
>  
> +static void job_exit(void *opaque)
> +{
> +    Job *job = (Job *)opaque;
> +    AioContext *aio_context = job->aio_context;
> +
> +    if (job->driver->exit) {
> +        aio_context_acquire(aio_context);
> +        job->driver->exit(job);
> +        aio_context_release(aio_context);
> +    }
> +    job_completed(job, job->ret);
> +}
>  
>  /**
>   * All jobs must allow a pause point before entering their job proper. This
> @@ -547,6 +559,12 @@ static void coroutine_fn job_co_entry(void *opaque)
>      assert(job && job->driver && job->driver->run);
>      job_pause_point(job);
>      job->ret = job->driver->run(job, &job->err);
> +    if (!job->deferred_to_main_loop) {
> +        job->deferred_to_main_loop = true;
> +        aio_bh_schedule_oneshot(qemu_get_aio_context(),
> +                                job_exit,
> +                                job);
> +    }
>  }
>  
>  
> -- 
> 2.14.4
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/9] block/commit: utilize job_exit shim
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 4/9] block/commit: utilize job_exit shim John Snow
@ 2018-08-31 13:58   ` Jeff Cody
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff Cody @ 2018-08-31 13:58 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, qemu-devel, kwolf, Stefan Hajnoczi, Max Reitz

On Wed, Aug 29, 2018 at 09:57:29PM -0400, John Snow wrote:
> Change the manual deferment to commit_complete into the implicit
> callback to job_exit, renaming commit_complete to commit_exit.
> 
> This conversion does change the timing of when job_completed is
> called to after the bdrv_replace_node and bdrv_unref calls, which
> could have implications for bjob->blk which will now be put down
> after this cleanup.
> 
> Kevin highlights that we did not take any permissions for that backend
> at job creation time, so it is safe to reorder these operations.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/commit.c | 22 +++++-----------------
>  1 file changed, 5 insertions(+), 17 deletions(-)
> 
> diff --git a/block/commit.c b/block/commit.c
> index 4a17bb73ec..da69165de3 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -68,19 +68,13 @@ static int coroutine_fn commit_populate(BlockBackend *bs, BlockBackend *base,
>      return 0;
>  }
>  
> -typedef struct {
> -    int ret;
> -} CommitCompleteData;
> -
> -static void commit_complete(Job *job, void *opaque)
> +static void commit_exit(Job *job)
>  {
>      CommitBlockJob *s = container_of(job, CommitBlockJob, common.job);
>      BlockJob *bjob = &s->common;
> -    CommitCompleteData *data = opaque;
>      BlockDriverState *top = blk_bs(s->top);
>      BlockDriverState *base = blk_bs(s->base);
>      BlockDriverState *commit_top_bs = s->commit_top_bs;
> -    int ret = data->ret;
>      bool remove_commit_top_bs = false;
>  
>      /* Make sure commit_top_bs and top stay around until bdrv_replace_node() */
> @@ -91,10 +85,10 @@ static void commit_complete(Job *job, void *opaque)
>       * the normal backing chain can be restored. */
>      blk_unref(s->base);
>  
> -    if (!job_is_cancelled(job) && ret == 0) {
> +    if (!job_is_cancelled(job) && job->ret == 0) {
>          /* success */
> -        ret = bdrv_drop_intermediate(s->commit_top_bs, base,
> -                                     s->backing_file_str);
> +        job->ret = bdrv_drop_intermediate(s->commit_top_bs, base,
> +                                          s->backing_file_str);
>      } else {
>          /* XXX Can (or should) we somehow keep 'consistent read' blocked even
>           * after the failed/cancelled commit job is gone? If we already wrote
> @@ -117,9 +111,6 @@ static void commit_complete(Job *job, void *opaque)
>       * bdrv_set_backing_hd() to fail. */
>      block_job_remove_all_bdrv(bjob);
>  
> -    job_completed(job, ret);
> -    g_free(data);
> -

Not having to allocate, track, and free this cumbersome return value for
each of these job specific completions now is pretty nice.

Reviewed-by: Jeff Cody <jcody@redhat.com>


>      /* If bdrv_drop_intermediate() didn't already do that, remove the commit
>       * filter driver from the backing chain. Do this as the final step so that
>       * the 'consistent read' permission can be granted.  */
> @@ -137,7 +128,6 @@ static void commit_complete(Job *job, void *opaque)
>  static int coroutine_fn commit_run(Job *job, Error **errp)
>  {
>      CommitBlockJob *s = container_of(job, CommitBlockJob, common.job);
> -    CommitCompleteData *data;
>      int64_t offset;
>      uint64_t delay_ns = 0;
>      int ret = 0;
> @@ -210,9 +200,6 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
>  out:
>      qemu_vfree(buf);
>  
> -    data = g_malloc(sizeof(*data));
> -    data->ret = ret;
> -    job_defer_to_main_loop(&s->common.job, commit_complete, data);
>      return ret;
>  }
>  
> @@ -224,6 +211,7 @@ static const BlockJobDriver commit_job_driver = {
>          .user_resume   = block_job_user_resume,
>          .drain         = block_job_drain,
>          .run           = commit_run,
> +        .exit          = commit_exit,
>      },
>  };
>  
> -- 
> 2.14.4
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/9] block/mirror: utilize job_exit shim
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 5/9] block/mirror: " John Snow
  2018-08-31 13:23   ` Max Reitz
@ 2018-08-31 14:09   ` Jeff Cody
  1 sibling, 0 replies; 24+ messages in thread
From: Jeff Cody @ 2018-08-31 14:09 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, qemu-devel, kwolf, Stefan Hajnoczi, Max Reitz

On Wed, Aug 29, 2018 at 09:57:30PM -0400, John Snow wrote:
> Change the manual deferment to mirror_exit into the implicit
> callback to job_exit and the mirror_exit callback.
> 
> This does change the order of some bdrv_unref calls and job_completed,
> but thanks to the new context in which we call .exit, this is safe to
> defer the possible flushing of any nodes to the job_finalize_single
> cleanup stage.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>

Reviewed-by: Jeff Cody <jcody@redhat.com>

> ---
>  block/mirror.c | 29 +++++++++++------------------
>  1 file changed, 11 insertions(+), 18 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index be5dc6b7b0..b8941db6c1 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -607,26 +607,22 @@ static void mirror_wait_for_all_io(MirrorBlockJob *s)
>      }
>  }
>  
> -typedef struct {
> -    int ret;
> -} MirrorExitData;
> -
> -static void mirror_exit(Job *job, void *opaque)
> +static void mirror_exit(Job *job)
>  {
>      MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
>      BlockJob *bjob = &s->common;
> -    MirrorExitData *data = opaque;
>      MirrorBDSOpaque *bs_opaque = s->mirror_top_bs->opaque;
>      AioContext *replace_aio_context = NULL;
>      BlockDriverState *src = s->mirror_top_bs->backing->bs;
>      BlockDriverState *target_bs = blk_bs(s->target);
>      BlockDriverState *mirror_top_bs = s->mirror_top_bs;
>      Error *local_err = NULL;
> +    int ret = job->ret;
>  
>      bdrv_release_dirty_bitmap(src, s->dirty_bitmap);
>  
> -    /* Make sure that the source BDS doesn't go away before we called
> -     * job_completed(). */
> +    /* Make sure that the source BDS doesn't go away during bdrv_replace_node,
> +     * before we can call bdrv_drained_end */
>      bdrv_ref(src);
>      bdrv_ref(mirror_top_bs);
>      bdrv_ref(target_bs);
> @@ -652,7 +648,7 @@ static void mirror_exit(Job *job, void *opaque)
>              bdrv_set_backing_hd(target_bs, backing, &local_err);
>              if (local_err) {
>                  error_report_err(local_err);
> -                data->ret = -EPERM;
> +                ret = -EPERM;
>              }
>          }
>      }
> @@ -662,7 +658,7 @@ static void mirror_exit(Job *job, void *opaque)
>          aio_context_acquire(replace_aio_context);
>      }
>  
> -    if (s->should_complete && data->ret == 0) {
> +    if (s->should_complete && ret == 0) {
>          BlockDriverState *to_replace = src;
>          if (s->to_replace) {
>              to_replace = s->to_replace;
> @@ -679,7 +675,7 @@ static void mirror_exit(Job *job, void *opaque)
>          bdrv_drained_end(target_bs);
>          if (local_err) {
>              error_report_err(local_err);
> -            data->ret = -EPERM;
> +            ret = -EPERM;
>          }
>      }
>      if (s->to_replace) {
> @@ -710,12 +706,12 @@ static void mirror_exit(Job *job, void *opaque)
>      blk_insert_bs(bjob->blk, mirror_top_bs, &error_abort);
>  
>      bs_opaque->job = NULL;
> -    job_completed(job, data->ret);
>  
> -    g_free(data);
>      bdrv_drained_end(src);
>      bdrv_unref(mirror_top_bs);
>      bdrv_unref(src);
> +
> +    job->ret = ret;
>  }
>  
>  static void mirror_throttle(MirrorBlockJob *s)
> @@ -815,7 +811,6 @@ static int mirror_flush(MirrorBlockJob *s)
>  static int coroutine_fn mirror_run(Job *job, Error **errp)
>  {
>      MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
> -    MirrorExitData *data;
>      BlockDriverState *bs = s->mirror_top_bs->backing->bs;
>      BlockDriverState *target_bs = blk_bs(s->target);
>      bool need_drain = true;
> @@ -1035,14 +1030,10 @@ immediate_exit:
>      g_free(s->in_flight_bitmap);
>      bdrv_dirty_iter_free(s->dbi);
>  
> -    data = g_malloc(sizeof(*data));
> -    data->ret = ret;
> -
>      if (need_drain) {
>          bdrv_drained_begin(bs);
>      }
>  
> -    job_defer_to_main_loop(&s->common.job, mirror_exit, data);
>      return ret;
>  }
>  
> @@ -1141,6 +1132,7 @@ static const BlockJobDriver mirror_job_driver = {
>          .user_resume            = block_job_user_resume,
>          .drain                  = block_job_drain,
>          .run                    = mirror_run,
> +        .exit                   = mirror_exit,
>          .pause                  = mirror_pause,
>          .complete               = mirror_complete,
>      },
> @@ -1157,6 +1149,7 @@ static const BlockJobDriver commit_active_job_driver = {
>          .user_resume            = block_job_user_resume,
>          .drain                  = block_job_drain,
>          .run                    = mirror_run,
> +        .exit                   = mirror_exit,
>          .pause                  = mirror_pause,
>          .complete               = mirror_complete,
>      },
> -- 
> 2.14.4
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/9] jobs: Job Exit Refactoring Pt 1
  2018-08-30  1:57 [Qemu-devel] [PATCH v3 0/9] jobs: Job Exit Refactoring Pt 1 John Snow
                   ` (8 preceding siblings ...)
  2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 9/9] jobs: remove job_defer_to_main_loop John Snow
@ 2018-08-31 14:12 ` Max Reitz
  9 siblings, 0 replies; 24+ messages in thread
From: Max Reitz @ 2018-08-31 14:12 UTC (permalink / raw)
  To: John Snow, qemu-block, qemu-devel; +Cc: kwolf, Jeff Cody, Stefan Hajnoczi, jtc

[-- Attachment #1: Type: text/plain, Size: 1271 bytes --]

On 2018-08-30 03:57, John Snow wrote:
> This is part one of a two part series that refactors the exit logic
> of jobs.
> 
> Part one removes job_defer_to_main_loop.
> Part two removes the job->exit() callback introduced in part one.
> 
> It's redundant to have each job manage deferring to the main loop
> itself. Unifying this makes sense from an API standpoint.
> Doing so also allows us to remove a tricky case where the completion
> code is called under an aio_context lock, which then calls the
> finalization code which is itself executed under a second aio_context
> lock, leading to deadlocks.
> 
> Removing this recursive lock acquisition is necessary for converting
> mirror to only modify its graph post-finalization, but it's also just
> safer and will bite us less in the future.
> 
> This series introduces a job->exit callback, but after jobs are
> fully transitioned to using the .commit/.abort callbacks in Pt 2,
> this new completion callback will be removed again. It's only here
> as a crutch to let us investigate the completion refactoring in Pt 2
> more carefully.

Thanks, dropped the superfluous g_strdup() in patch 2 and applied to my
block branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/9] jobs: canonize Error object
  2018-08-31  6:08     ` Markus Armbruster
@ 2018-08-31 15:23       ` John Snow
  2018-09-01  7:54         ` Markus Armbruster
  0 siblings, 1 reply; 24+ messages in thread
From: John Snow @ 2018-08-31 15:23 UTC (permalink / raw)
  To: Markus Armbruster, Eric Blake
  Cc: qemu-block, qemu-devel, kwolf, Jeff Cody, jtc, Stefan Hajnoczi,
	Max Reitz



On 08/31/2018 02:08 AM, Markus Armbruster wrote:
> Eric Blake <eblake@redhat.com> writes:
> 
>> On 08/29/2018 08:57 PM, John Snow wrote:
>>> Jobs presently use both an Error object in the case of the create job,
>>> and char strings in the case of generic errors elsewhere.
>>>
>>> Unify the two paths as just j->err, and remove the extra argument from
>>> job_completed. The integer error code for job_completed is kept for now,
>>> to be removed shortly in a separate patch.
>>>
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>> ---
>>
>>> +++ b/job.c
>>
>>> @@ -666,8 +666,8 @@ static void job_update_rc(Job *job)
>>>           job->ret = -ECANCELED;
>>>       }
>>>       if (job->ret) {
>>> -        if (!job->error) {
>>> -            job->error = g_strdup(strerror(-job->ret));
>>> +        if (!job->err) {
>>> +            error_setg(&job->err, "%s", g_strdup(strerror(-job->ret)));
>>
>> Memleak. Drop the g_strdup(), and just directly pass strerror()
>> results to error_setg().  (I guess we can't quite use
>> error_setg_errno() unless we add additional text beyond the strerror()
>> results).
> 
> Adding such text might well be an improvement.  I'm not telling you to
> do so (not having looked at the context myself), just to think about it.
> 

In this case, and I agree with Kevin who suggested it; we ought to be
moving away from the retcode in general and using first-class error
objects for all of our jobs anyway.

In this case, the job has failed with a retcode and we wish to give the
user some hope of understanding why, but at this point in the code all
we know is what the strerror can tell us, so a generic prefix like "The
job failed" is not helpful because it will already be clear by events
and other things that the job failed.

The only text I can think of that would be useful is: "The job failed
and didn't give a more specific error message. Please email
qemu-devel@nongnu.org and harass the authors until they fix it. Anyway,
nice chatting to you, the generic error message is: %s"

--js

>> With that fixed,
>> Reviewed-by: Eric Blake <eblake@redhat.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/9] jobs: canonize Error object
  2018-08-31 15:23       ` John Snow
@ 2018-09-01  7:54         ` Markus Armbruster
  2018-09-03 12:22           ` Kevin Wolf
  0 siblings, 1 reply; 24+ messages in thread
From: Markus Armbruster @ 2018-09-01  7:54 UTC (permalink / raw)
  To: John Snow
  Cc: Eric Blake, kwolf, qemu-block, Jeff Cody, qemu-devel, Max Reitz,
	jtc, Stefan Hajnoczi

John Snow <jsnow@redhat.com> writes:

> On 08/31/2018 02:08 AM, Markus Armbruster wrote:
>> Eric Blake <eblake@redhat.com> writes:
>> 
>>> On 08/29/2018 08:57 PM, John Snow wrote:
>>>> Jobs presently use both an Error object in the case of the create job,
>>>> and char strings in the case of generic errors elsewhere.
>>>>
>>>> Unify the two paths as just j->err, and remove the extra argument from
>>>> job_completed. The integer error code for job_completed is kept for now,
>>>> to be removed shortly in a separate patch.
>>>>
>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>> ---
>>>
>>>> +++ b/job.c
>>>
>>>> @@ -666,8 +666,8 @@ static void job_update_rc(Job *job)
>>>>           job->ret = -ECANCELED;
>>>>       }
>>>>       if (job->ret) {
>>>> -        if (!job->error) {
>>>> -            job->error = g_strdup(strerror(-job->ret));
>>>> +        if (!job->err) {
>>>> +            error_setg(&job->err, "%s", g_strdup(strerror(-job->ret)));
>>>
>>> Memleak. Drop the g_strdup(), and just directly pass strerror()
>>> results to error_setg().  (I guess we can't quite use
>>> error_setg_errno() unless we add additional text beyond the strerror()
>>> results).
>> 
>> Adding such text might well be an improvement.  I'm not telling you to
>> do so (not having looked at the context myself), just to think about it.
>> 
>
> In this case, and I agree with Kevin who suggested it; we ought to be
> moving away from the retcode in general and using first-class error
> objects for all of our jobs anyway.
>
> In this case, the job has failed with a retcode and we wish to give the
> user some hope of understanding why, but at this point in the code all
> we know is what the strerror can tell us, so a generic prefix like "The
> job failed" is not helpful because it will already be clear by events
> and other things that the job failed.
>
> The only text I can think of that would be useful is: "The job failed
> and didn't give a more specific error message. Please email
> qemu-devel@nongnu.org and harass the authors until they fix it. Anyway,
> nice chatting to you, the generic error message is: %s"

That might well be an improvement ;)

Since I don't have a realistic example ready, I'm making up a contrieved
one:

--> {"execute": "job-frobnicate"}
<-- {"error": {"class": "GenericError", "desc": "Device or resource busy"}}

Would a reply

<-- {"error": {"class": "GenericError", "desc": "Job failed: Device or resource busy"}}

be better, worse, or a wash?

If it's a wash, maintainability breaks the tie.  So let's have a look at
the code.  It's either

            error_setg(&job->err, "%s", strerror(-job->ret));

or

            error_setg_errno(&job->err, -job->ret, "Job failed");

I'd prefer the latter, because it's the common way to put an errno code
into an Error object, and it lets me grep for the message more easily.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/9] jobs: canonize Error object
  2018-09-01  7:54         ` Markus Armbruster
@ 2018-09-03 12:22           ` Kevin Wolf
  2018-09-03 14:11             ` Markus Armbruster
  0 siblings, 1 reply; 24+ messages in thread
From: Kevin Wolf @ 2018-09-03 12:22 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: John Snow, Eric Blake, qemu-block, Jeff Cody, qemu-devel,
	Max Reitz, jtc, Stefan Hajnoczi

Am 01.09.2018 um 09:54 hat Markus Armbruster geschrieben:
> John Snow <jsnow@redhat.com> writes:
> 
> > On 08/31/2018 02:08 AM, Markus Armbruster wrote:
> >> Eric Blake <eblake@redhat.com> writes:
> >> 
> >>> On 08/29/2018 08:57 PM, John Snow wrote:
> >>>> Jobs presently use both an Error object in the case of the create job,
> >>>> and char strings in the case of generic errors elsewhere.
> >>>>
> >>>> Unify the two paths as just j->err, and remove the extra argument from
> >>>> job_completed. The integer error code for job_completed is kept for now,
> >>>> to be removed shortly in a separate patch.
> >>>>
> >>>> Signed-off-by: John Snow <jsnow@redhat.com>
> >>>> ---
> >>>
> >>>> +++ b/job.c
> >>>
> >>>> @@ -666,8 +666,8 @@ static void job_update_rc(Job *job)
> >>>>           job->ret = -ECANCELED;
> >>>>       }
> >>>>       if (job->ret) {
> >>>> -        if (!job->error) {
> >>>> -            job->error = g_strdup(strerror(-job->ret));
> >>>> +        if (!job->err) {
> >>>> +            error_setg(&job->err, "%s", g_strdup(strerror(-job->ret)));
> >>>
> >>> Memleak. Drop the g_strdup(), and just directly pass strerror()
> >>> results to error_setg().  (I guess we can't quite use
> >>> error_setg_errno() unless we add additional text beyond the strerror()
> >>> results).
> >> 
> >> Adding such text might well be an improvement.  I'm not telling you to
> >> do so (not having looked at the context myself), just to think about it.
> >> 
> >
> > In this case, and I agree with Kevin who suggested it; we ought to be
> > moving away from the retcode in general and using first-class error
> > objects for all of our jobs anyway.
> >
> > In this case, the job has failed with a retcode and we wish to give the
> > user some hope of understanding why, but at this point in the code all
> > we know is what the strerror can tell us, so a generic prefix like "The
> > job failed" is not helpful because it will already be clear by events
> > and other things that the job failed.
> >
> > The only text I can think of that would be useful is: "The job failed
> > and didn't give a more specific error message. Please email
> > qemu-devel@nongnu.org and harass the authors until they fix it. Anyway,
> > nice chatting to you, the generic error message is: %s"
> 
> That might well be an improvement ;)
> 
> Since I don't have a realistic example ready, I'm making up a contrieved
> one:
> 
> --> {"execute": "job-frobnicate"}
> <-- {"error": {"class": "GenericError", "desc": "Device or resource busy"}}
> 
> Would a reply
> 
> <-- {"error": {"class": "GenericError", "desc": "Job failed: Device or resource busy"}}
> 
> be better, worse, or a wash?
> 
> If it's a wash, maintainability breaks the tie.  So let's have a look at
> the code.  It's either
> 
>             error_setg(&job->err, "%s", strerror(-job->ret));
> 
> or
> 
>             error_setg_errno(&job->err, -job->ret, "Job failed");
> 
> I'd prefer the latter, because it's the common way to put an errno code
> into an Error object, and it lets me grep for the message more easily.

Basically, if I call "job-frobnicate", I don't want an error message to
tell me "job-frobnicate failed: foo". I already know that I called
"job-frobnicate", so the actually useful message is only "foo".

A prefix like "job-frobnicate failed" should be added when it's one of
multiple possible error sources. For example, if a higher level monitor
command internally involves job-frobnicate, but also three other
operations, this is the place where the prefix should be added to any
error message returned by job-frobnicate so that we can distinguish
which part of the operation failed.

Kevin

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/9] jobs: canonize Error object
  2018-09-03 12:22           ` Kevin Wolf
@ 2018-09-03 14:11             ` Markus Armbruster
  2018-09-04 16:09               ` John Snow
  0 siblings, 1 reply; 24+ messages in thread
From: Markus Armbruster @ 2018-09-03 14:11 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-block, John Snow, Jeff Cody, qemu-devel, Max Reitz, jtc,
	Stefan Hajnoczi

Kevin Wolf <kwolf@redhat.com> writes:

> Am 01.09.2018 um 09:54 hat Markus Armbruster geschrieben:
>> John Snow <jsnow@redhat.com> writes:
>> 
>> > On 08/31/2018 02:08 AM, Markus Armbruster wrote:
>> >> Eric Blake <eblake@redhat.com> writes:
>> >> 
>> >>> On 08/29/2018 08:57 PM, John Snow wrote:
>> >>>> Jobs presently use both an Error object in the case of the create job,
>> >>>> and char strings in the case of generic errors elsewhere.
>> >>>>
>> >>>> Unify the two paths as just j->err, and remove the extra argument from
>> >>>> job_completed. The integer error code for job_completed is kept for now,
>> >>>> to be removed shortly in a separate patch.
>> >>>>
>> >>>> Signed-off-by: John Snow <jsnow@redhat.com>
>> >>>> ---
>> >>>
>> >>>> +++ b/job.c
>> >>>
>> >>>> @@ -666,8 +666,8 @@ static void job_update_rc(Job *job)
>> >>>>           job->ret = -ECANCELED;
>> >>>>       }
>> >>>>       if (job->ret) {
>> >>>> -        if (!job->error) {
>> >>>> -            job->error = g_strdup(strerror(-job->ret));
>> >>>> +        if (!job->err) {
>> >>>> +            error_setg(&job->err, "%s", g_strdup(strerror(-job->ret)));
>> >>>
>> >>> Memleak. Drop the g_strdup(), and just directly pass strerror()
>> >>> results to error_setg().  (I guess we can't quite use
>> >>> error_setg_errno() unless we add additional text beyond the strerror()
>> >>> results).
>> >> 
>> >> Adding such text might well be an improvement.  I'm not telling you to
>> >> do so (not having looked at the context myself), just to think about it.
>> >> 
>> >
>> > In this case, and I agree with Kevin who suggested it; we ought to be
>> > moving away from the retcode in general and using first-class error
>> > objects for all of our jobs anyway.
>> >
>> > In this case, the job has failed with a retcode and we wish to give the
>> > user some hope of understanding why, but at this point in the code all
>> > we know is what the strerror can tell us, so a generic prefix like "The
>> > job failed" is not helpful because it will already be clear by events
>> > and other things that the job failed.
>> >
>> > The only text I can think of that would be useful is: "The job failed
>> > and didn't give a more specific error message. Please email
>> > qemu-devel@nongnu.org and harass the authors until they fix it. Anyway,
>> > nice chatting to you, the generic error message is: %s"
>> 
>> That might well be an improvement ;)
>> 
>> Since I don't have a realistic example ready, I'm making up a contrieved
>> one:
>> 
>> --> {"execute": "job-frobnicate"}
>> <-- {"error": {"class": "GenericError", "desc": "Device or resource busy"}}
>> 
>> Would a reply
>> 
>> <-- {"error": {"class": "GenericError", "desc": "Job failed: Device or resource busy"}}
>> 
>> be better, worse, or a wash?
>> 
>> If it's a wash, maintainability breaks the tie.  So let's have a look at
>> the code.  It's either
>> 
>>             error_setg(&job->err, "%s", strerror(-job->ret));
>> 
>> or
>> 
>>             error_setg_errno(&job->err, -job->ret, "Job failed");
>> 
>> I'd prefer the latter, because it's the common way to put an errno code
>> into an Error object, and it lets me grep for the message more easily.
>
> Basically, if I call "job-frobnicate", I don't want an error message to
> tell me "job-frobnicate failed: foo". I already know that I called
> "job-frobnicate", so the actually useful message is only "foo".
>
> A prefix like "job-frobnicate failed" should be added when it's one of
> multiple possible error sources. For example, if a higher level monitor
> command internally involves job-frobnicate, but also three other
> operations, this is the place where the prefix should be added to any
> error message returned by job-frobnicate so that we can distinguish
> which part of the operation failed.

John's point is that the error message is bad no matter what.

My point is that if it's bad no matter what, we can just as well emit it
the usual way, with error_setg_errno(), rather than playing games with
strerror().

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/9] jobs: canonize Error object
  2018-09-03 14:11             ` Markus Armbruster
@ 2018-09-04 16:09               ` John Snow
  0 siblings, 0 replies; 24+ messages in thread
From: John Snow @ 2018-09-04 16:09 UTC (permalink / raw)
  To: Markus Armbruster, Kevin Wolf
  Cc: qemu-block, Jeff Cody, qemu-devel, Max Reitz, jtc, Stefan Hajnoczi



On 09/03/2018 10:11 AM, Markus Armbruster wrote:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
>> Am 01.09.2018 um 09:54 hat Markus Armbruster geschrieben:
>>> John Snow <jsnow@redhat.com> writes:
>>>
>>>> On 08/31/2018 02:08 AM, Markus Armbruster wrote:
>>>>> Eric Blake <eblake@redhat.com> writes:
>>>>>
>>>>>> On 08/29/2018 08:57 PM, John Snow wrote:
>>>>>>> Jobs presently use both an Error object in the case of the create job,
>>>>>>> and char strings in the case of generic errors elsewhere.
>>>>>>>
>>>>>>> Unify the two paths as just j->err, and remove the extra argument from
>>>>>>> job_completed. The integer error code for job_completed is kept for now,
>>>>>>> to be removed shortly in a separate patch.
>>>>>>>
>>>>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>>>>> ---
>>>>>>
>>>>>>> +++ b/job.c
>>>>>>
>>>>>>> @@ -666,8 +666,8 @@ static void job_update_rc(Job *job)
>>>>>>>           job->ret = -ECANCELED;
>>>>>>>       }
>>>>>>>       if (job->ret) {
>>>>>>> -        if (!job->error) {
>>>>>>> -            job->error = g_strdup(strerror(-job->ret));
>>>>>>> +        if (!job->err) {
>>>>>>> +            error_setg(&job->err, "%s", g_strdup(strerror(-job->ret)));
>>>>>>
>>>>>> Memleak. Drop the g_strdup(), and just directly pass strerror()
>>>>>> results to error_setg().  (I guess we can't quite use
>>>>>> error_setg_errno() unless we add additional text beyond the strerror()
>>>>>> results).
>>>>>
>>>>> Adding such text might well be an improvement.  I'm not telling you to
>>>>> do so (not having looked at the context myself), just to think about it.
>>>>>
>>>>
>>>> In this case, and I agree with Kevin who suggested it; we ought to be
>>>> moving away from the retcode in general and using first-class error
>>>> objects for all of our jobs anyway.
>>>>
>>>> In this case, the job has failed with a retcode and we wish to give the
>>>> user some hope of understanding why, but at this point in the code all
>>>> we know is what the strerror can tell us, so a generic prefix like "The
>>>> job failed" is not helpful because it will already be clear by events
>>>> and other things that the job failed.
>>>>
>>>> The only text I can think of that would be useful is: "The job failed
>>>> and didn't give a more specific error message. Please email
>>>> qemu-devel@nongnu.org and harass the authors until they fix it. Anyway,
>>>> nice chatting to you, the generic error message is: %s"
>>>
>>> That might well be an improvement ;)
>>>
>>> Since I don't have a realistic example ready, I'm making up a contrieved
>>> one:
>>>
>>> --> {"execute": "job-frobnicate"}
>>> <-- {"error": {"class": "GenericError", "desc": "Device or resource busy"}}
>>>
>>> Would a reply
>>>
>>> <-- {"error": {"class": "GenericError", "desc": "Job failed: Device or resource busy"}}
>>>
>>> be better, worse, or a wash?
>>>
>>> If it's a wash, maintainability breaks the tie.  So let's have a look at
>>> the code.  It's either
>>>
>>>             error_setg(&job->err, "%s", strerror(-job->ret));
>>>
>>> or
>>>
>>>             error_setg_errno(&job->err, -job->ret, "Job failed");
>>>
>>> I'd prefer the latter, because it's the common way to put an errno code
>>> into an Error object, and it lets me grep for the message more easily.
>>
>> Basically, if I call "job-frobnicate", I don't want an error message to
>> tell me "job-frobnicate failed: foo". I already know that I called
>> "job-frobnicate", so the actually useful message is only "foo".
>>
>> A prefix like "job-frobnicate failed" should be added when it's one of
>> multiple possible error sources. For example, if a higher level monitor
>> command internally involves job-frobnicate, but also three other
>> operations, this is the place where the prefix should be added to any
>> error message returned by job-frobnicate so that we can distinguish
>> which part of the operation failed.
> 
> John's point is that the error message is bad no matter what.
> 
> My point is that if it's bad no matter what, we can just as well emit it
> the usual way, with error_setg_errno(), rather than playing games with
> strerror().
> 

Not worth it, in my opinion. This way keeps visible behavior consistent
until we fully eradicate the retcode.

--js

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-09-04 16:09 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-30  1:57 [Qemu-devel] [PATCH v3 0/9] jobs: Job Exit Refactoring Pt 1 John Snow
2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 1/9] jobs: change start callback to run callback John Snow
2018-08-31 13:27   ` Jeff Cody
2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 2/9] jobs: canonize Error object John Snow
2018-08-30 19:58   ` Eric Blake
2018-08-31  6:08     ` Markus Armbruster
2018-08-31 15:23       ` John Snow
2018-09-01  7:54         ` Markus Armbruster
2018-09-03 12:22           ` Kevin Wolf
2018-09-03 14:11             ` Markus Armbruster
2018-09-04 16:09               ` John Snow
2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 3/9] jobs: add exit shim John Snow
2018-08-31 13:48   ` Jeff Cody
2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 4/9] block/commit: utilize job_exit shim John Snow
2018-08-31 13:58   ` Jeff Cody
2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 5/9] block/mirror: " John Snow
2018-08-31 13:23   ` Max Reitz
2018-08-31 14:09   ` Jeff Cody
2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 6/9] jobs: " John Snow
2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 7/9] block/backup: make function variables consistently named John Snow
2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 8/9] jobs: remove ret argument to job_completed; privatize it John Snow
2018-08-31 13:25   ` Max Reitz
2018-08-30  1:57 ` [Qemu-devel] [PATCH v3 9/9] jobs: remove job_defer_to_main_loop John Snow
2018-08-31 14:12 ` [Qemu-devel] [PATCH v3 0/9] jobs: Job Exit Refactoring Pt 1 Max Reitz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.