* [PATCH v3 00/16] job: replace AioContext lock with job_mutex
@ 2022-01-05 14:01 Emanuele Giuseppe Esposito
  2022-01-05 14:01 ` [PATCH v3 01/16] job.c: make job_mutex and job_lock/unlock() public Emanuele Giuseppe Esposito
                   ` (16 more replies)
  0 siblings, 17 replies; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-05 14:01 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

In this series, we want to remove the AioContext lock and instead
use the already existing job_mutex to protect the job structures
and list. This is part of the effort to get rid of the AioContext
lock in favour of finer-grained locks.

In order to simplify the reviewers' job, the job lock/unlock
functions and macros are added as empty (nop) stubs in patch 1.
They are converted to use the actual job mutex only in patch 15.
In this way we can freely create locking sections without
worrying about deadlocks with the AioContext lock.
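
For example, a section annotated during the conversion might look
like this (a sketch: example_mark_busy is a made-up name, and the
guard expands to nothing until patch 15 turns it into a real
job_mutex critical section):

    #include "qemu/job.h"

    static void example_mark_busy(Job *job)
    {
        /* nop today, a real critical section after patch 15 */
        WITH_JOB_LOCK_GUARD() {
            job->busy = true;
        }
    }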

Patch 2 defines which fields in the job structure need protection,
and patches 3-4 categorize the locked and unlocked functions in
the job API, respectively.

Patches 5-9 prepare for the job locks: they shrink the AioContext
critical sections and apply other minor fixes.

Patches 10-13 introduce the (nop) job lock into the job API and
its users, following the comments and categorization done in
patches 2-4.

Patch 15 makes the stubs from patch 1 use the job_mutex and
removes all AioContext locks at the same time.

I tested this series by running unit tests, qemu-iotests and
qtests (x86_64).

This series is based on my previous series "block layer: split
block APIs in global state and I/O".

Based-on: <20211124064418.3120601-1-eesposit@redhat.com>
---
v3:
* add "_locked" suffix to the functions called with job_mutex held
* rename _job_lock to real_job_lock
* job_mutex is now public, and drivers like monitor use it directly
* introduce and protect job_get_aio_context
* remove mirror-specific APIs and just use WITH_JOB_LOCK_GUARD
* more extensive use of WITH_JOB_LOCK_GUARD and JOB_LOCK_GUARD

RFC v2:
* use JOB_LOCK_GUARD and WITH_JOB_LOCK_GUARD
* fix multiple typos in commit messages
* job API split patches are sent separately in another series
* use of empty job_{lock/unlock} and JOB_LOCK_GUARD/WITH_JOB_LOCK_GUARD
  to avoid deadlocks and simplify the reviewers' job
* move patch 11 (block_job_query: remove atomic read) to the end

Emanuele Giuseppe Esposito (16):
  job.c: make job_mutex and job_lock/unlock() public
  job.h: categorize fields in struct Job
  job.h: define locked functions
  job.h: define unlocked functions
  block/mirror.c: use of job helpers in drivers to avoid TOC/TOU
  job.c: make job_event_* functions static
  job.c: move inner aiocontext lock in callbacks
  aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED
  jobs: remove aiocontext locks since the functions are under BQL
  jobs: protect jobs with job_lock/unlock
  jobs: document all static functions and add _locked() suffix
  jobs: use job locks and helpers also in the unit tests
  jobs: add job lock in find_* functions
  job.c: use job_get_aio_context()
  job.c: enable job lock/unlock and remove Aiocontext locks
  block_job_query: remove atomic read

 block.c                          |  18 +-
 block/commit.c                   |   4 +-
 block/mirror.c                   |  21 +-
 block/replication.c              |  10 +-
 blockdev.c                       | 112 ++----
 blockjob.c                       | 122 +++---
 include/block/aio-wait.h         |  15 +-
 include/qemu/job.h               | 317 +++++++++++----
 job-qmp.c                        |  74 ++--
 job.c                            | 656 +++++++++++++++++++------------
 monitor/qmp-cmds.c               |   6 +-
 qemu-img.c                       |  41 +-
 tests/unit/test-bdrv-drain.c     |  46 +--
 tests/unit/test-block-iothread.c |  14 +-
 tests/unit/test-blockjob-txn.c   |  24 +-
 tests/unit/test-blockjob.c       |  98 ++---
 16 files changed, 947 insertions(+), 631 deletions(-)

-- 
2.31.1




* [PATCH v3 01/16] job.c: make job_mutex and job_lock/unlock() public
  2022-01-05 14:01 [PATCH v3 00/16] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
@ 2022-01-05 14:01 ` Emanuele Giuseppe Esposito
  2022-01-19  9:56   ` Paolo Bonzini
  2022-01-05 14:01 ` [PATCH v3 02/16] job.h: categorize fields in struct Job Emanuele Giuseppe Esposito
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-05 14:01 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

The job mutex will be used to protect the job struct elements and
the job list, replacing the AioContext locks.

For now, use a single shared lock for all jobs, in order to keep
things simple. Once the AioContext lock is gone, we can introduce
per-job locks.

To simplify the switch from the AioContext lock to the job lock,
introduce *nop* lock/unlock functions and macros. Once everything
is protected by the job lock, we can enable the mutex and remove
the AioContext lock.

Since job_mutex is already used internally (to synchronize
job->busy and job->sleep_timer), rename the existing static
job_{lock/unlock} helpers to real_job_{lock/unlock}.
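
As an example, monitor-side code is then expected to take the lock
manually. A sketch (example_job_refcnt is a made-up helper; both
calls are nops until the final patch):

    #include "qemu/job.h"

    static int example_job_refcnt(Job *job)
    {
        int refcnt;

        job_lock();
        refcnt = job->refcnt;
        job_unlock();
        return refcnt;
    }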

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 include/qemu/job.h | 24 ++++++++++++++++++++++++
 job.c              | 35 +++++++++++++++++++++++------------
 2 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 915ceff425..8d0d370dda 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -312,6 +312,30 @@ typedef enum JobCreateFlags {
     JOB_MANUAL_DISMISS = 0x04,
 } JobCreateFlags;
 
+extern QemuMutex job_mutex;
+
+#define JOB_LOCK_GUARD() /* QEMU_LOCK_GUARD(&job_mutex) */
+
+#define WITH_JOB_LOCK_GUARD() /* WITH_QEMU_LOCK_GUARD(&job_mutex) */
+
+/**
+ * job_lock:
+ *
+ * Take the mutex protecting the list of jobs and their status.
+ * Most functions called by the monitor need to call job_lock
+ * and job_unlock manually.  On the other hand, functions called
+ * by the block jobs themselves and by the block layer will take the
+ * lock for you.
+ */
+void job_lock(void);
+
+/**
+ * job_unlock:
+ *
+ * Release the mutex protecting the list of jobs and their status.
+ */
+void job_unlock(void);
+
 /**
  * Allocate and return a new job transaction. Jobs can be added to the
  * transaction using job_txn_add_job().
diff --git a/job.c b/job.c
index e048037099..ccf737a179 100644
--- a/job.c
+++ b/job.c
@@ -32,6 +32,12 @@
 #include "trace/trace-root.h"
 #include "qapi/qapi-events-job.h"
 
+/*
+ * job_mutex protects the jobs list, but also makes the
+ * struct job fields thread-safe.
+ */
+QemuMutex job_mutex;
+
 static QLIST_HEAD(, Job) jobs = QLIST_HEAD_INITIALIZER(jobs);
 
 /* Job State Transition Table */
@@ -74,17 +80,22 @@ struct JobTxn {
     int refcnt;
 };
 
-/* Right now, this mutex is only needed to synchronize accesses to job->busy
- * and job->sleep_timer, such as concurrent calls to job_do_yield and
- * job_enter. */
-static QemuMutex job_mutex;
+void job_lock(void)
+{
+    /* nop */
+}
+
+void job_unlock(void)
+{
+    /* nop */
+}
 
-static void job_lock(void)
+static void real_job_lock(void)
 {
     qemu_mutex_lock(&job_mutex);
 }
 
-static void job_unlock(void)
+static void real_job_unlock(void)
 {
     qemu_mutex_unlock(&job_mutex);
 }
@@ -449,21 +460,21 @@ void job_enter_cond(Job *job, bool(*fn)(Job *job))
         return;
     }
 
-    job_lock();
+    real_job_lock();
     if (job->busy) {
-        job_unlock();
+        real_job_unlock();
         return;
     }
 
     if (fn && !fn(job)) {
-        job_unlock();
+        real_job_unlock();
         return;
     }
 
     assert(!job->deferred_to_main_loop);
     timer_del(&job->sleep_timer);
     job->busy = true;
-    job_unlock();
+    real_job_unlock();
     aio_co_enter(job->aio_context, job->co);
 }
 
@@ -480,13 +491,13 @@ void job_enter(Job *job)
  * called explicitly. */
 static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
 {
-    job_lock();
+    real_job_lock();
     if (ns != -1) {
         timer_mod(&job->sleep_timer, ns);
     }
     job->busy = false;
     job_event_idle(job);
-    job_unlock();
+    real_job_unlock();
     qemu_coroutine_yield();
 
     /* Set by job_enter_cond() before re-entering the coroutine.  */
-- 
2.31.1




* [PATCH v3 02/16] job.h: categorize fields in struct Job
  2022-01-05 14:01 [PATCH v3 00/16] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
  2022-01-05 14:01 ` [PATCH v3 01/16] job.c: make job_mutex and job_lock/unlock() public Emanuele Giuseppe Esposito
@ 2022-01-05 14:01 ` Emanuele Giuseppe Esposito
  2022-01-19  9:57   ` Paolo Bonzini
  2022-01-05 14:01 ` [PATCH v3 03/16] job.h: define locked functions Emanuele Giuseppe Esposito
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-05 14:01 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

Categorize the fields in struct Job to understand which ones
need to be protected by the job mutex and which don't.
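
Condensed, the result reads roughly as follows (a sketch of the
full diff below, with most fields elided):

    typedef struct Job {
        /* Fields set at initialization (job_create), never modified */
        char *id;
        const JobDriver *driver;
        ...

        /* Protected by job_mutex */
        AioContext *aio_context;
        int refcnt;
        JobStatus status;
        ...
    } Job;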

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 include/qemu/job.h | 63 +++++++++++++++++++++++++++-------------------
 1 file changed, 37 insertions(+), 26 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 8d0d370dda..0d348ff186 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -40,27 +40,52 @@ typedef struct JobTxn JobTxn;
  * Long-running operation.
  */
 typedef struct Job {
+
+    /* Fields set at initialization (job_create), and never modified */
+
     /** The ID of the job. May be NULL for internal jobs. */
     char *id;
 
-    /** The type of this job. */
+    /**
+     * The type of this job.
+     * All callbacks are called with job_mutex *not* held.
+     */
     const JobDriver *driver;
 
-    /** Reference count of the block job */
-    int refcnt;
-
-    /** Current state; See @JobStatus for details. */
-    JobStatus status;
-
-    /** AioContext to run the job coroutine in */
-    AioContext *aio_context;
-
     /**
      * The coroutine that executes the job.  If not NULL, it is reentered when
      * busy is false and the job is cancelled.
+     * Initialized in job_start()
      */
     Coroutine *co;
 
+    /** True if this job should automatically finalize itself */
+    bool auto_finalize;
+
+    /** True if this job should automatically dismiss itself */
+    bool auto_dismiss;
+
+    /** The completion function that will be called when the job completes.  */
+    BlockCompletionFunc *cb;
+
+    /** The opaque value that is passed to the completion function.  */
+    void *opaque;
+
+    /* ProgressMeter API is thread-safe */
+    ProgressMeter progress;
+
+
+    /** Protected by job_mutex */
+
+    /** AioContext to run the job coroutine in */
+    AioContext *aio_context;
+
+    /** Reference count of the block job */
+    int refcnt;
+
+    /** Current state; See @JobStatus for details. */
+    JobStatus status;
+
     /**
      * Timer that is used by @job_sleep_ns. Accessed under job_mutex (in
      * job.c).
@@ -76,7 +101,7 @@ typedef struct Job {
     /**
      * Set to false by the job while the coroutine has yielded and may be
      * re-entered by job_enter(). There may still be I/O or event loop activity
-     * pending. Accessed under block_job_mutex (in blockjob.c).
+     * pending. Accessed under job_mutex.
      *
      * When the job is deferred to the main loop, busy is true as long as the
      * bottom half is still pending.
@@ -112,14 +137,6 @@ typedef struct Job {
     /** Set to true when the job has deferred work to the main loop. */
     bool deferred_to_main_loop;
 
-    /** True if this job should automatically finalize itself */
-    bool auto_finalize;
-
-    /** True if this job should automatically dismiss itself */
-    bool auto_dismiss;
-
-    ProgressMeter progress;
-
     /**
      * Return code from @run and/or @prepare callback(s).
      * Not final until the job has reached the CONCLUDED status.
@@ -134,12 +151,6 @@ typedef struct Job {
      */
     Error *err;
 
-    /** The completion function that will be called when the job completes.  */
-    BlockCompletionFunc *cb;
-
-    /** The opaque value that is passed to the completion function.  */
-    void *opaque;
-
     /** Notifiers called when a cancelled job is finalised */
     NotifierList on_finalize_cancelled;
 
@@ -167,6 +178,7 @@ typedef struct Job {
 
 /**
  * Callbacks and other information about a Job driver.
+ * All callbacks are invoked with job_mutex *not* held.
  */
 struct JobDriver {
 
@@ -481,7 +493,6 @@ void job_yield(Job *job);
  */
 void coroutine_fn job_sleep_ns(Job *job, int64_t ns);
 
-
 /** Returns the JobType of a given Job. */
 JobType job_type(const Job *job);
 
-- 
2.31.1




* [PATCH v3 03/16] job.h: define locked functions
  2022-01-05 14:01 [PATCH v3 00/16] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
  2022-01-05 14:01 ` [PATCH v3 01/16] job.c: make job_mutex and job_lock/unlock() public Emanuele Giuseppe Esposito
  2022-01-05 14:01 ` [PATCH v3 02/16] job.h: categorize fields in struct Job Emanuele Giuseppe Esposito
@ 2022-01-05 14:01 ` Emanuele Giuseppe Esposito
  2022-01-19 10:44   ` Paolo Bonzini
  2022-01-05 14:01 ` [PATCH v3 04/16] job.h: define unlocked functions Emanuele Giuseppe Esposito
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-05 14:01 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

These functions assume that the job lock is held by the
caller, to avoid TOC/TOU conditions. Therefore, their
names must end with _locked.

Also introduce additional unlocked wrappers that take the lock
and call the corresponding _locked functions (useful once the
job_mutex is applied globally).

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.
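
The unlocked wrappers follow the pattern visible in the job.c hunk
below, for example:

    bool job_is_ready(Job *job)
    {
        JOB_LOCK_GUARD(); /* still a nop at this stage */
        return job_is_ready_locked(job);
    }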

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block.c                          |   2 +-
 block/replication.c              |   4 +-
 blockdev.c                       |  32 +++----
 blockjob.c                       |  16 ++--
 include/qemu/job.h               | 153 +++++++++++++++++++++---------
 job-qmp.c                        |  26 +++---
 job.c                            | 155 +++++++++++++++++--------------
 qemu-img.c                       |  10 +-
 tests/unit/test-bdrv-drain.c     |   2 +-
 tests/unit/test-block-iothread.c |   4 +-
 tests/unit/test-blockjob-txn.c   |  14 +--
 tests/unit/test-blockjob.c       |  30 +++---
 12 files changed, 263 insertions(+), 185 deletions(-)

diff --git a/block.c b/block.c
index ca70bcc807..8fcd525fa0 100644
--- a/block.c
+++ b/block.c
@@ -4976,7 +4976,7 @@ static void bdrv_close(BlockDriverState *bs)
 
 void bdrv_close_all(void)
 {
-    assert(job_next(NULL) == NULL);
+    assert(job_next_locked(NULL) == NULL);
     assert(qemu_in_main_thread());
 
     /* Drop references from requests still in flight, such as canceled block
diff --git a/block/replication.c b/block/replication.c
index 55c8f894aa..5215c328c1 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -149,7 +149,7 @@ static void replication_close(BlockDriverState *bs)
     if (s->stage == BLOCK_REPLICATION_FAILOVER) {
         commit_job = &s->commit_job->job;
         assert(commit_job->aio_context == qemu_get_current_aio_context());
-        job_cancel_sync(commit_job, false);
+        job_cancel_sync_locked(commit_job, false);
     }
 
     if (s->mode == REPLICATION_MODE_SECONDARY) {
@@ -726,7 +726,7 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
          * disk, secondary disk in backup_job_completed().
          */
         if (s->backup_job) {
-            job_cancel_sync(&s->backup_job->job, true);
+            job_cancel_sync_locked(&s->backup_job->job, true);
         }
 
         if (!failover) {
diff --git a/blockdev.c b/blockdev.c
index a3b9aeb3c2..11fd651bde 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -160,7 +160,7 @@ void blockdev_mark_auto_del(BlockBackend *blk)
             AioContext *aio_context = job->job.aio_context;
             aio_context_acquire(aio_context);
 
-            job_cancel(&job->job, false);
+            job_cancel_locked(&job->job, false);
 
             aio_context_release(aio_context);
         }
@@ -1832,7 +1832,7 @@ static void drive_backup_abort(BlkActionState *common)
         aio_context = bdrv_get_aio_context(state->bs);
         aio_context_acquire(aio_context);
 
-        job_cancel_sync(&state->job->job, true);
+        job_cancel_sync_locked(&state->job->job, true);
 
         aio_context_release(aio_context);
     }
@@ -1933,7 +1933,7 @@ static void blockdev_backup_abort(BlkActionState *common)
         aio_context = bdrv_get_aio_context(state->bs);
         aio_context_acquire(aio_context);
 
-        job_cancel_sync(&state->job->job, true);
+        job_cancel_sync_locked(&state->job->job, true);
 
         aio_context_release(aio_context);
     }
@@ -2382,7 +2382,7 @@ exit:
     if (!has_props) {
         qapi_free_TransactionProperties(props);
     }
-    job_txn_unref(block_job_txn);
+    job_txn_unref_locked(block_job_txn);
 }
 
 BlockDirtyBitmapSha256 *qmp_x_debug_block_dirty_bitmap_sha256(const char *node,
@@ -3347,14 +3347,14 @@ void qmp_block_job_cancel(const char *device,
         force = false;
     }
 
-    if (job_user_paused(&job->job) && !force) {
+    if (job_user_paused_locked(&job->job) && !force) {
         error_setg(errp, "The block job for device '%s' is currently paused",
                    device);
         goto out;
     }
 
     trace_qmp_block_job_cancel(job);
-    job_user_cancel(&job->job, force, errp);
+    job_user_cancel_locked(&job->job, force, errp);
 out:
     aio_context_release(aio_context);
 }
@@ -3369,7 +3369,7 @@ void qmp_block_job_pause(const char *device, Error **errp)
     }
 
     trace_qmp_block_job_pause(job);
-    job_user_pause(&job->job, errp);
+    job_user_pause_locked(&job->job, errp);
     aio_context_release(aio_context);
 }
 
@@ -3383,7 +3383,7 @@ void qmp_block_job_resume(const char *device, Error **errp)
     }
 
     trace_qmp_block_job_resume(job);
-    job_user_resume(&job->job, errp);
+    job_user_resume_locked(&job->job, errp);
     aio_context_release(aio_context);
 }
 
@@ -3397,7 +3397,7 @@ void qmp_block_job_complete(const char *device, Error **errp)
     }
 
     trace_qmp_block_job_complete(job);
-    job_complete(&job->job, errp);
+    job_complete_locked(&job->job, errp);
     aio_context_release(aio_context);
 }
 
@@ -3411,16 +3411,16 @@ void qmp_block_job_finalize(const char *id, Error **errp)
     }
 
     trace_qmp_block_job_finalize(job);
-    job_ref(&job->job);
-    job_finalize(&job->job, errp);
+    job_ref_locked(&job->job);
+    job_finalize_locked(&job->job, errp);
 
     /*
-     * Job's context might have changed via job_finalize (and job_txn_apply
-     * automatically acquires the new one), so make sure we release the correct
-     * one.
+     * Job's context might have changed via job_finalize_locked
+     * (and job_txn_apply automatically acquires the new one),
+     * so make sure we release the correct one.
      */
     aio_context = blk_get_aio_context(job->blk);
-    job_unref(&job->job);
+    job_unref_locked(&job->job);
     aio_context_release(aio_context);
 }
 
@@ -3436,7 +3436,7 @@ void qmp_block_job_dismiss(const char *id, Error **errp)
 
     trace_qmp_block_job_dismiss(bjob);
     job = &bjob->job;
-    job_dismiss(&job, errp);
+    job_dismiss_locked(&job, errp);
     aio_context_release(aio_context);
 }
 
diff --git a/blockjob.c b/blockjob.c
index 74476af473..5b5d7f26b3 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -65,7 +65,7 @@ BlockJob *block_job_next(BlockJob *bjob)
     assert(qemu_in_main_thread());
 
     do {
-        job = job_next(job);
+        job = job_next_locked(job);
     } while (job && !is_block_job(job));
 
     return job ? container_of(job, BlockJob, job) : NULL;
@@ -73,7 +73,7 @@ BlockJob *block_job_next(BlockJob *bjob)
 
 BlockJob *block_job_get(const char *id)
 {
-    Job *job = job_get(id);
+    Job *job = job_get_locked(id);
     assert(qemu_in_main_thread());
 
     if (job && is_block_job(job)) {
@@ -103,7 +103,7 @@ static char *child_job_get_parent_desc(BdrvChild *c)
 static void child_job_drained_begin(BdrvChild *c)
 {
     BlockJob *job = c->opaque;
-    job_pause(&job->job);
+    job_pause_locked(&job->job);
 }
 
 static bool child_job_drained_poll(BdrvChild *c)
@@ -115,7 +115,7 @@ static bool child_job_drained_poll(BdrvChild *c)
     /* An inactive or completed job doesn't have any pending requests. Jobs
      * with !job->busy are either already paused or have a pause point after
      * being reentered, so no job driver code will run before they pause. */
-    if (!job->busy || job_is_completed(job)) {
+    if (!job->busy || job_is_completed_locked(job)) {
         return false;
     }
 
@@ -131,7 +131,7 @@ static bool child_job_drained_poll(BdrvChild *c)
 static void child_job_drained_end(BdrvChild *c, int *drained_end_counter)
 {
     BlockJob *job = c->opaque;
-    job_resume(&job->job);
+    job_resume_locked(&job->job);
 }
 
 static bool child_job_can_set_aio_ctx(BdrvChild *c, AioContext *ctx,
@@ -279,7 +279,7 @@ bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
 
     assert(qemu_in_main_thread());
 
-    if (job_apply_verb(&job->job, JOB_VERB_SET_SPEED, errp) < 0) {
+    if (job_apply_verb_locked(&job->job, JOB_VERB_SET_SPEED, errp) < 0) {
         return false;
     }
     if (speed < 0) {
@@ -301,7 +301,7 @@ bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
     }
 
     /* kick only if a timer is pending */
-    job_enter_cond(&job->job, job_timer_pending);
+    job_enter_cond_locked(&job->job, job_timer_pending);
 
     return true;
 }
@@ -553,7 +553,7 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
     }
     if (action == BLOCK_ERROR_ACTION_STOP) {
         if (!job->job.user_paused) {
-            job_pause(&job->job);
+            job_pause_locked(&job->job);
             /* make the pause user visible, which will be resumed from QMP. */
             job->job.user_paused = true;
         }
diff --git a/include/qemu/job.h b/include/qemu/job.h
index 0d348ff186..0d1c4d1bb1 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -350,7 +350,7 @@ void job_unlock(void);
 
 /**
  * Allocate and return a new job transaction. Jobs can be added to the
- * transaction using job_txn_add_job().
+ * transaction using job_txn_add_job_locked().
  *
  * The transaction is automatically freed when the last job completes or is
  * cancelled.
@@ -362,22 +362,25 @@ void job_unlock(void);
 JobTxn *job_txn_new(void);
 
 /**
- * Release a reference that was previously acquired with job_txn_add_job or
- * job_txn_new. If it's the last reference to the object, it will be freed.
+ * Release a reference that was previously acquired with
+ * job_txn_add_job_locked or job_txn_new.
+ * If it's the last reference to the object, it will be freed.
  */
-void job_txn_unref(JobTxn *txn);
+void job_txn_unref_locked(JobTxn *txn);
 
 /**
  * @txn: The transaction (may be NULL)
  * @job: Job to add to the transaction
  *
  * Add @job to the transaction.  The @job must not already be in a transaction.
- * The caller must call either job_txn_unref() or job_completed() to release
- * the reference that is automatically grabbed here.
+ * The caller must call either job_txn_unref_locked() or job_completed()
+ * to release the reference that is automatically grabbed here.
  *
  * If @txn is NULL, the function does nothing.
+ *
+ * Called between job_lock and job_unlock.
  */
-void job_txn_add_job(JobTxn *txn, Job *job);
+void job_txn_add_job_locked(JobTxn *txn, Job *job);
 
 /**
  * Create a new long-running job and return it.
@@ -396,16 +399,20 @@ void *job_create(const char *job_id, const JobDriver *driver, JobTxn *txn,
                  void *opaque, Error **errp);
 
 /**
- * Add a reference to Job refcnt, it will be decreased with job_unref, and then
- * be freed if it comes to be the last reference.
+ * Add a reference to Job refcnt, it will be decreased with job_unref_locked,
+ * and then be freed if it comes to be the last reference.
+ *
+ * Called between job_lock and job_unlock.
  */
-void job_ref(Job *job);
+void job_ref_locked(Job *job);
 
 /**
- * Release a reference that was previously acquired with job_ref() or
+ * Release a reference that was previously acquired with job_ref_locked() or
  * job_create(). If it's the last reference to the object, it will be freed.
+ *
+ * Called between job_lock and job_unlock, but might release it temporarily.
  */
-void job_unref(Job *job);
+void job_unref_locked(Job *job);
 
 /**
  * @job: The job that has made progress
@@ -450,8 +457,10 @@ void job_event_completed(Job *job);
  * Conditionally enter the job coroutine if the job is ready to run, not
  * already busy and fn() returns true. fn() is called while under the job_lock
  * critical section.
+ *
+ * Called between job_lock and job_unlock, but it releases the lock temporarily.
  */
-void job_enter_cond(Job *job, bool(*fn)(Job *job));
+void job_enter_cond_locked(Job *job, bool(*fn)(Job *job));
 
 /**
  * @job: A job that has not yet been started.
@@ -471,8 +480,9 @@ void job_enter(Job *job);
 /**
  * @job: The job that is ready to pause.
  *
- * Pause now if job_pause() has been called. Jobs that perform lots of I/O
- * must call this between requests so that the job can be paused.
+ * Pause now if job_pause_locked() has been called.
+ * Jobs that perform lots of I/O must call this between
+ * requests so that the job can be paused.
  */
 void coroutine_fn job_pause_point(Job *job);
 
@@ -511,79 +521,117 @@ bool job_is_cancelled(Job *job);
  */
 bool job_cancel_requested(Job *job);
 
-/** Returns whether the job is in a completed state. */
-bool job_is_completed(Job *job);
+/**
+ * Returns whether the job is in a completed state.
+ * Called between job_lock and job_unlock.
+ */
+bool job_is_completed_locked(Job *job);
 
-/** Returns whether the job is ready to be completed. */
+/**
+ * Returns whether the job is ready to be completed.
+ * Called with job_mutex *not* held.
+ */
 bool job_is_ready(Job *job);
 
+/** Same as job_is_ready(), but assumes job_lock is held. */
+bool job_is_ready_locked(Job *job);
+
 /**
  * Request @job to pause at the next pause point. Must be paired with
- * job_resume(). If the job is supposed to be resumed by user action, call
- * job_user_pause() instead.
+ * job_resume_locked(). If the job is supposed to be resumed by user action,
+ * call job_user_pause_locked() instead.
+ *
+ * Called between job_lock and job_unlock.
  */
-void job_pause(Job *job);
+void job_pause_locked(Job *job);
 
-/** Resumes a @job paused with job_pause. */
-void job_resume(Job *job);
+/**
+ * Resumes a @job paused with job_pause_locked.
+ * Called between job_lock and job_unlock.
+ */
+void job_resume_locked(Job *job);
 
 /**
  * Asynchronously pause the specified @job.
- * Do not allow a resume until a matching call to job_user_resume.
+ * Do not allow a resume until a matching call to job_user_resume_locked.
+ *
+ * Called between job_lock and job_unlock.
  */
-void job_user_pause(Job *job, Error **errp);
+void job_user_pause_locked(Job *job, Error **errp);
 
-/** Returns true if the job is user-paused. */
-bool job_user_paused(Job *job);
+/**
+ * Returns true if the job is user-paused.
+ * Called between job_lock and job_unlock.
+ */
+bool job_user_paused_locked(Job *job);
 
 /**
  * Resume the specified @job.
- * Must be paired with a preceding job_user_pause.
+ * Must be paired with a preceding job_user_pause_locked.
+ *
+ * Called between job_lock and job_unlock, but might release it temporarily.
  */
-void job_user_resume(Job *job, Error **errp);
+void job_user_resume_locked(Job *job, Error **errp);
 
 /**
  * Get the next element from the list of block jobs after @job, or the
  * first one if @job is %NULL.
  *
  * Returns the requested job, or %NULL if there are no more jobs left.
+ *
+ * Called between job_lock and job_unlock.
  */
-Job *job_next(Job *job);
+Job *job_next_locked(Job *job);
 
 /**
  * Get the job identified by @id (which must not be %NULL).
  *
  * Returns the requested job, or %NULL if it doesn't exist.
+ *
+ * Called between job_lock and job_unlock.
  */
-Job *job_get(const char *id);
+Job *job_get_locked(const char *id);
 
 /**
  * Check whether the verb @verb can be applied to @job in its current state.
  * Returns 0 if the verb can be applied; otherwise errp is set and -EPERM
  * returned.
+ *
+ * Called between job_lock and job_unlock.
  */
-int job_apply_verb(Job *job, JobVerb verb, Error **errp);
+int job_apply_verb_locked(Job *job, JobVerb verb, Error **errp);
 
 /** The @job could not be started, free it. */
 void job_early_fail(Job *job);
 
+/** Same as job_early_fail(), but assumes job_lock is held. */
+void job_early_fail_locked(Job *job);
+
 /** Moves the @job from RUNNING to READY */
 void job_transition_to_ready(Job *job);
 
-/** Asynchronously complete the specified @job. */
-void job_complete(Job *job, Error **errp);
+/**
+ * Asynchronously complete the specified @job.
+ * Called between job_lock and job_unlock, but it releases the lock temporarily.
+ */
+void job_complete_locked(Job *job, Error **errp);
 
 /**
  * Asynchronously cancel the specified @job. If @force is true, the job should
  * be cancelled immediately without waiting for a consistent state.
+ *
+ * Called between job_lock and job_unlock.
  */
-void job_cancel(Job *job, bool force);
+void job_cancel_locked(Job *job, bool force);
 
 /**
- * Cancels the specified job like job_cancel(), but may refuse to do so if the
- * operation isn't meaningful in the current state of the job.
+ * Cancels the specified job like job_cancel_locked(),
+ * but may refuse to do so if the operation isn't meaningful
+ * in the current state of the job.
+ *
+ * Called between job_lock and job_unlock.
  */
-void job_user_cancel(Job *job, bool force, Error **errp);
+void job_user_cancel_locked(Job *job, bool force, Error **errp);
 
 /**
  * Synchronously cancel the @job.  The completion callback is called
@@ -596,14 +644,20 @@ void job_user_cancel(Job *job, bool force, Error **errp);
  *
  * Callers must hold the AioContext lock of job->aio_context.
  */
-int job_cancel_sync(Job *job, bool force);
+int job_cancel_sync_locked(Job *job, bool force);
 
-/** Synchronously force-cancels all jobs using job_cancel_sync(). */
+/**
+ * Synchronously force-cancels all jobs using job_cancel_sync_locked().
+ *
+ * Called with job_lock *not* held, unlike most other APIs consumed
+ * by the monitor! This is primarily to avoid adding unnecessary lock-unlock
+ * patterns in the caller.
+ */
 void job_cancel_sync_all(void);
 
 /**
  * @job: The job to be completed.
- * @errp: Error object which may be set by job_complete(); this is not
+ * @errp: Error object which may be set by job_complete_locked(); this is not
  *        necessarily set on every error, the job return value has to be
  *        checked as well.
  *
@@ -614,8 +668,10 @@ void job_cancel_sync_all(void);
  * Returns the return value from the job.
  *
  * Callers must hold the AioContext lock of job->aio_context.
+ *
+ * Called between job_lock and job_unlock.
  */
-int job_complete_sync(Job *job, Error **errp);
+int job_complete_sync_locked(Job *job, Error **errp);
 
 /**
  * For a @job that has finished its work and is pending awaiting explicit
@@ -624,14 +680,18 @@ int job_complete_sync(Job *job, Error **errp);
  * FIXME: Make the below statement universally true:
  * For jobs that support the manual workflow mode, all graph changes that occur
  * as a result will occur after this command and before a successful reply.
+ *
+ * Called between job_lock and job_unlock.
  */
-void job_finalize(Job *job, Error **errp);
+void job_finalize_locked(Job *job, Error **errp);
 
 /**
  * Remove the concluded @job from the query list and resets the passed pointer
  * to %NULL. Returns an error if the job is not actually concluded.
+ *
+ * Called between job_lock and job_unlock.
  */
-void job_dismiss(Job **job, Error **errp);
+void job_dismiss_locked(Job **job, Error **errp);
 
 /**
  * Synchronously finishes the given @job. If @finish is given, it is called to
@@ -641,7 +701,10 @@ void job_dismiss(Job **job, Error **errp);
  * cancelled before completing, and -errno in other error cases.
  *
  * Callers must hold the AioContext lock of job->aio_context.
+ *
+ * Called between job_lock and job_unlock.
  */
-int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp);
+int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),
+                           Error **errp);
 
 #endif
diff --git a/job-qmp.c b/job-qmp.c
index 829a28aa70..de4120a1d4 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -36,7 +36,7 @@ static Job *find_job(const char *id, AioContext **aio_context, Error **errp)
 
     *aio_context = NULL;
 
-    job = job_get(id);
+    job = job_get_locked(id);
     if (!job) {
         error_setg(errp, "Job not found");
         return NULL;
@@ -58,7 +58,7 @@ void qmp_job_cancel(const char *id, Error **errp)
     }
 
     trace_qmp_job_cancel(job);
-    job_user_cancel(job, true, errp);
+    job_user_cancel_locked(job, true, errp);
     aio_context_release(aio_context);
 }
 
@@ -72,7 +72,7 @@ void qmp_job_pause(const char *id, Error **errp)
     }
 
     trace_qmp_job_pause(job);
-    job_user_pause(job, errp);
+    job_user_pause_locked(job, errp);
     aio_context_release(aio_context);
 }
 
@@ -86,7 +86,7 @@ void qmp_job_resume(const char *id, Error **errp)
     }
 
     trace_qmp_job_resume(job);
-    job_user_resume(job, errp);
+    job_user_resume_locked(job, errp);
     aio_context_release(aio_context);
 }
 
@@ -100,7 +100,7 @@ void qmp_job_complete(const char *id, Error **errp)
     }
 
     trace_qmp_job_complete(job);
-    job_complete(job, errp);
+    job_complete_locked(job, errp);
     aio_context_release(aio_context);
 }
 
@@ -114,16 +114,16 @@ void qmp_job_finalize(const char *id, Error **errp)
     }
 
     trace_qmp_job_finalize(job);
-    job_ref(job);
-    job_finalize(job, errp);
+    job_ref_locked(job);
+    job_finalize_locked(job, errp);
 
     /*
-     * Job's context might have changed via job_finalize (and job_txn_apply
-     * automatically acquires the new one), so make sure we release the correct
-     * one.
+     * Job's context might have changed via job_finalize_locked
+     * (and job_txn_apply automatically acquires the new one),
+     * so make sure we release the correct one.
      */
     aio_context = job->aio_context;
-    job_unref(job);
+    job_unref_locked(job);
     aio_context_release(aio_context);
 }
 
@@ -137,7 +137,7 @@ void qmp_job_dismiss(const char *id, Error **errp)
     }
 
     trace_qmp_job_dismiss(job);
-    job_dismiss(&job, errp);
+    job_dismiss_locked(&job, errp);
     aio_context_release(aio_context);
 }
 
@@ -171,7 +171,7 @@ JobInfoList *qmp_query_jobs(Error **errp)
     JobInfoList *head = NULL, **tail = &head;
     Job *job;
 
-    for (job = job_next(NULL); job; job = job_next(job)) {
+    for (job = job_next_locked(NULL); job; job = job_next_locked(job)) {
         JobInfo *value;
         AioContext *aio_context;
 
diff --git a/job.c b/job.c
index ccf737a179..bb6ca2940c 100644
--- a/job.c
+++ b/job.c
@@ -118,14 +118,14 @@ static void job_txn_ref(JobTxn *txn)
     txn->refcnt++;
 }
 
-void job_txn_unref(JobTxn *txn)
+void job_txn_unref_locked(JobTxn *txn)
 {
     if (txn && --txn->refcnt == 0) {
         g_free(txn);
     }
 }
 
-void job_txn_add_job(JobTxn *txn, Job *job)
+void job_txn_add_job_locked(JobTxn *txn, Job *job)
 {
     if (!txn) {
         return;
@@ -142,7 +142,7 @@ static void job_txn_del_job(Job *job)
 {
     if (job->txn) {
         QLIST_REMOVE(job, txn_list);
-        job_txn_unref(job->txn);
+        job_txn_unref_locked(job->txn);
         job->txn = NULL;
     }
 }
@@ -160,7 +160,7 @@ static int job_txn_apply(Job *job, int fn(Job *))
      * we need to release it here to avoid holding the lock twice - which would
      * break AIO_WAIT_WHILE from within fn.
      */
-    job_ref(job);
+    job_ref_locked(job);
     aio_context_release(job->aio_context);
 
     QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
@@ -178,7 +178,7 @@ static int job_txn_apply(Job *job, int fn(Job *))
      * can't use a local variable to cache it.
      */
     aio_context_acquire(job->aio_context);
-    job_unref(job);
+    job_unref_locked(job);
     return rc;
 }
 
@@ -202,7 +202,7 @@ static void job_state_transition(Job *job, JobStatus s1)
     }
 }
 
-int job_apply_verb(Job *job, JobVerb verb, Error **errp)
+int job_apply_verb_locked(Job *job, JobVerb verb, Error **errp)
 {
     JobStatus s0 = job->status;
     assert(verb >= 0 && verb < JOB_VERB__MAX);
@@ -238,7 +238,7 @@ bool job_cancel_requested(Job *job)
     return job->cancelled;
 }
 
-bool job_is_ready(Job *job)
+bool job_is_ready_locked(Job *job)
 {
     switch (job->status) {
     case JOB_STATUS_UNDEFINED:
@@ -260,7 +260,13 @@ bool job_is_ready(Job *job)
     return false;
 }
 
-bool job_is_completed(Job *job)
+bool job_is_ready(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job_is_ready_locked(job);
+}
+
+bool job_is_completed_locked(Job *job)
 {
     switch (job->status) {
     case JOB_STATUS_UNDEFINED:
@@ -292,7 +298,7 @@ static bool job_should_pause(Job *job)
     return job->pause_count > 0;
 }
 
-Job *job_next(Job *job)
+Job *job_next_locked(Job *job)
 {
     if (!job) {
         return QLIST_FIRST(&jobs);
@@ -300,7 +306,7 @@ Job *job_next(Job *job)
     return QLIST_NEXT(job, job_list);
 }
 
-Job *job_get(const char *id)
+Job *job_get_locked(const char *id)
 {
     Job *job;
 
@@ -335,7 +341,7 @@ void *job_create(const char *job_id, const JobDriver *driver, JobTxn *txn,
             error_setg(errp, "Invalid job ID '%s'", job_id);
             return NULL;
         }
-        if (job_get(job_id)) {
+        if (job_get_locked(job_id)) {
             error_setg(errp, "Job ID '%s' already in use", job_id);
             return NULL;
         }
@@ -375,21 +381,21 @@ void *job_create(const char *job_id, const JobDriver *driver, JobTxn *txn,
      * consolidating the job management logic */
     if (!txn) {
         txn = job_txn_new();
-        job_txn_add_job(txn, job);
-        job_txn_unref(txn);
+        job_txn_add_job_locked(txn, job);
+        job_txn_unref_locked(txn);
     } else {
-        job_txn_add_job(txn, job);
+        job_txn_add_job_locked(txn, job);
     }
 
     return job;
 }
 
-void job_ref(Job *job)
+void job_ref_locked(Job *job)
 {
     ++job->refcnt;
 }
 
-void job_unref(Job *job)
+void job_unref_locked(Job *job)
 {
     assert(qemu_in_main_thread());
 
@@ -451,7 +457,7 @@ static void job_event_idle(Job *job)
     notifier_list_notify(&job->on_idle, job);
 }
 
-void job_enter_cond(Job *job, bool(*fn)(Job *job))
+void job_enter_cond_locked(Job *job, bool(*fn)(Job *job))
 {
     if (!job_started(job)) {
         return;
@@ -480,7 +486,7 @@ void job_enter_cond(Job *job, bool(*fn)(Job *job))
 
 void job_enter(Job *job)
 {
-    job_enter_cond(job, NULL);
+    job_enter_cond_locked(job, NULL);
 }
 
 /* Yield, and schedule a timer to reenter the coroutine after @ns nanoseconds.
@@ -500,7 +506,7 @@ static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
     real_job_unlock();
     qemu_coroutine_yield();
 
-    /* Set by job_enter_cond() before re-entering the coroutine.  */
+    /* Set by job_enter_cond_locked() before re-entering the coroutine.  */
     assert(job->busy);
 }
 
@@ -573,7 +579,7 @@ static bool job_timer_not_pending(Job *job)
     return !timer_pending(&job->sleep_timer);
 }
 
-void job_pause(Job *job)
+void job_pause_locked(Job *job)
 {
     job->pause_count++;
     if (!job->paused) {
@@ -581,7 +587,7 @@ void job_pause(Job *job)
     }
 }
 
-void job_resume(Job *job)
+void job_resume_locked(Job *job)
 {
     assert(job->pause_count > 0);
     job->pause_count--;
@@ -590,12 +596,12 @@ void job_resume(Job *job)
     }
 
     /* kick only if no timer is pending */
-    job_enter_cond(job, job_timer_not_pending);
+    job_enter_cond_locked(job, job_timer_not_pending);
 }
 
-void job_user_pause(Job *job, Error **errp)
+void job_user_pause_locked(Job *job, Error **errp)
 {
-    if (job_apply_verb(job, JOB_VERB_PAUSE, errp)) {
+    if (job_apply_verb_locked(job, JOB_VERB_PAUSE, errp)) {
         return;
     }
     if (job->user_paused) {
@@ -603,15 +609,15 @@ void job_user_pause(Job *job, Error **errp)
         return;
     }
     job->user_paused = true;
-    job_pause(job);
+    job_pause_locked(job);
 }
 
-bool job_user_paused(Job *job)
+bool job_user_paused_locked(Job *job)
 {
     return job->user_paused;
 }
 
-void job_user_resume(Job *job, Error **errp)
+void job_user_resume_locked(Job *job, Error **errp)
 {
     assert(job);
     assert(qemu_in_main_thread());
@@ -619,14 +625,14 @@ void job_user_resume(Job *job, Error **errp)
         error_setg(errp, "Can't resume a job that was not paused");
         return;
     }
-    if (job_apply_verb(job, JOB_VERB_RESUME, errp)) {
+    if (job_apply_verb_locked(job, JOB_VERB_RESUME, errp)) {
         return;
     }
     if (job->driver->user_resume) {
         job->driver->user_resume(job);
     }
     job->user_paused = false;
-    job_resume(job);
+    job_resume_locked(job);
 }
 
 static void job_do_dismiss(Job *job)
@@ -639,15 +645,15 @@ static void job_do_dismiss(Job *job)
     job_txn_del_job(job);
 
     job_state_transition(job, JOB_STATUS_NULL);
-    job_unref(job);
+    job_unref_locked(job);
 }
 
-void job_dismiss(Job **jobptr, Error **errp)
+void job_dismiss_locked(Job **jobptr, Error **errp)
 {
     Job *job = *jobptr;
     /* similarly to _complete, this is QMP-interface only. */
     assert(job->id);
-    if (job_apply_verb(job, JOB_VERB_DISMISS, errp)) {
+    if (job_apply_verb_locked(job, JOB_VERB_DISMISS, errp)) {
         return;
     }
 
@@ -655,12 +661,18 @@ void job_dismiss(Job **jobptr, Error **errp)
     *jobptr = NULL;
 }
 
-void job_early_fail(Job *job)
+void job_early_fail_locked(Job *job)
 {
     assert(job->status == JOB_STATUS_CREATED);
     job_do_dismiss(job);
 }
 
+void job_early_fail(Job *job)
+{
+    JOB_LOCK_GUARD();
+    job_early_fail_locked(job);
+}
+
 static void job_conclude(Job *job)
 {
     job_state_transition(job, JOB_STATUS_CONCLUDED);
@@ -710,7 +722,7 @@ static void job_clean(Job *job)
 
 static int job_finalize_single(Job *job)
 {
-    assert(job_is_completed(job));
+    assert(job_is_completed_locked(job));
 
     /* Ensure abort is called for late-transactional failures */
     job_update_rc(job);
@@ -795,7 +807,7 @@ static void job_completed_txn_abort(Job *job)
      * calls of AIO_WAIT_WHILE(), which could deadlock otherwise.
      * Note that the job's AioContext may change when it is finalized.
      */
-    job_ref(job);
+    job_ref_locked(job);
     aio_context_release(job->aio_context);
 
     /* Other jobs are effectively cancelled by us, set the status for
@@ -822,22 +834,22 @@ static void job_completed_txn_abort(Job *job)
          */
         ctx = other_job->aio_context;
         aio_context_acquire(ctx);
-        if (!job_is_completed(other_job)) {
+        if (!job_is_completed_locked(other_job)) {
             assert(job_cancel_requested(other_job));
-            job_finish_sync(other_job, NULL, NULL);
+            job_finish_sync_locked(other_job, NULL, NULL);
         }
         job_finalize_single(other_job);
         aio_context_release(ctx);
     }
 
     /*
-     * Use job_ref()/job_unref() so we can read the AioContext here
-     * even if the job went away during job_finalize_single().
+     * Use job_ref_locked()/job_unref_locked() so we can read the AioContext
+     * here even if the job went away during job_finalize_single().
      */
     aio_context_acquire(job->aio_context);
-    job_unref(job);
+    job_unref_locked(job);
 
-    job_txn_unref(txn);
+    job_txn_unref_locked(txn);
 }
 
 static int job_prepare(Job *job)
@@ -869,10 +881,10 @@ static void job_do_finalize(Job *job)
     }
 }
 
-void job_finalize(Job *job, Error **errp)
+void job_finalize_locked(Job *job, Error **errp)
 {
     assert(job && job->id);
-    if (job_apply_verb(job, JOB_VERB_FINALIZE, errp)) {
+    if (job_apply_verb_locked(job, JOB_VERB_FINALIZE, errp)) {
         return;
     }
     job_do_finalize(job);
@@ -905,7 +917,7 @@ static void job_completed_txn_success(Job *job)
      * txn.
      */
     QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
-        if (!job_is_completed(other_job)) {
+        if (!job_is_completed_locked(other_job)) {
             return;
         }
         assert(other_job->ret == 0);
@@ -921,7 +933,7 @@ static void job_completed_txn_success(Job *job)
 
 static void job_completed(Job *job)
 {
-    assert(job && job->txn && !job_is_completed(job));
+    assert(job && job->txn && !job_is_completed_locked(job));
 
     job_update_rc(job);
     trace_job_completed(job, job->ret);
@@ -938,7 +950,7 @@ static void job_exit(void *opaque)
     Job *job = (Job *)opaque;
     AioContext *ctx;
 
-    job_ref(job);
+    job_ref_locked(job);
     aio_context_acquire(job->aio_context);
 
     /* This is a lie, we're not quiescent, but still doing the completion
@@ -957,7 +969,7 @@ static void job_exit(void *opaque)
      * the job underneath us.
      */
     ctx = job->aio_context;
-    job_unref(job);
+    job_unref_locked(job);
     aio_context_release(ctx);
 }
 
@@ -1003,7 +1015,7 @@ void job_start(Job *job)
     aio_co_enter(job->aio_context, job->co);
 }
 
-void job_cancel(Job *job, bool force)
+void job_cancel_locked(Job *job, bool force)
 {
     if (job->status == JOB_STATUS_CONCLUDED) {
         job_do_dismiss(job);
@@ -1031,20 +1043,22 @@ void job_cancel(Job *job, bool force)
     }
 }
 
-void job_user_cancel(Job *job, bool force, Error **errp)
+void job_user_cancel_locked(Job *job, bool force, Error **errp)
 {
-    if (job_apply_verb(job, JOB_VERB_CANCEL, errp)) {
+    if (job_apply_verb_locked(job, JOB_VERB_CANCEL, errp)) {
         return;
     }
-    job_cancel(job, force);
+    job_cancel_locked(job, force);
 }
 
-/* A wrapper around job_cancel() taking an Error ** parameter so it may be
- * used with job_finish_sync() without the need for (rather nasty) function
- * pointer casts there. */
+/*
+ * A wrapper around job_cancel_locked() taking an Error ** parameter so
+ * it may be used with job_finish_sync_locked() without the
+ * need for (rather nasty) function pointer casts there.
+ */
 static void job_cancel_err(Job *job, Error **errp)
 {
-    job_cancel(job, false);
+    job_cancel_locked(job, false);
 }
 
 /**
@@ -1052,15 +1066,15 @@ static void job_cancel_err(Job *job, Error **errp)
  */
 static void job_force_cancel_err(Job *job, Error **errp)
 {
-    job_cancel(job, true);
+    job_cancel_locked(job, true);
 }
 
-int job_cancel_sync(Job *job, bool force)
+int job_cancel_sync_locked(Job *job, bool force)
 {
     if (force) {
-        return job_finish_sync(job, &job_force_cancel_err, NULL);
+        return job_finish_sync_locked(job, &job_force_cancel_err, NULL);
     } else {
-        return job_finish_sync(job, &job_cancel_err, NULL);
+        return job_finish_sync_locked(job, &job_cancel_err, NULL);
     }
 }
 
@@ -1069,25 +1083,25 @@ void job_cancel_sync_all(void)
     Job *job;
     AioContext *aio_context;
 
-    while ((job = job_next(NULL))) {
+    while ((job = job_next_locked(NULL))) {
         aio_context = job->aio_context;
         aio_context_acquire(aio_context);
-        job_cancel_sync(job, true);
+        job_cancel_sync_locked(job, true);
         aio_context_release(aio_context);
     }
 }
 
-int job_complete_sync(Job *job, Error **errp)
+int job_complete_sync_locked(Job *job, Error **errp)
 {
-    return job_finish_sync(job, job_complete, errp);
+    return job_finish_sync_locked(job, job_complete_locked, errp);
 }
 
-void job_complete(Job *job, Error **errp)
+void job_complete_locked(Job *job, Error **errp)
 {
     /* Should not be reachable via external interface for internal jobs */
     assert(job->id);
     assert(qemu_in_main_thread());
-    if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) {
+    if (job_apply_verb_locked(job, JOB_VERB_COMPLETE, errp)) {
         return;
     }
     if (job_cancel_requested(job) || !job->driver->complete) {
@@ -1099,26 +1113,27 @@ void job_complete(Job *job, Error **errp)
     job->driver->complete(job, errp);
 }
 
-int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
+int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),
+                           Error **errp)
 {
     Error *local_err = NULL;
     int ret;
 
-    job_ref(job);
+    job_ref_locked(job);
 
     if (finish) {
         finish(job, &local_err);
     }
     if (local_err) {
         error_propagate(errp, local_err);
-        job_unref(job);
+        job_unref_locked(job);
         return -EBUSY;
     }
 
     AIO_WAIT_WHILE(job->aio_context,
-                   (job_enter(job), !job_is_completed(job)));
+                   (job_enter(job), !job_is_completed_locked(job)));
 
     ret = (job_is_cancelled(job) && job->ret == 0) ? -ECANCELED : job->ret;
-    job_unref(job);
+    job_unref_locked(job);
     return ret;
 }
diff --git a/qemu-img.c b/qemu-img.c
index f036a1d428..09f3b11eab 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -906,7 +906,7 @@ static void run_block_job(BlockJob *job, Error **errp)
     int ret = 0;
 
     aio_context_acquire(aio_context);
-    job_ref(&job->job);
+    job_ref_locked(&job->job);
     do {
         float progress = 0.0f;
         aio_poll(aio_context, true);
@@ -917,14 +917,14 @@ static void run_block_job(BlockJob *job, Error **errp)
             progress = (float)progress_current / progress_total * 100.f;
         }
         qemu_progress_print(progress, 0);
-    } while (!job_is_ready(&job->job) && !job_is_completed(&job->job));
+    } while (!job_is_ready(&job->job) && !job_is_completed_locked(&job->job));
 
-    if (!job_is_completed(&job->job)) {
-        ret = job_complete_sync(&job->job, errp);
+    if (!job_is_completed_locked(&job->job)) {
+        ret = job_complete_sync_locked(&job->job, errp);
     } else {
         ret = job->job.ret;
     }
-    job_unref(&job->job);
+    job_unref_locked(&job->job);
     aio_context_release(aio_context);
 
     /* publish completion progress only when success */
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 2d3c17e566..3f344a0d0d 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -995,7 +995,7 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
     g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
 
     aio_context_acquire(ctx);
-    ret = job_complete_sync(&job->job, &error_abort);
+    ret = job_complete_sync_locked(&job->job, &error_abort);
     g_assert_cmpint(ret, ==, (result == TEST_JOB_SUCCESS ? 0 : -EIO));
 
     if (use_iothread) {
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index aea660aeed..7e1b521d61 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -456,7 +456,7 @@ static void test_attach_blockjob(void)
     }
 
     aio_context_acquire(ctx);
-    job_complete_sync(&tjob->common.job, &error_abort);
+    job_complete_sync_locked(&tjob->common.job, &error_abort);
     blk_set_aio_context(blk, qemu_get_aio_context(), &error_abort);
     aio_context_release(ctx);
 
@@ -630,7 +630,7 @@ static void test_propagate_mirror(void)
                  BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
                  false, "filter_node", MIRROR_COPY_MODE_BACKGROUND,
                  &error_abort);
-    job = job_get("job0");
+    job = job_get_locked("job0");
     filter = bdrv_find_node("filter_node");
 
     /* Change the AioContext of src */
diff --git a/tests/unit/test-blockjob-txn.c b/tests/unit/test-blockjob-txn.c
index 8bd13b9949..5396fcef10 100644
--- a/tests/unit/test-blockjob-txn.c
+++ b/tests/unit/test-blockjob-txn.c
@@ -125,7 +125,7 @@ static void test_single_job(int expected)
     job_start(&job->job);
 
     if (expected == -ECANCELED) {
-        job_cancel(&job->job, false);
+        job_cancel_locked(&job->job, false);
     }
 
     while (result == -EINPROGRESS) {
@@ -133,7 +133,7 @@ static void test_single_job(int expected)
     }
     g_assert_cmpint(result, ==, expected);
 
-    job_txn_unref(txn);
+    job_txn_unref_locked(txn);
 }
 
 static void test_single_job_success(void)
@@ -168,13 +168,13 @@ static void test_pair_jobs(int expected1, int expected2)
     /* Release our reference now to trigger as many nice
      * use-after-free bugs as possible.
      */
-    job_txn_unref(txn);
+    job_txn_unref_locked(txn);
 
     if (expected1 == -ECANCELED) {
-        job_cancel(&job1->job, false);
+        job_cancel_locked(&job1->job, false);
     }
     if (expected2 == -ECANCELED) {
-        job_cancel(&job2->job, false);
+        job_cancel_locked(&job2->job, false);
     }
 
     while (result1 == -EINPROGRESS || result2 == -EINPROGRESS) {
@@ -227,7 +227,7 @@ static void test_pair_jobs_fail_cancel_race(void)
     job_start(&job1->job);
     job_start(&job2->job);
 
-    job_cancel(&job1->job, false);
+    job_cancel_locked(&job1->job, false);
 
     /* Now make job2 finish before the main loop kicks jobs.  This simulates
      * the race between a pending kick and another job completing.
@@ -242,7 +242,7 @@ static void test_pair_jobs_fail_cancel_race(void)
     g_assert_cmpint(result1, ==, -ECANCELED);
     g_assert_cmpint(result2, ==, -ECANCELED);
 
-    job_txn_unref(txn);
+    job_txn_unref_locked(txn);
 }
 
 int main(int argc, char **argv)
diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c
index 4c9e1bf1e5..2beed3623e 100644
--- a/tests/unit/test-blockjob.c
+++ b/tests/unit/test-blockjob.c
@@ -211,7 +211,7 @@ static CancelJob *create_common(Job **pjob)
     bjob = mk_job(blk, "Steve", &test_cancel_driver, true,
                   JOB_MANUAL_FINALIZE | JOB_MANUAL_DISMISS);
     job = &bjob->job;
-    job_ref(job);
+    job_ref_locked(job);
     assert(job->status == JOB_STATUS_CREATED);
     s = container_of(bjob, CancelJob, common);
     s->blk = blk;
@@ -230,13 +230,13 @@ static void cancel_common(CancelJob *s)
     ctx = job->job.aio_context;
     aio_context_acquire(ctx);
 
-    job_cancel_sync(&job->job, true);
+    job_cancel_sync_locked(&job->job, true);
     if (sts != JOB_STATUS_CREATED && sts != JOB_STATUS_CONCLUDED) {
         Job *dummy = &job->job;
-        job_dismiss(&dummy, &error_abort);
+        job_dismiss_locked(&dummy, &error_abort);
     }
     assert(job->job.status == JOB_STATUS_NULL);
-    job_unref(&job->job);
+    job_unref_locked(&job->job);
     destroy_blk(blk);
 
     aio_context_release(ctx);
@@ -274,7 +274,7 @@ static void test_cancel_paused(void)
     job_start(job);
     assert(job->status == JOB_STATUS_RUNNING);
 
-    job_user_pause(job, &error_abort);
+    job_user_pause_locked(job, &error_abort);
     job_enter(job);
     assert(job->status == JOB_STATUS_PAUSED);
 
@@ -312,7 +312,7 @@ static void test_cancel_standby(void)
     job_enter(job);
     assert(job->status == JOB_STATUS_READY);
 
-    job_user_pause(job, &error_abort);
+    job_user_pause_locked(job, &error_abort);
     job_enter(job);
     assert(job->status == JOB_STATUS_STANDBY);
 
@@ -333,7 +333,7 @@ static void test_cancel_pending(void)
     job_enter(job);
     assert(job->status == JOB_STATUS_READY);
 
-    job_complete(job, &error_abort);
+    job_complete_locked(job, &error_abort);
     job_enter(job);
     while (!job->deferred_to_main_loop) {
         aio_poll(qemu_get_aio_context(), true);
@@ -359,7 +359,7 @@ static void test_cancel_concluded(void)
     job_enter(job);
     assert(job->status == JOB_STATUS_READY);
 
-    job_complete(job, &error_abort);
+    job_complete_locked(job, &error_abort);
     job_enter(job);
     while (!job->deferred_to_main_loop) {
         aio_poll(qemu_get_aio_context(), true);
@@ -369,7 +369,7 @@ static void test_cancel_concluded(void)
     assert(job->status == JOB_STATUS_PENDING);
 
     aio_context_acquire(job->aio_context);
-    job_finalize(job, &error_abort);
+    job_finalize_locked(job, &error_abort);
     aio_context_release(job->aio_context);
     assert(job->status == JOB_STATUS_CONCLUDED);
 
@@ -417,7 +417,7 @@ static const BlockJobDriver test_yielding_driver = {
 };
 
 /*
- * Test that job_complete() works even on jobs that are in a paused
+ * Test that job_complete_locked() works even on jobs that are in a paused
  * state (i.e., STANDBY).
  *
  * To do this, run YieldingJob in an IO thread, get it into the READY
@@ -425,7 +425,7 @@ static const BlockJobDriver test_yielding_driver = {
  * acquire the context so the job will not be entered and will thus
  * remain on STANDBY.
  *
- * job_complete() should still work without error.
+ * job_complete_locked() should still work without error.
  *
  * Note that on the QMP interface, it is impossible to lock an IO
  * thread before a drained section ends.  In practice, the
@@ -479,16 +479,16 @@ static void test_complete_in_standby(void)
     assert(job->status == JOB_STATUS_STANDBY);
 
     /* Even though the job is on standby, this should work */
-    job_complete(job, &error_abort);
+    job_complete_locked(job, &error_abort);
 
     /* The test is done now, clean up. */
-    job_finish_sync(job, NULL, &error_abort);
+    job_finish_sync_locked(job, NULL, &error_abort);
     assert(job->status == JOB_STATUS_PENDING);
 
-    job_finalize(job, &error_abort);
+    job_finalize_locked(job, &error_abort);
     assert(job->status == JOB_STATUS_CONCLUDED);
 
-    job_dismiss(&job, &error_abort);
+    job_dismiss_locked(&job, &error_abort);
 
     destroy_blk(blk);
     aio_context_release(ctx);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 04/16] job.h: define unlocked functions
  2022-01-05 14:01 [PATCH v3 00/16] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (2 preceding siblings ...)
  2022-01-05 14:01 ` [PATCH v3 03/16] job.h: define locked functions Emanuele Giuseppe Esposito
@ 2022-01-05 14:01 ` Emanuele Giuseppe Esposito
  2022-01-05 14:01 ` [PATCH v3 05/16] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU Emanuele Giuseppe Esposito
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-05 14:01 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

All these functions assume that the lock is not held, and acquire
it internally.

These functions will be useful when job_lock is globally applied,
as they will allow callers to access the job struct fields
without worrying about the job lock.
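
For instance, a caller that does not hold job_mutex uses the
wrapper, while code already inside a locked section must use the
_locked() variant. A minimal sketch (illustrative only, not part
of the diff below; example_caller is a made-up name):

    static void example_caller(Job *job)
    {
        /* Lock not held: the wrapper takes job_mutex internally. */
        if (job_is_cancelled(job)) {
            return;
        }

        WITH_JOB_LOCK_GUARD() {
            /* Lock held: only _locked() variants may be used. */
            if (job_is_cancelled_locked(job)) {
                return;
            }
        }
    }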

Also update the comments in blockjob.c (and move them to job.c).

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 blockjob.c         | 20 -------------
 include/qemu/job.h | 68 ++++++++++++++++++++++++++++++++++++++++++--
 job.c              | 70 ++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 133 insertions(+), 25 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 5b5d7f26b3..ce356be51e 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -36,21 +36,6 @@
 #include "qemu/main-loop.h"
 #include "qemu/timer.h"
 
-/*
- * The block job API is composed of two categories of functions.
- *
- * The first includes functions used by the monitor.  The monitor is
- * peculiar in that it accesses the block job list with block_job_get, and
- * therefore needs consistency across block_job_get and the actual operation
- * (e.g. block_job_set_speed).  The consistency is achieved with
- * aio_context_acquire/release.  These functions are declared in blockjob.h.
- *
- * The second includes functions used by the block job drivers and sometimes
- * by the core block layer.  These do not care about locking, because the
- * whole coroutine runs under the AioContext lock, and are declared in
- * blockjob_int.h.
- */
-
 static bool is_block_job(Job *job)
 {
     return job_type(job) == JOB_TYPE_BACKUP ||
@@ -433,11 +418,6 @@ static void block_job_event_ready(Notifier *n, void *opaque)
 }
 
 
-/*
- * API for block job drivers and the block layer.  These functions are
- * declared in blockjob_int.h.
- */
-
 void *block_job_create(const char *job_id, const BlockJobDriver *driver,
                        JobTxn *txn, BlockDriverState *bs, uint64_t perm,
                        uint64_t shared_perm, int64_t speed, int flags,
diff --git a/include/qemu/job.h b/include/qemu/job.h
index 0d1c4d1bb1..f800b0b881 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -384,6 +384,7 @@ void job_txn_add_job_locked(JobTxn *txn, Job *job);
 
 /**
  * Create a new long-running job and return it.
+ * Called with job_mutex *not* held.
  *
  * @job_id: The id of the newly-created job, or %NULL for internal jobs
  * @driver: The class object for the newly-created job.
@@ -419,6 +420,8 @@ void job_unref_locked(Job *job);
  * @done: How much progress the job made since the last call
  *
  * Updates the progress counter of the job.
+ *
+ * Progress API is thread safe.
  */
 void job_progress_update(Job *job, uint64_t done);
 
@@ -429,6 +432,8 @@ void job_progress_update(Job *job, uint64_t done);
  *
  * Sets the expected end value of the progress counter of a job so that a
  * completion percentage can be calculated when the progress is updated.
+ *
+ * Progress API is thread safe.
  */
 void job_progress_set_remaining(Job *job, uint64_t remaining);
 
@@ -444,6 +449,8 @@ void job_progress_set_remaining(Job *job, uint64_t remaining);
  * length before, and job_progress_update() afterwards.
  * (So the operation acts as a parenthesis in regards to the main job
  * operation running in background.)
+ *
+ * Progress API is thread safe.
  */
 void job_progress_increase_remaining(Job *job, uint64_t delta);
 
@@ -467,13 +474,17 @@ void job_enter_cond_locked(Job *job, bool(*fn)(Job *job));
  *
  * Begins execution of a job.
  * Takes ownership of one reference to the job object.
+ *
+ * Called with job_mutex *not* held.
  */
 void job_start(Job *job);
 
 /**
  * @job: The job to enter.
  *
  * Continue the specified job by entering the coroutine.
+ * Called with job_mutex *not* held.
  */
 void job_enter(Job *job);
 
@@ -483,6 +494,9 @@ void job_enter(Job *job);
  * Pause now if job_pause_locked() has been called.
  * Jobs that perform lots of I/O must call this between
  * requests so that the job can be paused.
+ *
+ * Called with job_mutex *not* held (we don't want the coroutine
+ * to yield with the lock held!).
  */
 void coroutine_fn job_pause_point(Job *job);
 
@@ -490,6 +504,8 @@ void coroutine_fn job_pause_point(Job *job);
  * @job: The job that calls the function.
  *
  * Yield the job coroutine.
+ * Called with job_mutex *not* held (we don't want the coroutine
+ * to yield with the lock held!).
  */
 void job_yield(Job *job);
 
@@ -500,6 +516,9 @@ void job_yield(Job *job);
  * Put the job to sleep (assuming that it wasn't canceled) for @ns
  * %QEMU_CLOCK_REALTIME nanoseconds.  Canceling the job will immediately
  * interrupt the wait.
+ *
+ * Called with job_mutex *not* held (we don't want the coroutine
+ * to yield with the lock held!).
  */
 void coroutine_fn job_sleep_ns(Job *job, int64_t ns);
 
@@ -512,12 +531,19 @@ const char *job_type_str(const Job *job);
 /** Returns true if the job should not be visible to the management layer. */
 bool job_is_internal(Job *job);
 
-/** Returns whether the job is being cancelled. */
+/**
+ * Returns whether the job is being cancelled.
+ * Called with job_mutex *not* held.
+ */
 bool job_is_cancelled(Job *job);
 
+/** Just like job_is_cancelled, but called between job_lock and job_unlock */
+bool job_is_cancelled_locked(Job *job);
+
 /**
  * Returns whether the job is scheduled for cancellation (at an
  * indefinite point).
+ * Called with job_mutex *not* held.
  */
 bool job_cancel_requested(Job *job);
 
@@ -601,13 +627,19 @@ Job *job_get_locked(const char *id);
  */
 int job_apply_verb_locked(Job *job, JobVerb verb, Error **errp);
 
-/** The @job could not be started, free it. */
+/**
+ * The @job could not be started, free it.
+ * Called with job_mutex *not* held.
+ */
 void job_early_fail(Job *job);
 
 /** Same as job_early_fail(), but assumes job_lock is held. */
 void job_early_fail_locked(Job *job);
 
-/** Moves the @job from RUNNING to READY */
+/**
+ * Moves the @job from RUNNING to READY.
+ * Called with job_mutex *not* held.
+ */
 void job_transition_to_ready(Job *job);
 
 /**
@@ -707,4 +739,34 @@ void job_dismiss_locked(Job **job, Error **errp);
 int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),
                            Error **errp);
 
+/**
+ * Returns the @job->status.
+ * Called with job_mutex *not* held.
+ */
+JobStatus job_get_status(Job *job);
+
+/**
+ * Returns the @job->pause_count.
+ * Called with job_mutex *not* held.
+ */
+int job_get_pause_count(Job *job);
+
+/**
+ * Returns @job->paused.
+ * Called with job_mutex *not* held.
+ */
+bool job_get_paused(Job *job);
+
+/**
+ * Returns @job->busy.
+ * Called with job_mutex *not* held.
+ */
+bool job_get_busy(Job *job);
+
+/**
+ * Returns @job->aio_context.
+ * Called with job_mutex *not* held.
+ */
+AioContext *job_get_aio_context(Job *job);
+
 #endif
diff --git a/job.c b/job.c
index bb6ca2940c..f4e1a56705 100644
--- a/job.c
+++ b/job.c
@@ -32,6 +32,22 @@
 #include "trace/trace-root.h"
 #include "qapi/qapi-events-job.h"
 
+/*
+ * The job API is composed of two categories of functions.
+ *
+ * The first includes functions used by the monitor.  The monitor is
+ * peculiar in that it accesses the block job list with job_get, and
+ * therefore needs consistency across job_get and the actual operation
+ * (e.g. job_user_cancel). To achieve this consistency, the caller
+ * calls job_lock/job_unlock itself around the whole operation.
+ * These functions are declared in job-monitor.h.
+ *
+ * The second includes functions used by the block job drivers and sometimes
+ * by the core block layer. These delegate the locking to the callee instead,
+ * and are declared in job-driver.h.
+ */
+
 /*
  * job_mutex protects the jobs list, but also makes the
  * struct job fields thread-safe.
@@ -226,18 +242,61 @@ const char *job_type_str(const Job *job)
     return JobType_str(job_type(job));
 }
 
-bool job_is_cancelled(Job *job)
+JobStatus job_get_status(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job->status;
+}
+
+int job_get_pause_count(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job->pause_count;
+}
+
+bool job_get_paused(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job->paused;
+}
+
+bool job_get_busy(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job->busy;
+}
+
+AioContext *job_get_aio_context(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job->aio_context;
+}
+
+bool job_is_cancelled_locked(Job *job)
 {
     /* force_cancel may be true only if cancelled is true, too */
     assert(job->cancelled || !job->force_cancel);
     return job->force_cancel;
 }
 
-bool job_cancel_requested(Job *job)
+bool job_is_cancelled(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job_is_cancelled_locked(job);
+}
+
+/* Called with job_mutex held. */
+static bool job_cancel_requested_locked(Job *job)
 {
     return job->cancelled;
 }
 
+bool job_cancel_requested(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job_cancel_requested_locked(job);
+}
+
 bool job_is_ready_locked(Job *job)
 {
     switch (job->status) {
@@ -288,6 +347,13 @@ bool job_is_completed_locked(Job *job)
     return false;
 }
 
+/* Called with job_mutex *not* held. */
+static bool job_is_completed(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job_is_completed_locked(job);
+}
+
 static bool job_started(Job *job)
 {
     return job->co;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 05/16] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU
  2022-01-05 14:01 [PATCH v3 00/16] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (3 preceding siblings ...)
  2022-01-05 14:01 ` [PATCH v3 04/16] job.h: define unlocked functions Emanuele Giuseppe Esposito
@ 2022-01-05 14:01 ` Emanuele Giuseppe Esposito
  2022-01-19 11:06   ` Paolo Bonzini
  2022-01-05 14:01 ` [PATCH v3 06/16] job.c: make job_event_* functions static Emanuele Giuseppe Esposito
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-05 14:01 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

Once the job lock is used and the AioContext lock is removed,
mirror has to perform job operations inside the same critical
section, using the helpers prepared in the previous commit.
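
The time-of-check/time-of-use pattern being avoided looks like
this (hypothetical sketch; handle_error() is a made-up
placeholder):

    int ret;

    /* Racy: job->ret may change between check and use. */
    if (job->ret < 0) {              /* time of check */
        handle_error(job->ret);      /* time of use: may differ */
    }

    /* Safe: take a single snapshot while holding the lock. */
    WITH_JOB_LOCK_GUARD() {
        ret = job->ret;
    }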

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block/mirror.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 00089e519b..41450df55c 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -653,9 +653,13 @@ static int mirror_exit_common(Job *job)
     BlockDriverState *target_bs;
     BlockDriverState *mirror_top_bs;
     Error *local_err = NULL;
-    bool abort = job->ret < 0;
+    bool abort;
     int ret = 0;
 
+    WITH_JOB_LOCK_GUARD() {
+        abort = job->ret < 0;
+    }
+
     if (s->prepared) {
         return 0;
     }
@@ -1161,8 +1165,10 @@ static void mirror_complete(Job *job, Error **errp)
     s->should_complete = true;
 
     /* If the job is paused, it will be re-entered when it is resumed */
-    if (!job->paused) {
-        job_enter(job);
+    WITH_JOB_LOCK_GUARD() {
+        if (!job->paused) {
+            job_enter_cond_locked(job, NULL);
+        }
     }
 }
 
@@ -1182,8 +1188,11 @@ static bool mirror_drained_poll(BlockJob *job)
      * from one of our own drain sections, to avoid a deadlock waiting for
      * ourselves.
      */
-    if (!s->common.job.paused && !job_is_cancelled(&job->job) && !s->in_drain) {
-        return true;
+    WITH_JOB_LOCK_GUARD() {
+        if (!s->common.job.paused && !job_is_cancelled_locked(&job->job)
+            && !s->in_drain) {
+            return true;
+        }
     }
 
     return !!s->in_flight;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 06/16] job.c: make job_event_* functions static
  2022-01-05 14:01 [PATCH v3 00/16] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (4 preceding siblings ...)
  2022-01-05 14:01 ` [PATCH v3 05/16] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU Emanuele Giuseppe Esposito
@ 2022-01-05 14:01 ` Emanuele Giuseppe Esposito
  2022-01-05 14:01 ` [PATCH v3 07/16] job.c: move inner aiocontext lock in callbacks Emanuele Giuseppe Esposito
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-05 14:01 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

job_event_* functions can all be static, as they are not used
outside job.c.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/qemu/job.h |  6 ------
 job.c              | 12 ++++++++++--
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index f800b0b881..c95f9fa8d1 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -454,12 +454,6 @@ void job_progress_set_remaining(Job *job, uint64_t remaining);
  */
 void job_progress_increase_remaining(Job *job, uint64_t delta);
 
-/** To be called when a cancelled job is finalised. */
-void job_event_cancelled(Job *job);
-
-/** To be called when a successfully completed job is finalised. */
-void job_event_completed(Job *job);
-
 /**
  * Conditionally enter the job coroutine if the job is ready to run, not
  * already busy and fn() returns true. fn() is called while under the job_lock
diff --git a/job.c b/job.c
index f4e1a56705..b0dba40728 100644
--- a/job.c
+++ b/job.c
@@ -498,12 +498,20 @@ void job_progress_increase_remaining(Job *job, uint64_t delta)
     progress_increase_remaining(&job->progress, delta);
 }
 
-void job_event_cancelled(Job *job)
+/**
+ * To be called when a cancelled job is finalised.
+ * Called with job_mutex held.
+ */
+static void job_event_cancelled(Job *job)
 {
     notifier_list_notify(&job->on_finalize_cancelled, job);
 }
 
-void job_event_completed(Job *job)
+/**
+ * To be called when a successfully completed job is finalised.
+ * Called with job_mutex held.
+ */
+static void job_event_completed(Job *job)
 {
     notifier_list_notify(&job->on_finalize_completed, job);
 }
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 07/16] job.c: move inner aiocontext lock in callbacks
  2022-01-05 14:01 [PATCH v3 00/16] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (5 preceding siblings ...)
  2022-01-05 14:01 ` [PATCH v3 06/16] job.c: make job_event_* functions static Emanuele Giuseppe Esposito
@ 2022-01-05 14:01 ` Emanuele Giuseppe Esposito
  2022-01-05 14:02 ` [PATCH v3 08/16] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED Emanuele Giuseppe Esposito
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-05 14:01 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

Instead of having the lock in job_txn_apply, move it inside
the callbacks. This will be helpful for the next commits, when
we introduce job_lock/unlock pairs.
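
The resulting shape is sketched below (illustrative only;
example_prepare() is a made-up callback name):

    static int example_prepare(Job *job)
    {
        AioContext *ctx = job->aio_context;
        int ret = 0;

        /* The callback takes the AioContext lock only around the
         * code that still needs it. */
        aio_context_acquire(ctx);
        /* ... work requiring the AioContext lock ... */
        aio_context_release(ctx);
        return ret;
    }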

job_transition_to_pending() and job_needs_finalize() do not
need to be protected by the AioContext lock.

No functional change intended.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 job.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/job.c b/job.c
index b0dba40728..2ee7233763 100644
--- a/job.c
+++ b/job.c
@@ -165,7 +165,6 @@ static void job_txn_del_job(Job *job)
 
 static int job_txn_apply(Job *job, int fn(Job *))
 {
-    AioContext *inner_ctx;
     Job *other_job, *next;
     JobTxn *txn = job->txn;
     int rc = 0;
@@ -180,10 +179,7 @@ static int job_txn_apply(Job *job, int fn(Job *))
     aio_context_release(job->aio_context);
 
     QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
-        inner_ctx = other_job->aio_context;
-        aio_context_acquire(inner_ctx);
         rc = fn(other_job);
-        aio_context_release(inner_ctx);
         if (rc) {
             break;
         }
@@ -796,11 +792,15 @@ static void job_clean(Job *job)
 
 static int job_finalize_single(Job *job)
 {
+    AioContext *ctx = job->aio_context;
+
     assert(job_is_completed_locked(job));
 
     /* Ensure abort is called for late-transactional failures */
     job_update_rc(job);
 
+    aio_context_acquire(ctx);
+
     if (!job->ret) {
         job_commit(job);
     } else {
@@ -808,6 +808,8 @@ static int job_finalize_single(Job *job)
     }
     job_clean(job);
 
+    aio_context_release(ctx);
+
     if (job->cb) {
         job->cb(job->opaque, job->ret);
     }
@@ -928,11 +930,16 @@ static void job_completed_txn_abort(Job *job)
 
 static int job_prepare(Job *job)
 {
+    AioContext *ctx = job->aio_context;
     assert(qemu_in_main_thread());
+
     if (job->ret == 0 && job->driver->prepare) {
+        aio_context_acquire(ctx);
         job->ret = job->driver->prepare(job);
+        aio_context_release(ctx);
         job_update_rc(job);
     }
+
     return job->ret;
 }
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 08/16] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED
  2022-01-05 14:01 [PATCH v3 00/16] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (6 preceding siblings ...)
  2022-01-05 14:01 ` [PATCH v3 07/16] job.c: move inner aiocontext lock in callbacks Emanuele Giuseppe Esposito
@ 2022-01-05 14:02 ` Emanuele Giuseppe Esposito
  2022-01-05 14:02 ` [PATCH v3 09/16] jobs: remove aiocontext locks since the functions are under BQL Emanuele Giuseppe Esposito
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-05 14:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

Same as the AIO_WAIT_WHILE macro, but if we are in the main
loop do not release and then re-acquire ctx_'s AioContext lock.

Once all AioContext locks go away, this macro will replace
AIO_WAIT_WHILE.
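
Usage is the same as AIO_WAIT_WHILE; only the caller's locking
expectation differs. A sketch (some_condition() is a
placeholder):

    /* The caller does NOT hold ctx's AioContext lock, so nothing
     * is released/re-acquired around aio_poll() in the main loop. */
    AIO_WAIT_WHILE_UNLOCKED(ctx, some_condition());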

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 include/block/aio-wait.h | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index b39eefb38d..ff27fe4eab 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -59,10 +59,11 @@ typedef struct {
 extern AioWait global_aio_wait;
 
 /**
- * AIO_WAIT_WHILE:
+ * _AIO_WAIT_WHILE:
  * @ctx: the aio context, or NULL if multiple aio contexts (for which the
  *       caller does not hold a lock) are involved in the polling condition.
  * @cond: wait while this conditional expression is true
+ * @unlock: whether to release @ctx and then re-acquire it
  *
  * Wait while a condition is true.  Use this to implement synchronous
  * operations that require event loop activity.
@@ -75,7 +76,7 @@ extern AioWait global_aio_wait;
  * wait on conditions between two IOThreads since that could lead to deadlock,
  * go via the main loop instead.
  */
-#define AIO_WAIT_WHILE(ctx, cond) ({                               \
+#define _AIO_WAIT_WHILE(ctx, cond, unlock) ({                      \
     bool waited_ = false;                                          \
     AioWait *wait_ = &global_aio_wait;                             \
     AioContext *ctx_ = (ctx);                                      \
@@ -90,11 +91,11 @@ extern AioWait global_aio_wait;
         assert(qemu_get_current_aio_context() ==                   \
                qemu_get_aio_context());                            \
         while ((cond)) {                                           \
-            if (ctx_) {                                            \
+            if (unlock && ctx_) {                                  \
                 aio_context_release(ctx_);                         \
             }                                                      \
             aio_poll(qemu_get_aio_context(), true);                \
-            if (ctx_) {                                            \
+            if (unlock && ctx_) {                                  \
                 aio_context_acquire(ctx_);                         \
             }                                                      \
             waited_ = true;                                        \
@@ -103,6 +104,12 @@ extern AioWait global_aio_wait;
     qatomic_dec(&wait_->num_waiters);                              \
     waited_; })
 
+#define AIO_WAIT_WHILE(ctx, cond)                                  \
+    _AIO_WAIT_WHILE(ctx, cond, true)
+
+#define AIO_WAIT_WHILE_UNLOCKED(ctx, cond)                         \
+    _AIO_WAIT_WHILE(ctx, cond, false)
+
 /**
  * aio_wait_kick:
  * Wake up the main thread if it is waiting on AIO_WAIT_WHILE().  During
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 09/16] jobs: remove aiocontext locks since the functions are under BQL
  2022-01-05 14:01 [PATCH v3 00/16] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (7 preceding siblings ...)
  2022-01-05 14:02 ` [PATCH v3 08/16] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED Emanuele Giuseppe Esposito
@ 2022-01-05 14:02 ` Emanuele Giuseppe Esposito
  2022-01-19 11:09   ` Paolo Bonzini
  2022-01-05 14:02 ` [PATCH v3 10/16] jobs: protect jobs with job_lock/unlock Emanuele Giuseppe Esposito
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-05 14:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

In preparation for the job_lock/unlock patch, remove these
AioContext locks.
The main reason these two locks are removed here is that they
are inside loops iterating over the jobs list. Once the
job_lock is added, it will have to protect the whole loop,
also wrapping the AioContext acquire/release.

We don't want this, as job_lock can only be *wrapped by* the
AioContext lock, and not vice versa, to avoid deadlocks.
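
In other words, the only safe nesting is the following (sketch):

    /* OK: job_mutex nests inside the AioContext lock. */
    aio_context_acquire(ctx);
    WITH_JOB_LOCK_GUARD() {
        /* ... */
    }
    aio_context_release(ctx);

    /* NOT OK: acquiring the AioContext lock while job_mutex is
     * held inverts the ordering above and can deadlock. */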

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 blockdev.c | 4 ----
 job-qmp.c  | 4 ----
 2 files changed, 8 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 11fd651bde..ee35aff13a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3707,15 +3707,11 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 
     for (job = block_job_next(NULL); job; job = block_job_next(job)) {
         BlockJobInfo *value;
-        AioContext *aio_context;
 
         if (block_job_is_internal(job)) {
             continue;
         }
-        aio_context = blk_get_aio_context(job->blk);
-        aio_context_acquire(aio_context);
         value = block_job_query(job, errp);
-        aio_context_release(aio_context);
         if (!value) {
             qapi_free_BlockJobInfoList(head);
             return NULL;
diff --git a/job-qmp.c b/job-qmp.c
index de4120a1d4..f6f9840436 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -173,15 +173,11 @@ JobInfoList *qmp_query_jobs(Error **errp)
 
     for (job = job_next_locked(NULL); job; job = job_next_locked(job)) {
         JobInfo *value;
-        AioContext *aio_context;
 
         if (job_is_internal(job)) {
             continue;
         }
-        aio_context = job->aio_context;
-        aio_context_acquire(aio_context);
         value = job_query_single(job, errp);
-        aio_context_release(aio_context);
         if (!value) {
             qapi_free_JobInfoList(head);
             return NULL;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 10/16] jobs: protect jobs with job_lock/unlock
  2022-01-05 14:01 [PATCH v3 00/16] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (8 preceding siblings ...)
  2022-01-05 14:02 ` [PATCH v3 09/16] jobs: remove aiocontext locks since the functions are under BQL Emanuele Giuseppe Esposito
@ 2022-01-05 14:02 ` Emanuele Giuseppe Esposito
  2022-01-19 10:50   ` Paolo Bonzini
  2022-01-05 14:02 ` [PATCH v3 11/16] jobs: document all static functions and add _locked() suffix Emanuele Giuseppe Esposito
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-05 14:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

Introduce the job locking mechanism across the whole job API,
following the comments and requirements of job-monitor (the
lock is assumed to be held) and job-driver (the lock is not
held).

job_{lock/unlock} is independent of real_job_{lock/unlock}.
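
The recurring pattern in this patch is to drop job_mutex around
driver callbacks, as in this excerpt from the changes below:

    if (job->driver->user_resume) {
        job_unlock();
        job->driver->user_resume(job);
        job_lock();
    }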

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block.c             |  18 +++---
 block/replication.c |   8 ++-
 blockdev.c          |  17 +++++-
 blockjob.c          |  64 ++++++++++++++-------
 job-qmp.c           |   2 +
 job.c               | 132 +++++++++++++++++++++++++++++++-------------
 monitor/qmp-cmds.c  |   6 +-
 qemu-img.c          |  41 ++++++++------
 8 files changed, 199 insertions(+), 89 deletions(-)

diff --git a/block.c b/block.c
index 8fcd525fa0..fac0759422 100644
--- a/block.c
+++ b/block.c
@@ -4976,7 +4976,9 @@ static void bdrv_close(BlockDriverState *bs)
 
 void bdrv_close_all(void)
 {
-    assert(job_next_locked(NULL) == NULL);
+    WITH_JOB_LOCK_GUARD() {
+        assert(job_next_locked(NULL) == NULL);
+    }
     assert(qemu_in_main_thread());
 
     /* Drop references from requests still in flight, such as canceled block
@@ -6154,13 +6156,15 @@ XDbgBlockGraph *bdrv_get_xdbg_block_graph(Error **errp)
         }
     }
 
-    for (job = block_job_next(NULL); job; job = block_job_next(job)) {
-        GSList *el;
+    WITH_JOB_LOCK_GUARD() {
+        for (job = block_job_next(NULL); job; job = block_job_next(job)) {
+            GSList *el;
 
-        xdbg_graph_add_node(gr, job, X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_JOB,
-                           job->job.id);
-        for (el = job->nodes; el; el = el->next) {
-            xdbg_graph_add_edge(gr, job, (BdrvChild *)el->data);
+            xdbg_graph_add_node(gr, job, X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_JOB,
+                                job->job.id);
+            for (el = job->nodes; el; el = el->next) {
+                xdbg_graph_add_edge(gr, job, (BdrvChild *)el->data);
+            }
         }
     }
 
diff --git a/block/replication.c b/block/replication.c
index 5215c328c1..50ea778937 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -149,7 +149,9 @@ static void replication_close(BlockDriverState *bs)
     if (s->stage == BLOCK_REPLICATION_FAILOVER) {
         commit_job = &s->commit_job->job;
         assert(commit_job->aio_context == qemu_get_current_aio_context());
-        job_cancel_sync_locked(commit_job, false);
+        WITH_JOB_LOCK_GUARD() {
+            job_cancel_sync_locked(commit_job, false);
+        }
     }
 
     if (s->mode == REPLICATION_MODE_SECONDARY) {
@@ -726,7 +728,9 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
          * disk, secondary disk in backup_job_completed().
          */
         if (s->backup_job) {
-            job_cancel_sync_locked(&s->backup_job->job, true);
+            WITH_JOB_LOCK_GUARD() {
+                job_cancel_sync_locked(&s->backup_job->job, true);
+            }
         }
 
         if (!failover) {
diff --git a/blockdev.c b/blockdev.c
index ee35aff13a..099d57e0d2 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -155,6 +155,8 @@ void blockdev_mark_auto_del(BlockBackend *blk)
         return;
     }
 
+    JOB_LOCK_GUARD();
+
     for (job = block_job_next(NULL); job; job = block_job_next(job)) {
         if (block_job_has_bdrv(job, blk_bs(blk))) {
             AioContext *aio_context = job->job.aio_context;
@@ -1832,7 +1834,9 @@ static void drive_backup_abort(BlkActionState *common)
         aio_context = bdrv_get_aio_context(state->bs);
         aio_context_acquire(aio_context);
 
-        job_cancel_sync_locked(&state->job->job, true);
+        WITH_JOB_LOCK_GUARD() {
+            job_cancel_sync_locked(&state->job->job, true);
+        }
 
         aio_context_release(aio_context);
     }
@@ -1933,7 +1937,9 @@ static void blockdev_backup_abort(BlkActionState *common)
         aio_context = bdrv_get_aio_context(state->bs);
         aio_context_acquire(aio_context);
 
-        job_cancel_sync_locked(&state->job->job, true);
+        WITH_JOB_LOCK_GUARD() {
+            job_cancel_sync_locked(&state->job->job, true);
+        }
 
         aio_context_release(aio_context);
     }
@@ -2382,7 +2388,10 @@ exit:
     if (!has_props) {
         qapi_free_TransactionProperties(props);
     }
-    job_txn_unref_locked(block_job_txn);
+
+    WITH_JOB_LOCK_GUARD() {
+        job_txn_unref_locked(block_job_txn);
+    }
 }
 
 BlockDirtyBitmapSha256 *qmp_x_debug_block_dirty_bitmap_sha256(const char *node,
@@ -3705,6 +3714,8 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
     BlockJobInfoList *head = NULL, **tail = &head;
     BlockJob *job;
 
+    JOB_LOCK_GUARD();
+
     for (job = block_job_next(NULL); job; job = block_job_next(job)) {
         BlockJobInfo *value;
 
diff --git a/blockjob.c b/blockjob.c
index ce356be51e..e00c8d31d5 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -88,7 +88,9 @@ static char *child_job_get_parent_desc(BdrvChild *c)
 static void child_job_drained_begin(BdrvChild *c)
 {
     BlockJob *job = c->opaque;
-    job_pause_locked(&job->job);
+    WITH_JOB_LOCK_GUARD() {
+        job_pause_locked(&job->job);
+    }
 }
 
 static bool child_job_drained_poll(BdrvChild *c)
@@ -100,8 +102,10 @@ static bool child_job_drained_poll(BdrvChild *c)
     /* An inactive or completed job doesn't have any pending requests. Jobs
      * with !job->busy are either already paused or have a pause point after
      * being reentered, so no job driver code will run before they pause. */
-    if (!job->busy || job_is_completed_locked(job)) {
-        return false;
+    WITH_JOB_LOCK_GUARD() {
+        if (!job->busy || job_is_completed_locked(job)) {
+            return false;
+        }
     }
 
     /* Otherwise, assume that it isn't fully stopped yet, but allow the job to
@@ -116,7 +120,9 @@ static bool child_job_drained_poll(BdrvChild *c)
 static void child_job_drained_end(BdrvChild *c, int *drained_end_counter)
 {
     BlockJob *job = c->opaque;
-    job_resume_locked(&job->job);
+    WITH_JOB_LOCK_GUARD() {
+        job_resume_locked(&job->job);
+    }
 }
 
 static bool child_job_can_set_aio_ctx(BdrvChild *c, AioContext *ctx,
@@ -238,7 +244,13 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
 
 static void block_job_on_idle(Notifier *n, void *opaque)
 {
+    /*
+     * we can't kick with job_mutex held, but we also want
+     * to protect the notifier list.
+     */
+    job_unlock();
     aio_wait_kick();
+    job_lock();
 }
 
 bool block_job_is_internal(BlockJob *job)
@@ -278,7 +290,9 @@ bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
     job->speed = speed;
 
     if (drv->set_speed) {
+        job_unlock();
         drv->set_speed(job, speed);
+        job_lock();
     }
 
     if (speed && speed <= old_speed) {
@@ -458,13 +472,15 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     job->ready_notifier.notify = block_job_event_ready;
     job->idle_notifier.notify = block_job_on_idle;
 
-    notifier_list_add(&job->job.on_finalize_cancelled,
-                      &job->finalize_cancelled_notifier);
-    notifier_list_add(&job->job.on_finalize_completed,
-                      &job->finalize_completed_notifier);
-    notifier_list_add(&job->job.on_pending, &job->pending_notifier);
-    notifier_list_add(&job->job.on_ready, &job->ready_notifier);
-    notifier_list_add(&job->job.on_idle, &job->idle_notifier);
+    WITH_JOB_LOCK_GUARD() {
+        notifier_list_add(&job->job.on_finalize_cancelled,
+                          &job->finalize_cancelled_notifier);
+        notifier_list_add(&job->job.on_finalize_completed,
+                          &job->finalize_completed_notifier);
+        notifier_list_add(&job->job.on_pending, &job->pending_notifier);
+        notifier_list_add(&job->job.on_ready, &job->ready_notifier);
+        notifier_list_add(&job->job.on_idle, &job->idle_notifier);
+    }
 
     error_setg(&job->blocker, "block device is in use by block job: %s",
                job_type_str(&job->job));
@@ -477,11 +493,14 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     blk_set_disable_request_queuing(blk, true);
     blk_set_allow_aio_context_change(blk, true);
 
-    if (!block_job_set_speed(job, speed, errp)) {
-        job_early_fail(&job->job);
-        return NULL;
+    WITH_JOB_LOCK_GUARD() {
+        if (!block_job_set_speed(job, speed, errp)) {
+            job_early_fail_locked(&job->job);
+            return NULL;
+        }
     }
 
     return job;
 }
 
@@ -499,7 +518,9 @@ void block_job_user_resume(Job *job)
 {
     BlockJob *bjob = container_of(job, BlockJob, job);
     assert(qemu_in_main_thread());
-    block_job_iostatus_reset(bjob);
+    WITH_JOB_LOCK_GUARD() {
+        block_job_iostatus_reset(bjob);
+    }
 }
 
 BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
@@ -532,10 +553,15 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
                                         action);
     }
     if (action == BLOCK_ERROR_ACTION_STOP) {
-        if (!job->job.user_paused) {
-            job_pause_locked(&job->job);
-            /* make the pause user visible, which will be resumed from QMP. */
-            job->job.user_paused = true;
+        WITH_JOB_LOCK_GUARD() {
+            if (!job->job.user_paused) {
+                job_pause_locked(&job->job);
+                /*
+                 * make the pause user visible, which will be
+                 * resumed from QMP.
+                 */
+                job->job.user_paused = true;
+            }
         }
         block_job_iostatus_set_err(job, error);
     }
diff --git a/job-qmp.c b/job-qmp.c
index f6f9840436..9fa14bf761 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -171,6 +171,8 @@ JobInfoList *qmp_query_jobs(Error **errp)
     JobInfoList *head = NULL, **tail = &head;
     Job *job;
 
+    JOB_LOCK_GUARD();
+
     for (job = job_next_locked(NULL); job; job = job_next_locked(job)) {
         JobInfo *value;
 
diff --git a/job.c b/job.c
index 2ee7233763..56722a5043 100644
--- a/job.c
+++ b/job.c
@@ -394,6 +394,8 @@ void *job_create(const char *job_id, const JobDriver *driver, JobTxn *txn,
 {
     Job *job;
 
+    JOB_LOCK_GUARD();
+
     if (job_id) {
         if (flags & JOB_INTERNAL) {
             error_setg(errp, "Cannot specify job ID for internal job");
@@ -467,7 +469,9 @@ void job_unref_locked(Job *job)
         assert(!job->txn);
 
         if (job->driver->free) {
+            job_unlock();
             job->driver->free(job);
+            job_lock();
         }
 
         QLIST_REMOVE(job, job_list);
@@ -551,11 +555,14 @@ void job_enter_cond_locked(Job *job, bool(*fn)(Job *job))
     timer_del(&job->sleep_timer);
     job->busy = true;
     real_job_unlock();
+    job_unlock();
     aio_co_enter(job->aio_context, job->co);
+    job_lock();
 }
 
 void job_enter(Job *job)
 {
+    JOB_LOCK_GUARD();
     job_enter_cond_locked(job, NULL);
 }
 
@@ -574,7 +581,9 @@ static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
     job->busy = false;
     job_event_idle(job);
     real_job_unlock();
+    job_unlock();
     qemu_coroutine_yield();
+    job_lock();
 
     /* Set by job_enter_cond_locked() before re-entering the coroutine.  */
     assert(job->busy);
@@ -584,18 +593,23 @@ void coroutine_fn job_pause_point(Job *job)
 {
     assert(job && job_started(job));
 
+    job_lock();
     if (!job_should_pause(job)) {
+        job_unlock();
         return;
     }
-    if (job_is_cancelled(job)) {
+    if (job_is_cancelled_locked(job)) {
+        job_unlock();
         return;
     }
 
     if (job->driver->pause) {
+        job_unlock();
         job->driver->pause(job);
+        job_lock();
     }
 
-    if (job_should_pause(job) && !job_is_cancelled(job)) {
+    if (job_should_pause(job) && !job_is_cancelled_locked(job)) {
         JobStatus status = job->status;
         job_state_transition(job, status == JOB_STATUS_READY
                                   ? JOB_STATUS_STANDBY
@@ -605,6 +619,7 @@ void coroutine_fn job_pause_point(Job *job)
         job->paused = false;
         job_state_transition(job, status);
     }
+    job_unlock();
 
     if (job->driver->resume) {
         job->driver->resume(job);
@@ -613,15 +628,17 @@ void coroutine_fn job_pause_point(Job *job)
 
 void job_yield(Job *job)
 {
-    assert(job->busy);
+    WITH_JOB_LOCK_GUARD() {
+        assert(job->busy);
 
-    /* Check cancellation *before* setting busy = false, too!  */
-    if (job_is_cancelled(job)) {
-        return;
-    }
+        /* Check cancellation *before* setting busy = false, too!  */
+        if (job_is_cancelled_locked(job)) {
+            return;
+        }
 
-    if (!job_should_pause(job)) {
-        job_do_yield(job, -1);
+        if (!job_should_pause(job)) {
+            job_do_yield(job, -1);
+        }
     }
 
     job_pause_point(job);
@@ -629,21 +646,23 @@ void job_yield(Job *job)
 
 void coroutine_fn job_sleep_ns(Job *job, int64_t ns)
 {
-    assert(job->busy);
+    WITH_JOB_LOCK_GUARD() {
+        assert(job->busy);
 
-    /* Check cancellation *before* setting busy = false, too!  */
-    if (job_is_cancelled(job)) {
-        return;
-    }
+        /* Check cancellation *before* setting busy = false, too!  */
+        if (job_is_cancelled_locked(job)) {
+            return;
+        }
 
-    if (!job_should_pause(job)) {
-        job_do_yield(job, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + ns);
+        if (!job_should_pause(job)) {
+            job_do_yield(job, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + ns);
+        }
     }
 
     job_pause_point(job);
 }
 
-/* Assumes the block_job_mutex is held */
+/* Assumes the job_mutex is held */
 static bool job_timer_not_pending(Job *job)
 {
     return !timer_pending(&job->sleep_timer);
@@ -653,7 +672,7 @@ void job_pause_locked(Job *job)
 {
     job->pause_count++;
     if (!job->paused) {
-        job_enter(job);
+        job_enter_cond_locked(job, NULL);
     }
 }
 
@@ -699,7 +718,9 @@ void job_user_resume_locked(Job *job, Error **errp)
         return;
     }
     if (job->driver->user_resume) {
+        job_unlock();
         job->driver->user_resume(job);
+        job_lock();
     }
     job->user_paused = false;
     job_resume_locked(job);
@@ -753,7 +774,7 @@ static void job_conclude(Job *job)
 
 static void job_update_rc(Job *job)
 {
-    if (!job->ret && job_is_cancelled(job)) {
+    if (!job->ret && job_is_cancelled_locked(job)) {
         job->ret = -ECANCELED;
     }
     if (job->ret) {
@@ -769,7 +790,9 @@ static void job_commit(Job *job)
     assert(!job->ret);
     assert(qemu_in_main_thread());
     if (job->driver->commit) {
+        job_unlock();
         job->driver->commit(job);
+        job_lock();
     }
 }
 
@@ -778,7 +801,9 @@ static void job_abort(Job *job)
     assert(job->ret);
     assert(qemu_in_main_thread());
     if (job->driver->abort) {
+        job_unlock();
         job->driver->abort(job);
+        job_lock();
     }
 }
 
@@ -786,12 +811,15 @@ static void job_clean(Job *job)
 {
     assert(qemu_in_main_thread());
     if (job->driver->clean) {
+        job_unlock();
         job->driver->clean(job);
+        job_lock();
     }
 }
 
 static int job_finalize_single(Job *job)
 {
+    int job_ret;
     AioContext *ctx = job->aio_context;
 
     assert(job_is_completed_locked(job));
@@ -811,12 +839,15 @@ static int job_finalize_single(Job *job)
     aio_context_release(ctx);
 
     if (job->cb) {
-        job->cb(job->opaque, job->ret);
+        job_ret = job->ret;
+        job_unlock();
+        job->cb(job->opaque, job_ret);
+        job_lock();
     }
 
     /* Emit events only if we actually started */
     if (job_started(job)) {
-        if (job_is_cancelled(job)) {
+        if (job_is_cancelled_locked(job)) {
             job_event_cancelled(job);
         } else {
             job_event_completed(job);
@@ -832,7 +863,9 @@ static void job_cancel_async(Job *job, bool force)
 {
     assert(qemu_in_main_thread());
     if (job->driver->cancel) {
+        job_unlock();
         force = job->driver->cancel(job, force);
+        job_lock();
     } else {
         /* No .cancel() means the job will behave as if force-cancelled */
         force = true;
@@ -841,7 +874,9 @@ static void job_cancel_async(Job *job, bool force)
     if (job->user_paused) {
         /* Do not call job_enter here, the caller will handle it.  */
         if (job->driver->user_resume) {
+            job_unlock();
             job->driver->user_resume(job);
+            job_lock();
         }
         job->user_paused = false;
         assert(job->pause_count > 0);
@@ -911,7 +946,7 @@ static void job_completed_txn_abort(Job *job)
         ctx = other_job->aio_context;
         aio_context_acquire(ctx);
         if (!job_is_completed_locked(other_job)) {
-            assert(job_cancel_requested(other_job));
+            assert(job_cancel_requested_locked(other_job));
             job_finish_sync_locked(other_job, NULL, NULL);
         }
         job_finalize_single(other_job);
@@ -930,13 +965,17 @@ static void job_completed_txn_abort(Job *job)
 
 static int job_prepare(Job *job)
 {
+    int ret;
     AioContext *ctx = job->aio_context;
     assert(qemu_in_main_thread());
 
     if (job->ret == 0 && job->driver->prepare) {
+        job_unlock();
         aio_context_acquire(ctx);
-        job->ret = job->driver->prepare(job);
+        ret = job->driver->prepare(job);
         aio_context_release(ctx);
+        job_lock();
+        job->ret = ret;
         job_update_rc(job);
     }
 
@@ -982,6 +1021,7 @@ static int job_transition_to_pending(Job *job)
 
 void job_transition_to_ready(Job *job)
 {
+    JOB_LOCK_GUARD();
     job_state_transition(job, JOB_STATUS_READY);
     job_event_ready(job);
 }
@@ -1031,6 +1071,7 @@ static void job_exit(void *opaque)
     Job *job = (Job *)opaque;
     AioContext *ctx;
 
+    JOB_LOCK_GUARD();
     job_ref_locked(job);
     aio_context_acquire(job->aio_context);
 
@@ -1061,13 +1102,17 @@ static void job_exit(void *opaque)
 static void coroutine_fn job_co_entry(void *opaque)
 {
     Job *job = opaque;
+    int ret;
 
     assert(job->aio_context == qemu_get_current_aio_context());
     assert(job && job->driver && job->driver->run);
     job_pause_point(job);
-    job->ret = job->driver->run(job, &job->err);
-    job->deferred_to_main_loop = true;
-    job->busy = true;
+    ret = job->driver->run(job, &job->err);
+    WITH_JOB_LOCK_GUARD() {
+        job->ret = ret;
+        job->deferred_to_main_loop = true;
+        job->busy = true;
+    }
     aio_bh_schedule_oneshot(qemu_get_aio_context(), job_exit, job);
 }
 
@@ -1083,16 +1128,20 @@ static int job_pre_run(Job *job)
 
 void job_start(Job *job)
 {
-    assert(job && !job_started(job) && job->paused &&
-           job->driver && job->driver->run);
-    job->co = qemu_coroutine_create(job_co_entry, job);
+    WITH_JOB_LOCK_GUARD() {
+        assert(job && !job_started(job) && job->paused &&
+            job->driver && job->driver->run);
+        job->co = qemu_coroutine_create(job_co_entry, job);
+    }
     if (job_pre_run(job)) {
         return;
     }
-    job->pause_count--;
-    job->busy = true;
-    job->paused = false;
-    job_state_transition(job, JOB_STATUS_RUNNING);
+    WITH_JOB_LOCK_GUARD() {
+        job->pause_count--;
+        job->busy = true;
+        job->paused = false;
+        job_state_transition(job, JOB_STATUS_RUNNING);
+    }
     aio_co_enter(job->aio_context, job->co);
 }
 
@@ -1116,11 +1165,11 @@ void job_cancel_locked(Job *job, bool force)
          * choose to call job_is_cancelled() to show that we invoke
          * job_completed_txn_abort() only for force-cancelled jobs.)
          */
-        if (job_is_cancelled(job)) {
+        if (job_is_cancelled_locked(job)) {
             job_completed_txn_abort(job);
         }
     } else {
-        job_enter(job);
+        job_enter_cond_locked(job, NULL);
     }
 }
 
@@ -1164,6 +1213,7 @@ void job_cancel_sync_all(void)
     Job *job;
     AioContext *aio_context;
 
+    JOB_LOCK_GUARD();
     while ((job = job_next_locked(NULL))) {
         aio_context = job->aio_context;
         aio_context_acquire(aio_context);
@@ -1185,13 +1235,15 @@ void job_complete_locked(Job *job, Error **errp)
     if (job_apply_verb_locked(job, JOB_VERB_COMPLETE, errp)) {
         return;
     }
-    if (job_cancel_requested(job) || !job->driver->complete) {
+    if (job_cancel_requested_locked(job) || !job->driver->complete) {
         error_setg(errp, "The active block job '%s' cannot be completed",
                    job->id);
         return;
     }
 
+    job_unlock();
     job->driver->complete(job, errp);
+    job_lock();
 }
 
 int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),
@@ -1211,10 +1263,12 @@ int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),
         return -EBUSY;
     }
 
-    AIO_WAIT_WHILE(job->aio_context,
-                   (job_enter(job), !job_is_completed_locked(job)));
+    job_unlock();
+    AIO_WAIT_WHILE(job->aio_context, (job_enter(job), !job_is_completed(job)));
+    job_lock();
 
-    ret = (job_is_cancelled(job) && job->ret == 0) ? -ECANCELED : job->ret;
+    ret = (job_is_cancelled_locked(job) && job->ret == 0) ?
+           -ECANCELED : job->ret;
     job_unref_locked(job);
     return ret;
 }
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index 343353e27a..2f11d086a6 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -133,8 +133,10 @@ void qmp_cont(Error **errp)
         blk_iostatus_reset(blk);
     }
 
-    for (job = block_job_next(NULL); job; job = block_job_next(job)) {
-        block_job_iostatus_reset(job);
+    WITH_JOB_LOCK_GUARD() {
+        for (job = block_job_next(NULL); job; job = block_job_next(job)) {
+            block_job_iostatus_reset(job);
+        }
     }
 
     /* Continuing after completed migration. Images have been inactivated to
diff --git a/qemu-img.c b/qemu-img.c
index 09f3b11eab..95e2e33e61 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -906,25 +906,30 @@ static void run_block_job(BlockJob *job, Error **errp)
     int ret = 0;
 
     aio_context_acquire(aio_context);
-    job_ref_locked(&job->job);
-    do {
-        float progress = 0.0f;
-        aio_poll(aio_context, true);
+    WITH_JOB_LOCK_GUARD() {
+        job_ref_locked(&job->job);
+        do {
+            float progress = 0.0f;
+            job_unlock();
+            aio_poll(aio_context, true);
+
+            progress_get_snapshot(&job->job.progress, &progress_current,
+                                &progress_total);
+            if (progress_total) {
+                progress = (float)progress_current / progress_total * 100.f;
+            }
+            qemu_progress_print(progress, 0);
+            job_lock();
+        } while (!job_is_ready_locked(&job->job) &&
+                !job_is_completed_locked(&job->job));
 
-        progress_get_snapshot(&job->job.progress, &progress_current,
-                              &progress_total);
-        if (progress_total) {
-            progress = (float)progress_current / progress_total * 100.f;
+        if (!job_is_completed_locked(&job->job)) {
+            ret = job_complete_sync_locked(&job->job, errp);
+        } else {
+            ret = job->job.ret;
         }
-        qemu_progress_print(progress, 0);
-    } while (!job_is_ready(&job->job) && !job_is_completed_locked(&job->job));
-
-    if (!job_is_completed_locked(&job->job)) {
-        ret = job_complete_sync_locked(&job->job, errp);
-    } else {
-        ret = job->job.ret;
+        job_unref_locked(&job->job);
     }
-    job_unref_locked(&job->job);
     aio_context_release(aio_context);
 
     /* publish completion progress only when success */
@@ -1077,7 +1082,9 @@ static int img_commit(int argc, char **argv)
         bdrv_ref(bs);
     }
 
-    job = block_job_get("commit");
+    WITH_JOB_LOCK_GUARD() {
+        job = block_job_get("commit");
+    }
     assert(job);
     run_block_job(job, &local_err);
     if (local_err) {
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 11/16] jobs: document all static functions and add _locked() suffix
  2022-01-05 14:01 [PATCH v3 00/16] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (9 preceding siblings ...)
  2022-01-05 14:02 ` [PATCH v3 10/16] jobs: protect jobs with job_lock/unlock Emanuele Giuseppe Esposito
@ 2022-01-05 14:02 ` Emanuele Giuseppe Esposito
  2022-01-05 14:02 ` [PATCH v3 12/16] jobs: use job locks and helpers also in the unit tests Emanuele Giuseppe Esposito
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-05 14:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

Now that we have added the job_lock/unlock pairs, we can also
rename all static functions in job.c that are called with the
job mutex held to have a _locked() suffix, and add a short
comment on top of each.
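
The resulting convention, shown on one of the helpers renamed
below (excerpt):

    /* Called with job_mutex held. */
    static bool job_should_pause_locked(Job *job)
    {
        return job->pause_count > 0;
    }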

No functional change intended.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 blockjob.c |   8 ++
 job.c      | 243 +++++++++++++++++++++++++++++++----------------------
 2 files changed, 149 insertions(+), 102 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index e00c8d31d5..cf1f49f6c2 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -242,6 +242,7 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
     return 0;
 }
 
+/* Called with job_mutex lock held. */
 static void block_job_on_idle(Notifier *n, void *opaque)
 {
     /*
@@ -269,6 +270,7 @@ static bool job_timer_pending(Job *job)
     return timer_pending(&job->sleep_timer);
 }
 
+/* Called with job_mutex held. May temporarily release the lock. */
 bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
 {
     const BlockJobDriver *drv = block_job_driver(job);
@@ -310,6 +312,7 @@ int64_t block_job_ratelimit_get_delay(BlockJob *job, uint64_t n)
     return ratelimit_calculate_delay(&job->limit, n);
 }
 
+/* Called with job_mutex lock held. */
 BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
 {
     BlockJobInfo *info;
@@ -355,6 +358,7 @@ static void block_job_iostatus_set_err(BlockJob *job, int error)
     }
 }
 
+/* Called with job_mutex lock held. */
 static void block_job_event_cancelled(Notifier *n, void *opaque)
 {
     BlockJob *job = opaque;
@@ -374,6 +378,7 @@ static void block_job_event_cancelled(Notifier *n, void *opaque)
                                         job->speed);
 }
 
+/* Called with job_mutex lock held. */
 static void block_job_event_completed(Notifier *n, void *opaque)
 {
     BlockJob *job = opaque;
@@ -400,6 +405,7 @@ static void block_job_event_completed(Notifier *n, void *opaque)
                                         msg);
 }
 
+/* Called with job_mutex lock held. */
 static void block_job_event_pending(Notifier *n, void *opaque)
 {
     BlockJob *job = opaque;
@@ -412,6 +418,7 @@ static void block_job_event_pending(Notifier *n, void *opaque)
                                       job->job.id);
 }
 
+/* Called with job_mutex lock held. */
 static void block_job_event_ready(Notifier *n, void *opaque)
 {
     BlockJob *job = opaque;
@@ -504,6 +511,7 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     return job;
 }
 
+/* Called with job_mutex lock held. */
 void block_job_iostatus_reset(BlockJob *job)
 {
     assert(qemu_in_main_thread());
diff --git a/job.c b/job.c
index 56722a5043..f16a4ef542 100644
--- a/job.c
+++ b/job.c
@@ -54,6 +54,7 @@
  */
 QemuMutex job_mutex;
 
+/* Protected by job_mutex */
 static QLIST_HEAD(, Job) jobs = QLIST_HEAD_INITIALIZER(jobs);
 
 /* Job State Transition Table */
@@ -129,7 +130,8 @@ JobTxn *job_txn_new(void)
     return txn;
 }
 
-static void job_txn_ref(JobTxn *txn)
+/* Called with job_mutex held. */
+static void job_txn_ref_locked(JobTxn *txn)
 {
     txn->refcnt++;
 }
@@ -151,10 +153,11 @@ void job_txn_add_job_locked(JobTxn *txn, Job *job)
     job->txn = txn;
 
     QLIST_INSERT_HEAD(&txn->jobs, job, txn_list);
-    job_txn_ref(txn);
+    job_txn_ref_locked(txn);
 }
 
-static void job_txn_del_job(Job *job)
+/* Called with job_mutex held. */
+static void job_txn_del_job_locked(Job *job)
 {
     if (job->txn) {
         QLIST_REMOVE(job, txn_list);
@@ -163,17 +166,18 @@ static void job_txn_del_job(Job *job)
     }
 }
 
-static int job_txn_apply(Job *job, int fn(Job *))
+/* Called with job_mutex held. */
+static int job_txn_apply_locked(Job *job, int fn(Job *))
 {
     Job *other_job, *next;
     JobTxn *txn = job->txn;
     int rc = 0;
 
     /*
-     * Similar to job_completed_txn_abort, we take each job's lock before
-     * applying fn, but since we assume that outer_ctx is held by the caller,
-     * we need to release it here to avoid holding the lock twice - which would
-     * break AIO_WAIT_WHILE from within fn.
+     * Similar to job_completed_txn_abort_locked, we take each job's lock
+     * before applying fn, but since we assume that outer_ctx is held by the
+     * caller, we need to release it here to avoid holding the lock
+     * twice - which would break AIO_WAIT_WHILE from within fn.
      */
     job_ref_locked(job);
     aio_context_release(job->aio_context);
@@ -199,7 +203,8 @@ bool job_is_internal(Job *job)
     return (job->id == NULL);
 }
 
-static void job_state_transition(Job *job, JobStatus s1)
+/* Called with job_mutex held. */
+static void job_state_transition_locked(Job *job, JobStatus s1)
 {
     JobStatus s0 = job->status;
     assert(s1 >= 0 && s1 < JOB_STATUS__MAX);
@@ -355,7 +360,8 @@ static bool job_started(Job *job)
     return job->co;
 }
 
-static bool job_should_pause(Job *job)
+/* Called with job_mutex held. */
+static bool job_should_pause_locked(Job *job)
 {
     return job->pause_count > 0;
 }
@@ -381,6 +387,7 @@ Job *job_get_locked(const char *id)
     return NULL;
 }
 
+/* Called with job_mutex *not* held. */
 static void job_sleep_timer_cb(void *opaque)
 {
     Job *job = opaque;
@@ -434,7 +441,7 @@ void *job_create(const char *job_id, const JobDriver *driver, JobTxn *txn,
     notifier_list_init(&job->on_pending);
     notifier_list_init(&job->on_ready);
 
-    job_state_transition(job, JOB_STATUS_CREATED);
+    job_state_transition_locked(job, JOB_STATUS_CREATED);
     aio_timer_init(qemu_get_aio_context(), &job->sleep_timer,
                    QEMU_CLOCK_REALTIME, SCALE_NS,
                    job_sleep_timer_cb, job);
@@ -502,7 +509,7 @@ void job_progress_increase_remaining(Job *job, uint64_t delta)
  * To be called when a cancelled job is finalised.
  * Called with job_mutex held.
  */
-static void job_event_cancelled(Job *job)
+static void job_event_cancelled_locked(Job *job)
 {
     notifier_list_notify(&job->on_finalize_cancelled, job);
 }
@@ -511,22 +518,25 @@ static void job_event_cancelled(Job *job)
  * To be called when a successfully completed job is finalised.
  * Called with job_mutex held.
  */
-static void job_event_completed(Job *job)
+static void job_event_completed_locked(Job *job)
 {
     notifier_list_notify(&job->on_finalize_completed, job);
 }
 
-static void job_event_pending(Job *job)
+/* Called with job_mutex held. */
+static void job_event_pending_locked(Job *job)
 {
     notifier_list_notify(&job->on_pending, job);
 }
 
-static void job_event_ready(Job *job)
+/* Called with job_mutex held. */
+static void job_event_ready_locked(Job *job)
 {
     notifier_list_notify(&job->on_ready, job);
 }
 
-static void job_event_idle(Job *job)
+/* Called with job_mutex held. */
+static void job_event_idle_locked(Job *job)
 {
     notifier_list_notify(&job->on_idle, job);
 }
@@ -571,15 +581,18 @@ void job_enter(Job *job)
  * is allowed and cancels the timer.
  *
  * If @ns is (uint64_t) -1, no timer is scheduled and job_enter() must be
- * called explicitly. */
-static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
+ * called explicitly.
+ *
+ * Called with job_mutex held, but releases it temporarily.
+ */
+static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns)
 {
     real_job_lock();
     if (ns != -1) {
         timer_mod(&job->sleep_timer, ns);
     }
     job->busy = false;
-    job_event_idle(job);
+    job_event_idle_locked(job);
     real_job_unlock();
     job_unlock();
     qemu_coroutine_yield();
@@ -594,7 +607,7 @@ void coroutine_fn job_pause_point(Job *job)
     assert(job && job_started(job));
 
     job_lock();
-    if (!job_should_pause(job)) {
+    if (!job_should_pause_locked(job)) {
         job_unlock();
         return;
     }
@@ -609,15 +622,15 @@ void coroutine_fn job_pause_point(Job *job)
         job_lock();
     }
 
-    if (job_should_pause(job) && !job_is_cancelled_locked(job)) {
+    if (job_should_pause_locked(job) && !job_is_cancelled_locked(job)) {
         JobStatus status = job->status;
-        job_state_transition(job, status == JOB_STATUS_READY
+        job_state_transition_locked(job, status == JOB_STATUS_READY
                                   ? JOB_STATUS_STANDBY
                                   : JOB_STATUS_PAUSED);
         job->paused = true;
-        job_do_yield(job, -1);
+        job_do_yield_locked(job, -1);
         job->paused = false;
-        job_state_transition(job, status);
+        job_state_transition_locked(job, status);
     }
     job_unlock();
 
@@ -636,8 +649,8 @@ void job_yield(Job *job)
             return;
         }
 
-        if (!job_should_pause(job)) {
-            job_do_yield(job, -1);
+        if (!job_should_pause_locked(job)) {
+            job_do_yield_locked(job, -1);
         }
     }
 
@@ -654,8 +667,9 @@ void coroutine_fn job_sleep_ns(Job *job, int64_t ns)
             return;
         }
 
-        if (!job_should_pause(job)) {
-            job_do_yield(job, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + ns);
+        if (!job_should_pause_locked(job)) {
+            job_do_yield_locked(job,
+                                qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + ns);
         }
     }
 
@@ -726,16 +740,17 @@ void job_user_resume_locked(Job *job, Error **errp)
     job_resume_locked(job);
 }
 
-static void job_do_dismiss(Job *job)
+/* Called with job_mutex held. */
+static void job_do_dismiss_locked(Job *job)
 {
     assert(job);
     job->busy = false;
     job->paused = false;
     job->deferred_to_main_loop = true;
 
-    job_txn_del_job(job);
+    job_txn_del_job_locked(job);
 
-    job_state_transition(job, JOB_STATUS_NULL);
+    job_state_transition_locked(job, JOB_STATUS_NULL);
     job_unref_locked(job);
 }
 
@@ -748,14 +763,14 @@ void job_dismiss_locked(Job **jobptr, Error **errp)
         return;
     }
 
-    job_do_dismiss(job);
+    job_do_dismiss_locked(job);
     *jobptr = NULL;
 }
 
 void job_early_fail_locked(Job *job)
 {
     assert(job->status == JOB_STATUS_CREATED);
-    job_do_dismiss(job);
+    job_do_dismiss_locked(job);
 }
 
 void job_early_fail(Job *job)
@@ -764,15 +779,17 @@ void job_early_fail(Job *job)
     job_early_fail_locked(job);
 }
 
-static void job_conclude(Job *job)
+/* Called with job_mutex held. */
+static void job_conclude_locked(Job *job)
 {
-    job_state_transition(job, JOB_STATUS_CONCLUDED);
+    job_state_transition_locked(job, JOB_STATUS_CONCLUDED);
     if (job->auto_dismiss || !job_started(job)) {
-        job_do_dismiss(job);
+        job_do_dismiss_locked(job);
     }
 }
 
-static void job_update_rc(Job *job)
+/* Called with job_mutex held. */
+static void job_update_rc_locked(Job *job)
 {
     if (!job->ret && job_is_cancelled_locked(job)) {
         job->ret = -ECANCELED;
@@ -781,11 +798,12 @@ static void job_update_rc(Job *job)
         if (!job->err) {
             error_setg(&job->err, "%s", strerror(-job->ret));
         }
-        job_state_transition(job, JOB_STATUS_ABORTING);
+        job_state_transition_locked(job, JOB_STATUS_ABORTING);
     }
 }
 
-static void job_commit(Job *job)
+/* Called with job_mutex held, but releases it temporarily. */
+static void job_commit_locked(Job *job)
 {
     assert(!job->ret);
     assert(qemu_in_main_thread());
@@ -796,7 +814,8 @@ static void job_commit(Job *job)
     }
 }
 
-static void job_abort(Job *job)
+/* Called with job_mutex held, but releases it temporarily. */
+static void job_abort_locked(Job *job)
 {
     assert(job->ret);
     assert(qemu_in_main_thread());
@@ -807,7 +826,8 @@ static void job_abort(Job *job)
     }
 }
 
-static void job_clean(Job *job)
+/* Called with job_mutex held, but releases it temporarily. */
+static void job_clean_locked(Job *job)
 {
     assert(qemu_in_main_thread());
     if (job->driver->clean) {
@@ -817,7 +837,8 @@ static void job_clean(Job *job)
     }
 }
 
-static int job_finalize_single(Job *job)
+/* Called with job_mutex held, but releases it temporarily. */
+static int job_finalize_single_locked(Job *job)
 {
     int job_ret;
     AioContext *ctx = job->aio_context;
@@ -825,16 +846,16 @@ static int job_finalize_single(Job *job)
     assert(job_is_completed_locked(job));
 
     /* Ensure abort is called for late-transactional failures */
-    job_update_rc(job);
+    job_update_rc_locked(job);
 
     aio_context_acquire(ctx);
 
     if (!job->ret) {
-        job_commit(job);
+        job_commit_locked(job);
     } else {
-        job_abort(job);
+        job_abort_locked(job);
     }
-    job_clean(job);
+    job_clean_locked(job);
 
     aio_context_release(ctx);
 
@@ -848,18 +869,19 @@ static int job_finalize_single(Job *job)
     /* Emit events only if we actually started */
     if (job_started(job)) {
         if (job_is_cancelled_locked(job)) {
-            job_event_cancelled(job);
+            job_event_cancelled_locked(job);
         } else {
-            job_event_completed(job);
+            job_event_completed_locked(job);
         }
     }
 
-    job_txn_del_job(job);
-    job_conclude(job);
+    job_txn_del_job_locked(job);
+    job_conclude_locked(job);
     return 0;
 }
 
-static void job_cancel_async(Job *job, bool force)
+/* Called with job_mutex held, but releases it temporarily. */
+static void job_cancel_async_locked(Job *job, bool force)
 {
     assert(qemu_in_main_thread());
     if (job->driver->cancel) {
@@ -897,7 +919,8 @@ static void job_cancel_async(Job *job, bool force)
     }
 }
 
-static void job_completed_txn_abort(Job *job)
+/* Called with job_mutex held. */
+static void job_completed_txn_abort_locked(Job *job)
 {
     AioContext *ctx;
     JobTxn *txn = job->txn;
@@ -910,12 +933,12 @@ static void job_completed_txn_abort(Job *job)
         return;
     }
     txn->aborting = true;
-    job_txn_ref(txn);
+    job_txn_ref_locked(txn);
 
     /*
      * We can only hold the single job's AioContext lock while calling
-     * job_finalize_single() because the finalization callbacks can involve
-     * calls of AIO_WAIT_WHILE(), which could deadlock otherwise.
+     * job_finalize_single_locked() because the finalization callbacks can
+     * involve calls of AIO_WAIT_WHILE(), which could deadlock otherwise.
      * Note that the job's AioContext may change when it is finalized.
      */
     job_ref_locked(job);
@@ -930,10 +953,10 @@ static void job_completed_txn_abort(Job *job)
             aio_context_acquire(ctx);
             /*
              * This is a transaction: If one job failed, no result will matter.
-             * Therefore, pass force=true to terminate all other jobs as quickly
-             * as possible.
+             * Therefore, pass force=true to terminate all other jobs as
+             * quickly as possible.
              */
-            job_cancel_async(other_job, true);
+            job_cancel_async_locked(other_job, true);
             aio_context_release(ctx);
         }
     }
@@ -949,13 +972,13 @@ static void job_completed_txn_abort(Job *job)
             assert(job_cancel_requested_locked(other_job));
             job_finish_sync_locked(other_job, NULL, NULL);
         }
-        job_finalize_single(other_job);
+        job_finalize_single_locked(other_job);
         aio_context_release(ctx);
     }
 
     /*
      * Use job_ref_locked()/job_unref_locked() so we can read the AioContext
-     * here even if the job went away during job_finalize_single().
+     * here even if the job went away during job_finalize_single_locked().
      */
     aio_context_acquire(job->aio_context);
     job_unref_locked(job);
@@ -963,7 +986,8 @@ static void job_completed_txn_abort(Job *job)
     job_txn_unref_locked(txn);
 }
 
-static int job_prepare(Job *job)
+/* Called with job_mutex held, but releases it temporarily. */
+static int job_prepare_locked(Job *job)
 {
     int ret;
     AioContext *ctx = job->aio_context;
@@ -976,28 +1000,30 @@ static int job_prepare(Job *job)
         aio_context_release(ctx);
         job_lock();
         job->ret = ret;
-        job_update_rc(job);
+        job_update_rc_locked(job);
     }
 
     return job->ret;
 }
 
-static int job_needs_finalize(Job *job)
+/* Called with job_mutex held. */
+static int job_needs_finalize_locked(Job *job)
 {
     return !job->auto_finalize;
 }
 
-static void job_do_finalize(Job *job)
+/* Called with job_mutex held. */
+static void job_do_finalize_locked(Job *job)
 {
     int rc;
     assert(job && job->txn);
 
     /* prepare the transaction to complete */
-    rc = job_txn_apply(job, job_prepare);
+    rc = job_txn_apply_locked(job, job_prepare_locked);
     if (rc) {
-        job_completed_txn_abort(job);
+        job_completed_txn_abort_locked(job);
     } else {
-        job_txn_apply(job, job_finalize_single);
+        job_txn_apply_locked(job, job_finalize_single_locked);
     }
 }
 
@@ -1007,14 +1033,15 @@ void job_finalize_locked(Job *job, Error **errp)
     if (job_apply_verb_locked(job, JOB_VERB_FINALIZE, errp)) {
         return;
     }
-    job_do_finalize(job);
+    job_do_finalize_locked(job);
 }
 
-static int job_transition_to_pending(Job *job)
+/* Called with job_mutex held. */
+static int job_transition_to_pending_locked(Job *job)
 {
-    job_state_transition(job, JOB_STATUS_PENDING);
+    job_state_transition_locked(job, JOB_STATUS_PENDING);
     if (!job->auto_finalize) {
-        job_event_pending(job);
+        job_event_pending_locked(job);
     }
     return 0;
 }
@@ -1022,16 +1049,17 @@ static int job_transition_to_pending(Job *job)
 void job_transition_to_ready(Job *job)
 {
     JOB_LOCK_GUARD();
-    job_state_transition(job, JOB_STATUS_READY);
-    job_event_ready(job);
+    job_state_transition_locked(job, JOB_STATUS_READY);
+    job_event_ready_locked(job);
 }
 
-static void job_completed_txn_success(Job *job)
+/* Called with job_mutex held. */
+static void job_completed_txn_success_locked(Job *job)
 {
     JobTxn *txn = job->txn;
     Job *other_job;
 
-    job_state_transition(job, JOB_STATUS_WAITING);
+    job_state_transition_locked(job, JOB_STATUS_WAITING);
 
     /*
      * Successful completion, see if there are other running jobs in this
@@ -1044,28 +1072,32 @@ static void job_completed_txn_success(Job *job)
         assert(other_job->ret == 0);
     }
 
-    job_txn_apply(job, job_transition_to_pending);
+    job_txn_apply_locked(job, job_transition_to_pending_locked);
 
     /* If no jobs need manual finalization, automatically do so */
-    if (job_txn_apply(job, job_needs_finalize) == 0) {
-        job_do_finalize(job);
+    if (job_txn_apply_locked(job, job_needs_finalize_locked) == 0) {
+        job_do_finalize_locked(job);
     }
 }
 
-static void job_completed(Job *job)
+/* Called with job_mutex held. */
+static void job_completed_locked(Job *job)
 {
     assert(job && job->txn && !job_is_completed_locked(job));
 
-    job_update_rc(job);
+    job_update_rc_locked(job);
     trace_job_completed(job, job->ret);
     if (job->ret) {
-        job_completed_txn_abort(job);
+        job_completed_txn_abort_locked(job);
     } else {
-        job_completed_txn_success(job);
+        job_completed_txn_success_locked(job);
     }
 }
 
-/** Useful only as a type shim for aio_bh_schedule_oneshot. */
+/**
+ * Useful only as a type shim for aio_bh_schedule_oneshot.
+ * Called with job_mutex *not* held.
+ */
 static void job_exit(void *opaque)
 {
     Job *job = (Job *)opaque;
@@ -1080,15 +1112,15 @@ static void job_exit(void *opaque)
      * drain block nodes, and if .drained_poll still returned true, we would
      * deadlock. */
     job->busy = false;
-    job_event_idle(job);
+    job_event_idle_locked(job);
 
-    job_completed(job);
+    job_completed_locked(job);
 
     /*
-     * Note that calling job_completed can move the job to a different
-     * aio_context, so we cannot cache from above. job_txn_apply takes care of
-     * acquiring the new lock, and we ref/unref to avoid job_completed freeing
-     * the job underneath us.
+     * Note that calling job_completed_locked can move the job to a different
+     * aio_context, so we cannot cache from above. job_txn_apply_locked takes
+     * care of acquiring the new lock, and we ref/unref to avoid
+     * job_completed_locked freeing the job underneath us.
      */
     ctx = job->aio_context;
     job_unref_locked(job);
@@ -1098,6 +1130,8 @@ static void job_exit(void *opaque)
 /**
  * All jobs must allow a pause point before entering their job proper. This
  * ensures that jobs can be paused prior to being started, then resumed later.
+ *
+ * Called with job_mutex *not* held.
  */
 static void coroutine_fn job_co_entry(void *opaque)
 {
@@ -1116,6 +1150,7 @@ static void coroutine_fn job_co_entry(void *opaque)
     aio_bh_schedule_oneshot(qemu_get_aio_context(), job_exit, job);
 }
 
+/* Called with job_mutex *not* held. */
 static int job_pre_run(Job *job)
 {
     assert(qemu_in_main_thread());
@@ -1140,7 +1175,7 @@ void job_start(Job *job)
         job->pause_count--;
         job->busy = true;
         job->paused = false;
-        job_state_transition(job, JOB_STATUS_RUNNING);
+        job_state_transition_locked(job, JOB_STATUS_RUNNING);
     }
     aio_co_enter(job->aio_context, job->co);
 }
@@ -1148,25 +1183,25 @@ void job_start(Job *job)
 void job_cancel_locked(Job *job, bool force)
 {
     if (job->status == JOB_STATUS_CONCLUDED) {
-        job_do_dismiss(job);
+        job_do_dismiss_locked(job);
         return;
     }
-    job_cancel_async(job, force);
+    job_cancel_async_locked(job, force);
     if (!job_started(job)) {
-        job_completed(job);
+        job_completed_locked(job);
     } else if (job->deferred_to_main_loop) {
         /*
-         * job_cancel_async() ignores soft-cancel requests for jobs
+         * job_cancel_async_locked() ignores soft-cancel requests for jobs
          * that are already done (i.e. deferred to the main loop).  We
          * have to check again whether the job is really cancelled.
          * (job_cancel_requested() and job_is_cancelled() are equivalent
-         * here, because job_cancel_async() will make soft-cancel
+         * here, because job_cancel_async_locked() will make soft-cancel
          * requests no-ops when deferred_to_main_loop is true.  We
          * choose to call job_is_cancelled() to show that we invoke
-         * job_completed_txn_abort() only for force-cancelled jobs.)
+         * job_completed_txn_abort_locked() only for force-cancelled jobs.)
          */
         if (job_is_cancelled_locked(job)) {
-            job_completed_txn_abort(job);
+            job_completed_txn_abort_locked(job);
         }
     } else {
         job_enter_cond_locked(job, NULL);
@@ -1185,16 +1220,20 @@ void job_user_cancel_locked(Job *job, bool force, Error **errp)
  * A wrapper around job_cancel_locked() taking an Error ** parameter so
  * it may be used with job_finish_sync_locked() without the
  * need for (rather nasty) function pointer casts there.
+ *
+ * Called with job_mutex held.
  */
-static void job_cancel_err(Job *job, Error **errp)
+static void job_cancel_err_locked(Job *job, Error **errp)
 {
     job_cancel_locked(job, false);
 }
 
 /**
- * Same as job_cancel_err(), but force-cancel.
+ * Same as job_cancel_err_locked(), but force-cancel.
+ *
+ * Called with job_mutex held.
  */
-static void job_force_cancel_err(Job *job, Error **errp)
+static void job_force_cancel_err_locked(Job *job, Error **errp)
 {
     job_cancel_locked(job, true);
 }
@@ -1202,9 +1241,9 @@ static void job_force_cancel_err(Job *job, Error **errp)
 int job_cancel_sync_locked(Job *job, bool force)
 {
     if (force) {
-        return job_finish_sync_locked(job, &job_force_cancel_err, NULL);
+        return job_finish_sync_locked(job, &job_force_cancel_err_locked, NULL);
     } else {
-        return job_finish_sync_locked(job, &job_cancel_err, NULL);
+        return job_finish_sync_locked(job, &job_cancel_err_locked, NULL);
     }
 }
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 12/16] jobs: use job locks and helpers also in the unit tests
  2022-01-05 14:01 [PATCH v3 00/16] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (10 preceding siblings ...)
  2022-01-05 14:02 ` [PATCH v3 11/16] jobs: document all static functions and add _locked() suffix Emanuele Giuseppe Esposito
@ 2022-01-05 14:02 ` Emanuele Giuseppe Esposito
  2022-01-05 14:02 ` [PATCH v3 13/16] jobs: add job lock in find_* functions Emanuele Giuseppe Esposito
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-05 14:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

Add missing job synchronization in the unit tests, with
both explicit locks and helpers.

Note: at this stage, job_{lock/unlock} and the job lock guard macros
are still *nops*.
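
The pattern is mechanical: each *_locked() call in the tests is now
bracketed by the (currently nop) lock, as in this hunk from
test-bdrv-drain.c below:

    aio_context_acquire(ctx);
    job_lock();
    ret = job_complete_sync_locked(&job->job, &error_abort);
    job_unlock();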

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 tests/unit/test-bdrv-drain.c     | 40 +++++++++++-----------
 tests/unit/test-block-iothread.c |  4 +++
 tests/unit/test-blockjob-txn.c   | 10 ++++++
 tests/unit/test-blockjob.c       | 57 +++++++++++++++++++++-----------
 4 files changed, 72 insertions(+), 39 deletions(-)

diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 3f344a0d0d..c03560e63d 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -941,61 +941,63 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
         }
     }
 
-    g_assert_cmpint(job->job.pause_count, ==, 0);
-    g_assert_false(job->job.paused);
+    g_assert_cmpint(job_get_pause_count(&job->job), ==, 0);
+    g_assert_false(job_get_paused(&job->job));
     g_assert_true(tjob->running);
-    g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
+    g_assert_true(job_get_busy(&job->job)); /* We're in qemu_co_sleep_ns() */
 
     do_drain_begin_unlocked(drain_type, drain_bs);
 
     if (drain_type == BDRV_DRAIN_ALL) {
         /* bdrv_drain_all() drains both src and target */
-        g_assert_cmpint(job->job.pause_count, ==, 2);
+        g_assert_cmpint(job_get_pause_count(&job->job), ==, 2);
     } else {
-        g_assert_cmpint(job->job.pause_count, ==, 1);
+        g_assert_cmpint(job_get_pause_count(&job->job), ==, 1);
     }
-    g_assert_true(job->job.paused);
-    g_assert_false(job->job.busy); /* The job is paused */
+    g_assert_true(job_get_paused(&job->job));
+    g_assert_false(job_get_busy(&job->job)); /* The job is paused */
 
     do_drain_end_unlocked(drain_type, drain_bs);
 
     if (use_iothread) {
         /* paused is reset in the I/O thread, wait for it */
-        while (job->job.paused) {
+        while (job_get_paused(&job->job)) {
             aio_poll(qemu_get_aio_context(), false);
         }
     }
 
-    g_assert_cmpint(job->job.pause_count, ==, 0);
-    g_assert_false(job->job.paused);
-    g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
+    g_assert_cmpint(job_get_pause_count(&job->job), ==, 0);
+    g_assert_false(job_get_paused(&job->job));
+    g_assert_true(job_get_busy(&job->job)); /* We're in qemu_co_sleep_ns() */
 
     do_drain_begin_unlocked(drain_type, target);
 
     if (drain_type == BDRV_DRAIN_ALL) {
         /* bdrv_drain_all() drains both src and target */
-        g_assert_cmpint(job->job.pause_count, ==, 2);
+        g_assert_cmpint(job_get_pause_count(&job->job), ==, 2);
     } else {
-        g_assert_cmpint(job->job.pause_count, ==, 1);
+        g_assert_cmpint(job_get_pause_count(&job->job), ==, 1);
     }
-    g_assert_true(job->job.paused);
-    g_assert_false(job->job.busy); /* The job is paused */
+    g_assert_true(job_get_paused(&job->job));
+    g_assert_false(job_get_busy(&job->job)); /* The job is paused */
 
     do_drain_end_unlocked(drain_type, target);
 
     if (use_iothread) {
         /* paused is reset in the I/O thread, wait for it */
-        while (job->job.paused) {
+        while (job_get_paused(&job->job)) {
             aio_poll(qemu_get_aio_context(), false);
         }
     }
 
-    g_assert_cmpint(job->job.pause_count, ==, 0);
-    g_assert_false(job->job.paused);
-    g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
+    g_assert_cmpint(job_get_pause_count(&job->job), ==, 0);
+    g_assert_false(job_get_paused(&job->job));
+    g_assert_true(job_get_busy(&job->job)); /* We're in qemu_co_sleep_ns() */
 
     aio_context_acquire(ctx);
+    job_lock();
     ret = job_complete_sync_locked(&job->job, &error_abort);
+    job_unlock();
     g_assert_cmpint(ret, ==, (result == TEST_JOB_SUCCESS ? 0 : -EIO));
 
     if (use_iothread) {
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index 7e1b521d61..b9309beec2 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -456,7 +456,9 @@ static void test_attach_blockjob(void)
     }
 
     aio_context_acquire(ctx);
+    job_lock();
     job_complete_sync_locked(&tjob->common.job, &error_abort);
+    job_unlock();
     blk_set_aio_context(blk, qemu_get_aio_context(), &error_abort);
     aio_context_release(ctx);
 
@@ -630,7 +632,9 @@ static void test_propagate_mirror(void)
                  BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
                  false, "filter_node", MIRROR_COPY_MODE_BACKGROUND,
                  &error_abort);
+    job_lock();
     job = job_get_locked("job0");
+    job_unlock();
     filter = bdrv_find_node("filter_node");
 
     /* Change the AioContext of src */
diff --git a/tests/unit/test-blockjob-txn.c b/tests/unit/test-blockjob-txn.c
index 5396fcef10..bd69076300 100644
--- a/tests/unit/test-blockjob-txn.c
+++ b/tests/unit/test-blockjob-txn.c
@@ -124,16 +124,20 @@ static void test_single_job(int expected)
     job = test_block_job_start(1, true, expected, &result, txn);
     job_start(&job->job);
 
+    job_lock();
     if (expected == -ECANCELED) {
         job_cancel_locked(&job->job, false);
     }
+    job_unlock();
 
     while (result == -EINPROGRESS) {
         aio_poll(qemu_get_aio_context(), true);
     }
     g_assert_cmpint(result, ==, expected);
 
+    job_lock();
     job_txn_unref_locked(txn);
+    job_unlock();
 }
 
 static void test_single_job_success(void)
@@ -168,6 +172,7 @@ static void test_pair_jobs(int expected1, int expected2)
     /* Release our reference now to trigger as many nice
      * use-after-free bugs as possible.
      */
+    job_lock();
     job_txn_unref_locked(txn);
 
     if (expected1 == -ECANCELED) {
@@ -176,6 +181,7 @@ static void test_pair_jobs(int expected1, int expected2)
     if (expected2 == -ECANCELED) {
         job_cancel_locked(&job2->job, false);
     }
+    job_unlock();
 
     while (result1 == -EINPROGRESS || result2 == -EINPROGRESS) {
         aio_poll(qemu_get_aio_context(), true);
@@ -227,7 +233,9 @@ static void test_pair_jobs_fail_cancel_race(void)
     job_start(&job1->job);
     job_start(&job2->job);
 
+    job_lock();
     job_cancel_locked(&job1->job, false);
+    job_unlock();
 
     /* Now make job2 finish before the main loop kicks jobs.  This simulates
      * the race between a pending kick and another job completing.
@@ -242,7 +250,9 @@ static void test_pair_jobs_fail_cancel_race(void)
     g_assert_cmpint(result1, ==, -ECANCELED);
     g_assert_cmpint(result2, ==, -ECANCELED);
 
+    job_lock();
     job_txn_unref_locked(txn);
+    job_unlock();
 }
 
 int main(int argc, char **argv)
diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c
index 2beed3623e..ec9128dbb5 100644
--- a/tests/unit/test-blockjob.c
+++ b/tests/unit/test-blockjob.c
@@ -211,8 +211,11 @@ static CancelJob *create_common(Job **pjob)
     bjob = mk_job(blk, "Steve", &test_cancel_driver, true,
                   JOB_MANUAL_FINALIZE | JOB_MANUAL_DISMISS);
     job = &bjob->job;
+    job_lock();
     job_ref_locked(job);
     assert(job->status == JOB_STATUS_CREATED);
+    job_unlock();
+
     s = container_of(bjob, CancelJob, common);
     s->blk = blk;
 
@@ -230,6 +233,7 @@ static void cancel_common(CancelJob *s)
     ctx = job->job.aio_context;
     aio_context_acquire(ctx);
 
+    job_lock();
     job_cancel_sync_locked(&job->job, true);
     if (sts != JOB_STATUS_CREATED && sts != JOB_STATUS_CONCLUDED) {
         Job *dummy = &job->job;
@@ -237,6 +241,7 @@ static void cancel_common(CancelJob *s)
     }
     assert(job->job.status == JOB_STATUS_NULL);
     job_unref_locked(&job->job);
+    job_unlock();
     destroy_blk(blk);
 
     aio_context_release(ctx);
@@ -259,7 +264,7 @@ static void test_cancel_running(void)
     s = create_common(&job);
 
     job_start(job);
-    assert(job->status == JOB_STATUS_RUNNING);
+    assert(job_get_status(job) == JOB_STATUS_RUNNING);
 
     cancel_common(s);
 }
@@ -272,11 +277,13 @@ static void test_cancel_paused(void)
     s = create_common(&job);
 
     job_start(job);
-    assert(job->status == JOB_STATUS_RUNNING);
+    assert(job_get_status(job) == JOB_STATUS_RUNNING);
 
+    job_lock();
     job_user_pause_locked(job, &error_abort);
+    job_unlock();
     job_enter(job);
-    assert(job->status == JOB_STATUS_PAUSED);
+    assert(job_get_status(job) == JOB_STATUS_PAUSED);
 
     cancel_common(s);
 }
@@ -289,11 +296,11 @@ static void test_cancel_ready(void)
     s = create_common(&job);
 
     job_start(job);
-    assert(job->status == JOB_STATUS_RUNNING);
+    assert(job_get_status(job) == JOB_STATUS_RUNNING);
 
     s->should_converge = true;
     job_enter(job);
-    assert(job->status == JOB_STATUS_READY);
+    assert(job_get_status(job) == JOB_STATUS_READY);
 
     cancel_common(s);
 }
@@ -306,15 +313,17 @@ static void test_cancel_standby(void)
     s = create_common(&job);
 
     job_start(job);
-    assert(job->status == JOB_STATUS_RUNNING);
+    assert(job_get_status(job) == JOB_STATUS_RUNNING);
 
     s->should_converge = true;
     job_enter(job);
-    assert(job->status == JOB_STATUS_READY);
+    assert(job_get_status(job) == JOB_STATUS_READY);
 
+    job_lock();
     job_user_pause_locked(job, &error_abort);
+    job_unlock();
     job_enter(job);
-    assert(job->status == JOB_STATUS_STANDBY);
+    assert(job_get_status(job) == JOB_STATUS_STANDBY);
 
     cancel_common(s);
 }
@@ -327,20 +336,22 @@ static void test_cancel_pending(void)
     s = create_common(&job);
 
     job_start(job);
-    assert(job->status == JOB_STATUS_RUNNING);
+    assert(job_get_status(job) == JOB_STATUS_RUNNING);
 
     s->should_converge = true;
     job_enter(job);
-    assert(job->status == JOB_STATUS_READY);
+    assert(job_get_status(job) == JOB_STATUS_READY);
 
+    job_lock();
     job_complete_locked(job, &error_abort);
+    job_unlock();
     job_enter(job);
     while (!job->deferred_to_main_loop) {
         aio_poll(qemu_get_aio_context(), true);
     }
-    assert(job->status == JOB_STATUS_READY);
+    assert(job_get_status(job) == JOB_STATUS_READY);
     aio_poll(qemu_get_aio_context(), true);
-    assert(job->status == JOB_STATUS_PENDING);
+    assert(job_get_status(job) == JOB_STATUS_PENDING);
 
     cancel_common(s);
 }
@@ -353,25 +364,29 @@ static void test_cancel_concluded(void)
     s = create_common(&job);
 
     job_start(job);
-    assert(job->status == JOB_STATUS_RUNNING);
+    assert(job_get_status(job) == JOB_STATUS_RUNNING);
 
     s->should_converge = true;
     job_enter(job);
-    assert(job->status == JOB_STATUS_READY);
+    assert(job_get_status(job) == JOB_STATUS_READY);
 
+    job_lock();
     job_complete_locked(job, &error_abort);
+    job_unlock();
     job_enter(job);
     while (!job->deferred_to_main_loop) {
         aio_poll(qemu_get_aio_context(), true);
     }
-    assert(job->status == JOB_STATUS_READY);
+    assert(job_get_status(job) == JOB_STATUS_READY);
     aio_poll(qemu_get_aio_context(), true);
-    assert(job->status == JOB_STATUS_PENDING);
+    assert(job_get_status(job) == JOB_STATUS_PENDING);
 
     aio_context_acquire(job->aio_context);
+    job_lock();
     job_finalize_locked(job, &error_abort);
+    job_unlock();
     aio_context_release(job->aio_context);
-    assert(job->status == JOB_STATUS_CONCLUDED);
+    assert(job_get_status(job) == JOB_STATUS_CONCLUDED);
 
     cancel_common(s);
 }
@@ -459,22 +474,23 @@ static void test_complete_in_standby(void)
     bjob = mk_job(blk, "job", &test_yielding_driver, true,
                   JOB_MANUAL_FINALIZE | JOB_MANUAL_DISMISS);
     job = &bjob->job;
-    assert(job->status == JOB_STATUS_CREATED);
+    assert(job_get_status(job) == JOB_STATUS_CREATED);
 
     /* Wait for the job to become READY */
     job_start(job);
     aio_context_acquire(ctx);
-    AIO_WAIT_WHILE(ctx, job->status != JOB_STATUS_READY);
+    AIO_WAIT_WHILE(ctx, job_get_status(job) != JOB_STATUS_READY);
     aio_context_release(ctx);
 
     /* Begin the drained section, pausing the job */
     bdrv_drain_all_begin();
-    assert(job->status == JOB_STATUS_STANDBY);
+    assert(job_get_status(job) == JOB_STATUS_STANDBY);
     /* Lock the IO thread to prevent the job from being run */
     aio_context_acquire(ctx);
     /* This will schedule the job to resume it */
     bdrv_drain_all_end();
 
+    job_lock();
     /* But the job cannot run, so it will remain on standby */
     assert(job->status == JOB_STATUS_STANDBY);
 
@@ -489,6 +505,7 @@ static void test_complete_in_standby(void)
     assert(job->status == JOB_STATUS_CONCLUDED);
 
     job_dismiss_locked(&job, &error_abort);
+    job_unlock();
 
     destroy_blk(blk);
     aio_context_release(ctx);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 13/16] jobs: add job lock in find_* functions
  2022-01-05 14:01 [PATCH v3 00/16] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (11 preceding siblings ...)
  2022-01-05 14:02 ` [PATCH v3 12/16] jobs: use job locks and helpers also in the unit tests Emanuele Giuseppe Esposito
@ 2022-01-05 14:02 ` Emanuele Giuseppe Esposito
  2022-01-05 14:02 ` [PATCH v3 14/16] job.c: use job_get_aio_context() Emanuele Giuseppe Esposito
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-05 14:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

Both blockdev.c and job-qmp.c have TOCTOU (time-of-check to
time-of-use) races, because they first look up the job and then
perform an action on it. Therefore, we need to do the lookup and
the action within the same job mutex critical section.

Note: at this stage, job_{lock/unlock} and the job lock guard macros
are still *nops*.
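
The shape of the fix, sketched on qmp_job_pause() (trace call
omitted; see the job-qmp.c hunks below): find_job() now returns with
job_mutex held, and the caller drops the lock only after acting on
the job:

    void qmp_job_pause(const char *id, Error **errp)
    {
        AioContext *aio_context;
        Job *job = find_job(id, &aio_context, errp); /* returns locked */

        if (!job) {
            return; /* find_job() already dropped job_mutex */
        }

        job_user_pause_locked(job, errp);
        aio_context_release(aio_context);
        job_unlock();
    }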

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 blockdev.c | 14 +++++++++++++-
 job-qmp.c  | 13 ++++++++++++-
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 099d57e0d2..1fbd9b9e04 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3305,7 +3305,10 @@ out:
     aio_context_release(aio_context);
 }
 
-/* Get a block job using its ID and acquire its AioContext */
+/*
+ * Get a block job using its ID and acquire its AioContext.
+ * Returns with job_lock held on success.
+ */
 static BlockJob *find_block_job(const char *id, AioContext **aio_context,
                                 Error **errp)
 {
@@ -3314,12 +3317,14 @@ static BlockJob *find_block_job(const char *id, AioContext **aio_context,
     assert(id != NULL);
 
     *aio_context = NULL;
+    job_lock();
 
     job = block_job_get(id);
 
     if (!job) {
         error_set(errp, ERROR_CLASS_DEVICE_NOT_ACTIVE,
                   "Block job '%s' not found", id);
+        job_unlock();
         return NULL;
     }
 
@@ -3340,6 +3345,7 @@ void qmp_block_job_set_speed(const char *device, int64_t speed, Error **errp)
 
     block_job_set_speed(job, speed, errp);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_block_job_cancel(const char *device,
@@ -3366,6 +3372,7 @@ void qmp_block_job_cancel(const char *device,
     job_user_cancel_locked(&job->job, force, errp);
 out:
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_block_job_pause(const char *device, Error **errp)
@@ -3380,6 +3387,7 @@ void qmp_block_job_pause(const char *device, Error **errp)
     trace_qmp_block_job_pause(job);
     job_user_pause_locked(&job->job, errp);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_block_job_resume(const char *device, Error **errp)
@@ -3394,6 +3402,7 @@ void qmp_block_job_resume(const char *device, Error **errp)
     trace_qmp_block_job_resume(job);
     job_user_resume_locked(&job->job, errp);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_block_job_complete(const char *device, Error **errp)
@@ -3408,6 +3417,7 @@ void qmp_block_job_complete(const char *device, Error **errp)
     trace_qmp_block_job_complete(job);
     job_complete_locked(&job->job, errp);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_block_job_finalize(const char *id, Error **errp)
@@ -3431,6 +3441,7 @@ void qmp_block_job_finalize(const char *id, Error **errp)
     aio_context = blk_get_aio_context(job->blk);
     job_unref_locked(&job->job);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_block_job_dismiss(const char *id, Error **errp)
@@ -3447,6 +3458,7 @@ void qmp_block_job_dismiss(const char *id, Error **errp)
     job = &bjob->job;
     job_dismiss_locked(&job, errp);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_change_backing_file(const char *device,
diff --git a/job-qmp.c b/job-qmp.c
index 9fa14bf761..615e056fc4 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -29,16 +29,21 @@
 #include "qapi/error.h"
 #include "trace/trace-root.h"
 
-/* Get a job using its ID and acquire its AioContext */
+/*
+ * Get a job using its ID and acquire its AioContext.
+ * Returns with job_lock held on success.
+ */
 static Job *find_job(const char *id, AioContext **aio_context, Error **errp)
 {
     Job *job;
 
     *aio_context = NULL;
+    job_lock();
 
     job = job_get_locked(id);
     if (!job) {
         error_setg(errp, "Job not found");
+        job_unlock();
         return NULL;
     }
 
@@ -60,6 +65,7 @@ void qmp_job_cancel(const char *id, Error **errp)
     trace_qmp_job_cancel(job);
     job_user_cancel_locked(job, true, errp);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_job_pause(const char *id, Error **errp)
@@ -74,6 +80,7 @@ void qmp_job_pause(const char *id, Error **errp)
     trace_qmp_job_pause(job);
     job_user_pause_locked(job, errp);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_job_resume(const char *id, Error **errp)
@@ -88,6 +95,7 @@ void qmp_job_resume(const char *id, Error **errp)
     trace_qmp_job_resume(job);
     job_user_resume_locked(job, errp);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_job_complete(const char *id, Error **errp)
@@ -102,6 +110,7 @@ void qmp_job_complete(const char *id, Error **errp)
     trace_qmp_job_complete(job);
     job_complete_locked(job, errp);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_job_finalize(const char *id, Error **errp)
@@ -125,6 +134,7 @@ void qmp_job_finalize(const char *id, Error **errp)
     aio_context = job->aio_context;
     job_unref_locked(job);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_job_dismiss(const char *id, Error **errp)
@@ -139,6 +149,7 @@ void qmp_job_dismiss(const char *id, Error **errp)
     trace_qmp_job_dismiss(job);
     job_dismiss_locked(&job, errp);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 static JobInfo *job_query_single(Job *job, Error **errp)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 14/16] job.c: use job_get_aio_context()
  2022-01-05 14:01 [PATCH v3 00/16] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (12 preceding siblings ...)
  2022-01-05 14:02 ` [PATCH v3 13/16] jobs: add job lock in find_* functions Emanuele Giuseppe Esposito
@ 2022-01-05 14:02 ` Emanuele Giuseppe Esposito
  2022-01-19 10:31   ` Paolo Bonzini
  2022-01-05 14:02 ` [PATCH v3 15/16] job.c: enable job lock/unlock and remove Aiocontext locks Emanuele Giuseppe Esposito
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-05 14:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

Where job->aio_context is accessed with job_mutex already held,
leave it as is. Otherwise, use job_get_aio_context().
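
The helper is introduced earlier in this series; the shape assumed
here is simply a read of the field under the mutex (a sketch; the
actual implementation may differ):

    AioContext *job_get_aio_context(Job *job)
    {
        JOB_LOCK_GUARD();
        return job->aio_context;
    }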

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block/commit.c                   |  4 ++--
 block/mirror.c                   |  2 +-
 block/replication.c              |  2 +-
 blockjob.c                       | 18 +++++++++++-------
 job.c                            |  8 ++++----
 tests/unit/test-block-iothread.c |  6 +++---
 6 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index f639eb49c5..961b57edf0 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -369,7 +369,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
         goto fail;
     }
 
-    s->base = blk_new(s->common.job.aio_context,
+    s->base = blk_new(job_get_aio_context(&s->common.job),
                       base_perms,
                       BLK_PERM_CONSISTENT_READ
                       | BLK_PERM_GRAPH_MOD
@@ -382,7 +382,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
     s->base_bs = base;
 
     /* Required permissions are already taken with block_job_add_bdrv() */
-    s->top = blk_new(s->common.job.aio_context, 0, BLK_PERM_ALL);
+    s->top = blk_new(job_get_aio_context(&s->common.job), 0, BLK_PERM_ALL);
     ret = blk_insert_bs(s->top, top, errp);
     if (ret < 0) {
         goto fail;
diff --git a/block/mirror.c b/block/mirror.c
index 41450df55c..72b4367b4e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1743,7 +1743,7 @@ static BlockJob *mirror_start_job(
         target_perms |= BLK_PERM_GRAPH_MOD;
     }
 
-    s->target = blk_new(s->common.job.aio_context,
+    s->target = blk_new(job_get_aio_context(&s->common.job),
                         target_perms, target_shared_perms);
     ret = blk_insert_bs(s->target, target, errp);
     if (ret < 0) {
diff --git a/block/replication.c b/block/replication.c
index 50ea778937..68018948b9 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -148,8 +148,8 @@ static void replication_close(BlockDriverState *bs)
     }
     if (s->stage == BLOCK_REPLICATION_FAILOVER) {
         commit_job = &s->commit_job->job;
-        assert(commit_job->aio_context == qemu_get_current_aio_context());
         WITH_JOB_LOCK_GUARD() {
+            assert(commit_job->aio_context == qemu_get_current_aio_context());
             job_cancel_sync_locked(commit_job, false);
         }
     }
diff --git a/blockjob.c b/blockjob.c
index cf1f49f6c2..468ba735c5 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -155,14 +155,16 @@ static void child_job_set_aio_ctx(BdrvChild *c, AioContext *ctx,
         bdrv_set_aio_context_ignore(sibling->bs, ctx, ignore);
     }
 
-    job->job.aio_context = ctx;
+    WITH_JOB_LOCK_GUARD() {
+        job->job.aio_context = ctx;
+    }
 }
 
 static AioContext *child_job_get_parent_aio_context(BdrvChild *c)
 {
     BlockJob *job = c->opaque;
 
-    return job->job.aio_context;
+    return job_get_aio_context(&job->job);
 }
 
 static const BdrvChildClass child_job = {
@@ -218,19 +220,21 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
 {
     BdrvChild *c;
     bool need_context_ops;
+    AioContext *job_aiocontext;
     assert(qemu_in_main_thread());
 
     bdrv_ref(bs);
 
-    need_context_ops = bdrv_get_aio_context(bs) != job->job.aio_context;
+    job_aiocontext = job_get_aio_context(&job->job);
+    need_context_ops = bdrv_get_aio_context(bs) != job_aiocontext;
 
-    if (need_context_ops && job->job.aio_context != qemu_get_aio_context()) {
-        aio_context_release(job->job.aio_context);
+    if (need_context_ops && job_aiocontext != qemu_get_aio_context()) {
+        aio_context_release(job_aiocontext);
     }
     c = bdrv_root_attach_child(bs, name, &child_job, 0, perm, shared_perm, job,
                                errp);
-    if (need_context_ops && job->job.aio_context != qemu_get_aio_context()) {
-        aio_context_acquire(job->job.aio_context);
+    if (need_context_ops && job_aiocontext != qemu_get_aio_context()) {
+        aio_context_acquire(job_aiocontext);
     }
     if (c == NULL) {
         return -EPERM;
diff --git a/job.c b/job.c
index f16a4ef542..8a5b710d9b 100644
--- a/job.c
+++ b/job.c
@@ -566,7 +566,7 @@ void job_enter_cond_locked(Job *job, bool(*fn)(Job *job))
     job->busy = true;
     real_job_unlock();
     job_unlock();
-    aio_co_enter(job->aio_context, job->co);
+    aio_co_enter(job_get_aio_context(job), job->co);
     job_lock();
 }
 
@@ -1138,7 +1138,6 @@ static void coroutine_fn job_co_entry(void *opaque)
     Job *job = opaque;
     int ret;
 
-    assert(job->aio_context == qemu_get_current_aio_context());
     assert(job && job->driver && job->driver->run);
     job_pause_point(job);
     ret = job->driver->run(job, &job->err);
@@ -1177,7 +1176,7 @@ void job_start(Job *job)
         job->paused = false;
         job_state_transition_locked(job, JOB_STATUS_RUNNING);
     }
-    aio_co_enter(job->aio_context, job->co);
+    aio_co_enter(job_get_aio_context(job), job->co);
 }
 
 void job_cancel_locked(Job *job, bool force)
@@ -1303,7 +1302,8 @@ int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),
     }
 
     job_unlock();
-    AIO_WAIT_WHILE(job->aio_context, (job_enter(job), !job_is_completed(job)));
+    AIO_WAIT_WHILE(job_get_aio_context(job),
+                   (job_enter(job), !job_is_completed(job)));
     job_lock();
 
     ret = (job_is_cancelled_locked(job) && job->ret == 0) ?
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index b9309beec2..addcb5846b 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -379,7 +379,7 @@ static int coroutine_fn test_job_run(Job *job, Error **errp)
     job_transition_to_ready(&s->common.job);
     while (!s->should_complete) {
         s->n++;
-        g_assert(qemu_get_current_aio_context() == job->aio_context);
+        g_assert(qemu_get_current_aio_context() == job_get_aio_context(job));
 
         /* Avoid job_sleep_ns() because it marks the job as !busy. We want to
          * emulate some actual activity (probably some I/O) here so that the
@@ -390,7 +390,7 @@ static int coroutine_fn test_job_run(Job *job, Error **errp)
         job_pause_point(&s->common.job);
     }
 
-    g_assert(qemu_get_current_aio_context() == job->aio_context);
+    g_assert(qemu_get_current_aio_context() == job_get_aio_context(job));
     return 0;
 }
 
@@ -642,7 +642,7 @@ static void test_propagate_mirror(void)
     g_assert(bdrv_get_aio_context(src) == ctx);
     g_assert(bdrv_get_aio_context(target) == ctx);
     g_assert(bdrv_get_aio_context(filter) == ctx);
-    g_assert(job->aio_context == ctx);
+    g_assert(job_get_aio_context(job) == ctx);
 
     /* Change the AioContext of target */
     aio_context_acquire(ctx);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 15/16] job.c: enable job lock/unlock and remove Aiocontext locks
  2022-01-05 14:01 [PATCH v3 00/16] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (13 preceding siblings ...)
  2022-01-05 14:02 ` [PATCH v3 14/16] job.c: use job_get_aio_context() Emanuele Giuseppe Esposito
@ 2022-01-05 14:02 ` Emanuele Giuseppe Esposito
  2022-01-19 10:35   ` Paolo Bonzini
  2022-01-05 14:02 ` [PATCH v3 16/16] block_job_query: remove atomic read Emanuele Giuseppe Esposito
  2022-01-19 11:15 ` [PATCH v3 00/16] job: replace AioContext lock with job_mutex Paolo Bonzini
  16 siblings, 1 reply; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-05 14:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

Change the job_{lock/unlock} functions and macros to use job_mutex.

Now that they are no longer nops, remove the AioContext locks
to avoid deadlocks.

Therefore:
- when possible, remove the AioContext lock/unlock pair completely
- if the lock is also used by other functions, reduce the locking
section as much as possible, leaving the job API outside of it.

There is only one JobDriver callback, ->free(), that assumes the
AioContext lock is held (because it calls bdrv_unref), so for now
keep that one under the AioContext lock.

Also remove real_job_{lock/unlock}, as they are replaced by the
public functions.
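
With the macros switched over (see the include/qemu/job.h hunk
below), job_lock() and job_unlock() are expected to reduce to thin
wrappers around the mutex; a sketch, assuming that shape:

    void job_lock(void)
    {
        qemu_mutex_lock(&job_mutex);
    }

    void job_unlock(void)
    {
        qemu_mutex_unlock(&job_mutex);
    }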

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 blockdev.c                       | 65 ++++-----------------------
 include/qemu/job.h               | 11 +----
 job-qmp.c                        | 41 ++++-------------
 job.c                            | 76 +++-----------------------------
 tests/unit/test-bdrv-drain.c     |  4 +-
 tests/unit/test-block-iothread.c |  2 +-
 tests/unit/test-blockjob.c       | 13 ++----
 7 files changed, 31 insertions(+), 181 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 1fbd9b9e04..ebc14daa86 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -159,12 +159,7 @@ void blockdev_mark_auto_del(BlockBackend *blk)
 
     for (job = block_job_next(NULL); job; job = block_job_next(job)) {
         if (block_job_has_bdrv(job, blk_bs(blk))) {
-            AioContext *aio_context = job->job.aio_context;
-            aio_context_acquire(aio_context);
-
             job_cancel_locked(&job->job, false);
-
-            aio_context_release(aio_context);
         }
     }
 
@@ -1829,16 +1824,9 @@ static void drive_backup_abort(BlkActionState *common)
     DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
 
     if (state->job) {
-        AioContext *aio_context;
-
-        aio_context = bdrv_get_aio_context(state->bs);
-        aio_context_acquire(aio_context);
-
         WITH_JOB_LOCK_GUARD() {
             job_cancel_sync_locked(&state->job->job, true);
         }
-
-        aio_context_release(aio_context);
     }
 }
 
@@ -1932,16 +1920,9 @@ static void blockdev_backup_abort(BlkActionState *common)
     BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, common);
 
     if (state->job) {
-        AioContext *aio_context;
-
-        aio_context = bdrv_get_aio_context(state->bs);
-        aio_context_acquire(aio_context);
-
         WITH_JOB_LOCK_GUARD() {
             job_cancel_sync_locked(&state->job->job, true);
         }
-
-        aio_context_release(aio_context);
     }
 }
 
@@ -3305,18 +3286,13 @@ out:
     aio_context_release(aio_context);
 }
 
-/*
- * Get a block job using its ID and acquire its AioContext.
- * Returns with job_lock held on success.
- */
-static BlockJob *find_block_job(const char *id, AioContext **aio_context,
-                                Error **errp)
+/* Get a block job using its ID. Returns with job_lock held on success. */
+static BlockJob *find_block_job(const char *id, Error **errp)
 {
     BlockJob *job;
 
     assert(id != NULL);
 
-    *aio_context = NULL;
     job_lock();
 
     job = block_job_get(id);
@@ -3328,31 +3304,25 @@ static BlockJob *find_block_job(const char *id, AioContext **aio_context,
         return NULL;
     }
 
-    *aio_context = blk_get_aio_context(job->blk);
-    aio_context_acquire(*aio_context);
-
     return job;
 }
 
 void qmp_block_job_set_speed(const char *device, int64_t speed, Error **errp)
 {
-    AioContext *aio_context;
-    BlockJob *job = find_block_job(device, &aio_context, errp);
+    BlockJob *job = find_block_job(device, errp);
 
     if (!job) {
         return;
     }
 
     block_job_set_speed(job, speed, errp);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_block_job_cancel(const char *device,
                           bool has_force, bool force, Error **errp)
 {
-    AioContext *aio_context;
-    BlockJob *job = find_block_job(device, &aio_context, errp);
+    BlockJob *job = find_block_job(device, errp);
 
     if (!job) {
         return;
@@ -3371,14 +3341,12 @@ void qmp_block_job_cancel(const char *device,
     trace_qmp_block_job_cancel(job);
     job_user_cancel_locked(&job->job, force, errp);
 out:
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_block_job_pause(const char *device, Error **errp)
 {
-    AioContext *aio_context;
-    BlockJob *job = find_block_job(device, &aio_context, errp);
+    BlockJob *job = find_block_job(device, errp);
 
     if (!job) {
         return;
@@ -3386,14 +3354,12 @@ void qmp_block_job_pause(const char *device, Error **errp)
 
     trace_qmp_block_job_pause(job);
     job_user_pause_locked(&job->job, errp);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_block_job_resume(const char *device, Error **errp)
 {
-    AioContext *aio_context;
-    BlockJob *job = find_block_job(device, &aio_context, errp);
+    BlockJob *job = find_block_job(device, errp);
 
     if (!job) {
         return;
@@ -3401,14 +3367,12 @@ void qmp_block_job_resume(const char *device, Error **errp)
 
     trace_qmp_block_job_resume(job);
     job_user_resume_locked(&job->job, errp);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_block_job_complete(const char *device, Error **errp)
 {
-    AioContext *aio_context;
-    BlockJob *job = find_block_job(device, &aio_context, errp);
+    BlockJob *job = find_block_job(device, errp);
 
     if (!job) {
         return;
@@ -3416,14 +3380,12 @@ void qmp_block_job_complete(const char *device, Error **errp)
 
     trace_qmp_block_job_complete(job);
     job_complete_locked(&job->job, errp);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_block_job_finalize(const char *id, Error **errp)
 {
-    AioContext *aio_context;
-    BlockJob *job = find_block_job(id, &aio_context, errp);
+    BlockJob *job = find_block_job(id, errp);
 
     if (!job) {
         return;
@@ -3433,21 +3395,13 @@ void qmp_block_job_finalize(const char *id, Error **errp)
     job_ref_locked(&job->job);
     job_finalize_locked(&job->job, errp);
 
-    /*
-     * Job's context might have changed via job_finalize_locked
-     * (and job_txn_apply automatically acquires the new one),
-     * so make sure we release the correct one.
-     */
-    aio_context = blk_get_aio_context(job->blk);
     job_unref_locked(&job->job);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_block_job_dismiss(const char *id, Error **errp)
 {
-    AioContext *aio_context;
-    BlockJob *bjob = find_block_job(id, &aio_context, errp);
+    BlockJob *bjob = find_block_job(id, errp);
     Job *job;
 
     if (!bjob) {
@@ -3457,7 +3411,6 @@ void qmp_block_job_dismiss(const char *id, Error **errp)
     trace_qmp_block_job_dismiss(bjob);
     job = &bjob->job;
     job_dismiss_locked(&job, errp);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
diff --git a/include/qemu/job.h b/include/qemu/job.h
index c95f9fa8d1..602ee56ae6 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -326,9 +326,9 @@ typedef enum JobCreateFlags {
 
 extern QemuMutex job_mutex;
 
-#define JOB_LOCK_GUARD() /* QEMU_LOCK_GUARD(&job_mutex) */
+#define JOB_LOCK_GUARD() QEMU_LOCK_GUARD(&job_mutex)
 
-#define WITH_JOB_LOCK_GUARD() /* WITH_QEMU_LOCK_GUARD(&job_mutex) */
+#define WITH_JOB_LOCK_GUARD() WITH_QEMU_LOCK_GUARD(&job_mutex)
 
 /**
  * job_lock:
@@ -667,8 +667,6 @@ void job_user_cancel_locked(Job *job, bool force, Error **errp);
  *
  * Returns the return value from the job if the job actually completed
  * during the call, or -ECANCELED if it was canceled.
- *
- * Callers must hold the AioContext lock of job->aio_context.
  */
 int job_cancel_sync_locked(Job *job, bool force);
 
@@ -692,9 +690,6 @@ void job_cancel_sync_all(void);
  * function).
  *
  * Returns the return value from the job.
- *
- * Callers must hold the AioContext lock of job->aio_context.
- *
  * Called between job_lock and job_unlock.
  */
 int job_complete_sync_locked(Job *job, Error **errp);
@@ -726,8 +721,6 @@ void job_dismiss_locked(Job **job, Error **errp);
  * Returns 0 if the job is successfully completed, -ECANCELED if the job was
  * cancelled before completing, and -errno in other error cases.
  *
- * Callers must hold the AioContext lock of job->aio_context.
- *
  * Called between job_lock and job_unlock.
  */
 int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),
diff --git a/job-qmp.c b/job-qmp.c
index 615e056fc4..858b3a28f5 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -29,15 +29,11 @@
 #include "qapi/error.h"
 #include "trace/trace-root.h"
 
-/*
- * Get a block job using its ID and acquire its AioContext.
- * Returns with job_lock held on success.
- */
-static Job *find_job(const char *id, AioContext **aio_context, Error **errp)
+/* Get a job using its ID. Returns with job_lock held on success. */
+static Job *find_job(const char *id, Error **errp)
 {
     Job *job;
 
-    *aio_context = NULL;
     job_lock();
 
     job = job_get_locked(id);
@@ -47,16 +43,12 @@ static Job *find_job(const char *id, AioContext **aio_context, Error **errp)
         return NULL;
     }
 
-    *aio_context = job->aio_context;
-    aio_context_acquire(*aio_context);
-
     return job;
 }
 
 void qmp_job_cancel(const char *id, Error **errp)
 {
-    AioContext *aio_context;
-    Job *job = find_job(id, &aio_context, errp);
+    Job *job = find_job(id, errp);
 
     if (!job) {
         return;
@@ -64,14 +56,12 @@ void qmp_job_cancel(const char *id, Error **errp)
 
     trace_qmp_job_cancel(job);
     job_user_cancel_locked(job, true, errp);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_job_pause(const char *id, Error **errp)
 {
-    AioContext *aio_context;
-    Job *job = find_job(id, &aio_context, errp);
+    Job *job = find_job(id, errp);
 
     if (!job) {
         return;
@@ -79,14 +69,12 @@ void qmp_job_pause(const char *id, Error **errp)
 
     trace_qmp_job_pause(job);
     job_user_pause_locked(job, errp);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_job_resume(const char *id, Error **errp)
 {
-    AioContext *aio_context;
-    Job *job = find_job(id, &aio_context, errp);
+    Job *job = find_job(id, errp);
 
     if (!job) {
         return;
@@ -94,14 +82,12 @@ void qmp_job_resume(const char *id, Error **errp)
 
     trace_qmp_job_resume(job);
     job_user_resume_locked(job, errp);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_job_complete(const char *id, Error **errp)
 {
-    AioContext *aio_context;
-    Job *job = find_job(id, &aio_context, errp);
+    Job *job = find_job(id, errp);
 
     if (!job) {
         return;
@@ -109,14 +95,12 @@ void qmp_job_complete(const char *id, Error **errp)
 
     trace_qmp_job_complete(job);
     job_complete_locked(job, errp);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_job_finalize(const char *id, Error **errp)
 {
-    AioContext *aio_context;
-    Job *job = find_job(id, &aio_context, errp);
+    Job *job = find_job(id, errp);
 
     if (!job) {
         return;
@@ -126,21 +110,13 @@ void qmp_job_finalize(const char *id, Error **errp)
     job_ref_locked(job);
     job_finalize_locked(job, errp);
 
-    /*
-     * Job's context might have changed via job_finalize_locked
-     * (and job_txn_apply automatically acquires the new one),
-     * so make sure we release the correct one.
-     */
-    aio_context = job->aio_context;
     job_unref_locked(job);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_job_dismiss(const char *id, Error **errp)
 {
-    AioContext *aio_context;
-    Job *job = find_job(id, &aio_context, errp);
+    Job *job = find_job(id, errp);
 
     if (!job) {
         return;
@@ -148,7 +124,6 @@ void qmp_job_dismiss(const char *id, Error **errp)
 
     trace_qmp_job_dismiss(job);
     job_dismiss_locked(&job, errp);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
diff --git a/job.c b/job.c
index 8a5b710d9b..9fa0f34565 100644
--- a/job.c
+++ b/job.c
@@ -98,21 +98,11 @@ struct JobTxn {
 };
 
 void job_lock(void)
-{
-    /* nop */
-}
-
-void job_unlock(void)
-{
-    /* nop */
-}
-
-static void real_job_lock(void)
 {
     qemu_mutex_lock(&job_mutex);
 }
 
-static void real_job_unlock(void)
+void job_unlock(void)
 {
     qemu_mutex_unlock(&job_mutex);
 }
@@ -180,7 +170,6 @@ static int job_txn_apply_locked(Job *job, int fn(Job *))
      * twice - which would break AIO_WAIT_WHILE from within fn.
      */
     job_ref_locked(job);
-    aio_context_release(job->aio_context);
 
     QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
         rc = fn(other_job);
@@ -189,11 +178,6 @@ static int job_txn_apply_locked(Job *job, int fn(Job *))
         }
     }
 
-    /*
-     * Note that job->aio_context might have been changed by calling fn, so we
-     * can't use a local variable to cache it.
-     */
-    aio_context_acquire(job->aio_context);
     job_unref_locked(job);
     return rc;
 }
@@ -477,7 +461,10 @@ void job_unref_locked(Job *job)
 
         if (job->driver->free) {
             job_unlock();
+            /* FIXME: aiocontext lock is required because cb calls blk_unref */
+            aio_context_acquire(job_get_aio_context(job));
             job->driver->free(job);
+            aio_context_release(job_get_aio_context(job));
             job_lock();
         }
 
@@ -550,21 +537,17 @@ void job_enter_cond_locked(Job *job, bool(*fn)(Job *job))
         return;
     }
 
-    real_job_lock();
     if (job->busy) {
-        real_job_unlock();
         return;
     }
 
     if (fn && !fn(job)) {
-        real_job_unlock();
         return;
     }
 
     assert(!job->deferred_to_main_loop);
     timer_del(&job->sleep_timer);
     job->busy = true;
-    real_job_unlock();
     job_unlock();
     aio_co_enter(job_get_aio_context(job), job->co);
     job_lock();
@@ -587,13 +570,11 @@ void job_enter(Job *job)
  */
 static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns)
 {
-    real_job_lock();
     if (ns != -1) {
         timer_mod(&job->sleep_timer, ns);
     }
     job->busy = false;
     job_event_idle_locked(job);
-    real_job_unlock();
     job_unlock();
     qemu_coroutine_yield();
     job_lock();
@@ -922,7 +903,6 @@ static void job_cancel_async_locked(Job *job, bool force)
 /* Called with job_mutex held. */
 static void job_completed_txn_abort_locked(Job *job)
 {
-    AioContext *ctx;
     JobTxn *txn = job->txn;
     Job *other_job;
 
@@ -935,54 +915,28 @@ static void job_completed_txn_abort_locked(Job *job)
     txn->aborting = true;
     job_txn_ref_locked(txn);
 
-    /*
-     * We can only hold the single job's AioContext lock while calling
-     * job_finalize_single_locked() because the finalization callbacks can
-     *  involve calls of AIO_WAIT_WHILE(), which could deadlock otherwise.
-     * Note that the job's AioContext may change when it is finalized.
-     */
-    job_ref_locked(job);
-    aio_context_release(job->aio_context);
-
     /* Other jobs are effectively cancelled by us, set the status for
      * them; this job, however, may or may not be cancelled, depending
      * on the caller, so leave it. */
     QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
         if (other_job != job) {
-            ctx = other_job->aio_context;
-            aio_context_acquire(ctx);
             /*
              * This is a transaction: If one job failed, no result will matter.
              * Therefore, pass force=true to terminate all other jobs as
              * quickly as possible.
              */
             job_cancel_async_locked(other_job, true);
-            aio_context_release(ctx);
         }
     }
     while (!QLIST_EMPTY(&txn->jobs)) {
         other_job = QLIST_FIRST(&txn->jobs);
-        /*
-         * The job's AioContext may change, so store it in @ctx so we
-         * release the same context that we have acquired before.
-         */
-        ctx = other_job->aio_context;
-        aio_context_acquire(ctx);
         if (!job_is_completed_locked(other_job)) {
             assert(job_cancel_requested_locked(other_job));
             job_finish_sync_locked(other_job, NULL, NULL);
         }
         job_finalize_single_locked(other_job);
-        aio_context_release(ctx);
     }
 
-    /*
-     * Use job_ref_locked()/job_unref_locked() so we can read the AioContext
-     * here even if the job went away during job_finalize_single_locked().
-     */
-    aio_context_acquire(job->aio_context);
-    job_unref_locked(job);
-
     job_txn_unref_locked(txn);
 }
 
@@ -1101,11 +1055,7 @@ static void job_completed_locked(Job *job)
 static void job_exit(void *opaque)
 {
     Job *job = (Job *)opaque;
-    AioContext *ctx;
-
     JOB_LOCK_GUARD();
-    job_ref_locked(job);
-    aio_context_acquire(job->aio_context);
 
     /* This is a lie, we're not quiescent, but still doing the completion
      * callbacks. However, completion callbacks tend to involve operations that
@@ -1115,16 +1065,6 @@ static void job_exit(void *opaque)
     job_event_idle_locked(job);
 
     job_completed_locked(job);
-
-    /*
-     * Note that calling job_completed_locked can move the job to a different
-     * aio_context, so we cannot cache from above. job_txn_apply_locked takes
-     * care of acquiring the new lock, and we ref/unref to avoid
-     * job_completed_locked freeing the job underneath us.
-     */
-    ctx = job->aio_context;
-    job_unref_locked(job);
-    aio_context_release(ctx);
 }
 
 /**
@@ -1249,14 +1189,10 @@ int job_cancel_sync_locked(Job *job, bool force)
 void job_cancel_sync_all(void)
 {
     Job *job;
-    AioContext *aio_context;
 
     JOB_LOCK_GUARD();
     while ((job = job_next_locked(NULL))) {
-        aio_context = job->aio_context;
-        aio_context_acquire(aio_context);
         job_cancel_sync_locked(job, true);
-        aio_context_release(aio_context);
     }
 }
 
@@ -1302,8 +1238,8 @@ int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),
     }
 
     job_unlock();
-    AIO_WAIT_WHILE(job_get_aio_context(job),
-                   (job_enter(job), !job_is_completed(job)));
+    AIO_WAIT_WHILE_UNLOCKED(job_get_aio_context(job),
+                            (job_enter(job), !job_is_completed(job)));
     job_lock();
 
     ret = (job_is_cancelled_locked(job) && job->ret == 0) ?
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index c03560e63d..dae207e24e 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -928,9 +928,9 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
         tjob->prepare_ret = -EIO;
         break;
     }
+    aio_context_release(ctx);
 
     job_start(&job->job);
-    aio_context_release(ctx);
 
     if (use_iothread) {
         /* job_co_entry() is run in the I/O thread, wait for the actual job
@@ -994,12 +994,12 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
     g_assert_false(job_get_paused(&job->job));
     g_assert_true(job_get_busy(&job->job)); /* We're in qemu_co_sleep_ns() */
 
-    aio_context_acquire(ctx);
     job_lock();
     ret = job_complete_sync_locked(&job->job, &error_abort);
     job_unlock();
     g_assert_cmpint(ret, ==, (result == TEST_JOB_SUCCESS ? 0 : -EIO));
 
+    aio_context_acquire(ctx);
     if (use_iothread) {
         blk_set_aio_context(blk_src, qemu_get_aio_context(), &error_abort);
         assert(blk_get_aio_context(blk_target) == qemu_get_aio_context());
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index addcb5846b..e09e440342 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -455,10 +455,10 @@ static void test_attach_blockjob(void)
         aio_poll(qemu_get_aio_context(), false);
     }
 
-    aio_context_acquire(ctx);
     job_lock();
     job_complete_sync_locked(&tjob->common.job, &error_abort);
     job_unlock();
+    aio_context_acquire(ctx);
     blk_set_aio_context(blk, qemu_get_aio_context(), &error_abort);
     aio_context_release(ctx);
 
diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c
index ec9128dbb5..c926db7b5d 100644
--- a/tests/unit/test-blockjob.c
+++ b/tests/unit/test-blockjob.c
@@ -228,10 +228,6 @@ static void cancel_common(CancelJob *s)
     BlockJob *job = &s->common;
     BlockBackend *blk = s->blk;
     JobStatus sts = job->job.status;
-    AioContext *ctx;
-
-    ctx = job->job.aio_context;
-    aio_context_acquire(ctx);
 
     job_lock();
     job_cancel_sync_locked(&job->job, true);
@@ -244,7 +240,6 @@ static void cancel_common(CancelJob *s)
     job_unlock();
     destroy_blk(blk);
 
-    aio_context_release(ctx);
 }
 
 static void test_cancel_created(void)
@@ -381,11 +376,9 @@ static void test_cancel_concluded(void)
     aio_poll(qemu_get_aio_context(), true);
     assert(job_get_status(job) == JOB_STATUS_PENDING);
 
-    aio_context_acquire(job->aio_context);
     job_lock();
     job_finalize_locked(job, &error_abort);
     job_unlock();
-    aio_context_release(job->aio_context);
     assert(job_get_status(job) == JOB_STATUS_CONCLUDED);
 
     cancel_common(s);
@@ -478,9 +471,7 @@ static void test_complete_in_standby(void)
 
     /* Wait for the job to become READY */
     job_start(job);
-    aio_context_acquire(ctx);
-    AIO_WAIT_WHILE(ctx, job_get_status(job) != JOB_STATUS_READY);
-    aio_context_release(ctx);
+    AIO_WAIT_WHILE_UNLOCKED(ctx, job_get_status(job) != JOB_STATUS_READY);
 
     /* Begin the drained section, pausing the job */
     bdrv_drain_all_begin();
@@ -498,6 +489,7 @@ static void test_complete_in_standby(void)
     job_complete_locked(job, &error_abort);
 
     /* The test is done now, clean up. */
+    aio_context_release(ctx);
     job_finish_sync_locked(job, NULL, &error_abort);
     assert(job->status == JOB_STATUS_PENDING);
 
@@ -507,6 +499,7 @@ static void test_complete_in_standby(void)
     job_dismiss_locked(&job, &error_abort);
     job_unlock();
 
+    aio_context_acquire(ctx);
     destroy_blk(blk);
     aio_context_release(ctx);
     iothread_join(iothread);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 16/16] block_job_query: remove atomic read
  2022-01-05 14:01 [PATCH v3 00/16] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (14 preceding siblings ...)
  2022-01-05 14:02 ` [PATCH v3 15/16] job.c: enable job lock/unlock and remove Aiocontext locks Emanuele Giuseppe Esposito
@ 2022-01-05 14:02 ` Emanuele Giuseppe Esposito
  2022-01-19 10:34   ` Paolo Bonzini
  2022-01-19 11:15 ` [PATCH v3 00/16] job: replace AioContext lock with job_mutex Paolo Bonzini
  16 siblings, 1 reply; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-05 14:02 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

It is not clear what the atomic read was supposed to protect, since
job.busy is protected by the job lock. Since the whole function
is called under job_mutex, just remove the atomic.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 blockjob.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 468ba735c5..d1d8808a56 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -335,13 +335,13 @@ BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
     info = g_new0(BlockJobInfo, 1);
     info->type      = g_strdup(job_type_str(&job->job));
     info->device    = g_strdup(job->job.id);
-    info->busy      = qatomic_read(&job->job.busy);
+    info->busy      = job->job.busy;
     info->paused    = job->job.pause_count > 0;
     info->offset    = progress_current;
     info->len       = progress_total;
     info->speed     = job->speed;
     info->io_status = job->iostatus;
-    info->ready     = job_is_ready(&job->job),
+    info->ready     = job_is_ready_locked(&job->job),
     info->status    = job->job.status;
     info->auto_finalize = job->job.auto_finalize;
     info->auto_dismiss  = job->job.auto_dismiss;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 01/16] job.c: make job_mutex and job_lock/unlock() public
  2022-01-05 14:01 ` [PATCH v3 01/16] job.c: make job_mutex and job_lock/unlock() public Emanuele Giuseppe Esposito
@ 2022-01-19  9:56   ` Paolo Bonzini
  2022-01-19 11:13     ` Paolo Bonzini
  0 siblings, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2022-01-19  9:56 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow

On 1/5/22 15:01, Emanuele Giuseppe Esposito wrote:
> job mutex will be used to protect the job struct elements and list,
> replacing AioContext locks.
> 
> Right now use a shared lock for all jobs, in order to keep things
> simple. Once the AioContext lock is gone, we can introduce per-job
> locks.

Not even needed in my opinion; this is not a fast path.  But we'll see.

> To simplify the switch from aiocontext to job lock, introduce
> *nop* lock/unlock functions and macros. Once everything is protected
> by jobs, we can add the mutex and remove the aiocontext.
> 
> Since job_mutex is already being used, add static
> real_job_{lock/unlock}.

Out of curiosity, what breaks if the real job lock is used from the 
start?  (It probably should be mentioned in the commit message).


> -static void job_lock(void)
> +static void real_job_lock(void)
>   {
>       qemu_mutex_lock(&job_mutex);
>   }
>   
> -static void job_unlock(void)
> +static void real_job_unlock(void)
>   {
>       qemu_mutex_unlock(&job_mutex);
>   }

Would it work to

#define job_lock real_job_lock
#define job_unlock real_job_unlock

instead of having to do the changes below?

Paolo

> @@ -449,21 +460,21 @@ void job_enter_cond(Job *job, bool(*fn)(Job *job))
>           return;
>       }
>   
> -    job_lock();
> +    real_job_lock();
>       if (job->busy) {
> -        job_unlock();
> +        real_job_unlock();
>           return;
>       }
>   
>       if (fn && !fn(job)) {
> -        job_unlock();
> +        real_job_unlock();
>           return;
>       }
>   
>       assert(!job->deferred_to_main_loop);
>       timer_del(&job->sleep_timer);
>       job->busy = true;
> -    job_unlock();
> +    real_job_unlock();
>       aio_co_enter(job->aio_context, job->co);
>   }
>   
> @@ -480,13 +491,13 @@ void job_enter(Job *job)
>    * called explicitly. */
>   static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
>   {
> -    job_lock();
> +    real_job_lock();
>       if (ns != -1) {
>           timer_mod(&job->sleep_timer, ns);
>       }
>       job->busy = false;
>       job_event_idle(job);
> -    job_unlock();
> +    real_job_unlock();
>       qemu_coroutine_yield();
>   
>       /* Set by job_enter_cond() before re-entering the coroutine.  */



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 02/16] job.h: categorize fields in struct Job
  2022-01-05 14:01 ` [PATCH v3 02/16] job.h: categorize fields in struct Job Emanuele Giuseppe Esposito
@ 2022-01-19  9:57   ` Paolo Bonzini
  0 siblings, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2022-01-19  9:57 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow

On 1/5/22 15:01, Emanuele Giuseppe Esposito wrote:
> +    /** Protected by job_mutex */

Technically not yet true.  You can add this in patch 15 and at the same 
time remove this one:

>        * Set to false by the job while the coroutine has yielded and may be
>        * re-entered by job_enter(). There may still be I/O or event loop activity
> -     * pending. Accessed under block_job_mutex (in blockjob.c).
> +     * pending. Accessed under job_mutex.

Paolo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 14/16] job.c: use job_get_aio_context()
  2022-01-05 14:02 ` [PATCH v3 14/16] job.c: use job_get_aio_context() Emanuele Giuseppe Esposito
@ 2022-01-19 10:31   ` Paolo Bonzini
  2022-01-21 12:33     ` Emanuele Giuseppe Esposito
  2022-01-21 15:18     ` Emanuele Giuseppe Esposito
  0 siblings, 2 replies; 38+ messages in thread
From: Paolo Bonzini @ 2022-01-19 10:31 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow


Getters such as job_get_aio_context are often wrong because the 
AioContext can change immediately after returning.

So, I wonder if job.aio_context should be protected with a kind of "fake 
rwlock": read under BQL or job_lock, write under BQL+job_lock.  For this 
to work, you can add an assertion for qemu_in_main_thread() to 
child_job_set_aio_ctx, or even better have the assertion in a wrapper 
API job_set_aio_context_locked().

And then, we can remove job_get_aio_context().
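
A minimal sketch of what that wrapper could look like (the name comes
from the paragraph above; the exact shape is my guess, not code from
this series):

static void job_set_aio_context_locked(Job *job, AioContext *ctx)
{
    /*
     * Write side of the "fake rwlock": hold both the BQL and
     * job_mutex, so that readers may hold either one.
     */
    assert(qemu_in_main_thread());
    /* The caller is also expected to hold job_mutex. */
    job->aio_context = ctx;
}

With that invariant, any reader holding either the BQL or job_mutex can
dereference job->aio_context directly, which is what would make the
getter removable.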

Let's look at all cases individually:

On 1/5/22 15:02, Emanuele Giuseppe Esposito wrote:
> diff --git a/block/commit.c b/block/commit.c
> index f639eb49c5..961b57edf0 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -369,7 +369,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
>           goto fail;
>       }
>   
> -    s->base = blk_new(s->common.job.aio_context,
> +    s->base = blk_new(job_get_aio_context(&s->common.job),
>                         base_perms,
>                         BLK_PERM_CONSISTENT_READ
>                         | BLK_PERM_GRAPH_MOD

Here the AioContext is that of bs.  It cannot change because we're
under the BQL.  Replace with bdrv_get_aio_context.

> @@ -382,7 +382,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
>       s->base_bs = base;
>   
>       /* Required permissions are already taken with block_job_add_bdrv() */
> -    s->top = blk_new(s->common.job.aio_context, 0, BLK_PERM_ALL);
> +    s->top = blk_new(job_get_aio_context(&s->common.job), 0, BLK_PERM_ALL);
>       ret = blk_insert_bs(s->top, top, errp);
>       if (ret < 0) {
>           goto fail;

Same.

> diff --git a/block/mirror.c b/block/mirror.c
> index 41450df55c..72b4367b4e 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1743,7 +1743,7 @@ static BlockJob *mirror_start_job(
>           target_perms |= BLK_PERM_GRAPH_MOD;
>       }
>   
> -    s->target = blk_new(s->common.job.aio_context,
> +    s->target = blk_new(job_get_aio_context(&s->common.job),
>                           target_perms, target_shared_perms);
>       ret = blk_insert_bs(s->target, target, errp);
>       if (ret < 0) {

Same.

> diff --git a/block/replication.c b/block/replication.c
> index 50ea778937..68018948b9 100644
> --- a/block/replication.c
> +++ b/block/replication.c
> @@ -148,8 +148,8 @@ static void replication_close(BlockDriverState *bs)
>       }
>       if (s->stage == BLOCK_REPLICATION_FAILOVER) {
>           commit_job = &s->commit_job->job;
> -        assert(commit_job->aio_context == qemu_get_current_aio_context());
>           WITH_JOB_LOCK_GUARD() {
> +            assert(commit_job->aio_context == qemu_get_current_aio_context());
>               job_cancel_sync_locked(commit_job, false);
>           }
>       }

Ok.

> diff --git a/blockjob.c b/blockjob.c
> index cf1f49f6c2..468ba735c5 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -155,14 +155,16 @@ static void child_job_set_aio_ctx(BdrvChild *c, AioContext *ctx,
>           bdrv_set_aio_context_ignore(sibling->bs, ctx, ignore);
>       }
>   
> -    job->job.aio_context = ctx;
> +    WITH_JOB_LOCK_GUARD() {
> +        job->job.aio_context = ctx;
> +    }
>   }
>   
>   static AioContext *child_job_get_parent_aio_context(BdrvChild *c)
>   {
>       BlockJob *job = c->opaque;
>   
> -    return job->job.aio_context;
> +    return job_get_aio_context(&job->job);
>   }
>   
>   static const BdrvChildClass child_job = {

Both called with BQL held, I think.

> @@ -218,19 +220,21 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
>   {
>       BdrvChild *c;
>       bool need_context_ops;
> +    AioContext *job_aiocontext;
>       assert(qemu_in_main_thread());
>   
>       bdrv_ref(bs);
>   
> -    need_context_ops = bdrv_get_aio_context(bs) != job->job.aio_context;
> +    job_aiocontext = job_get_aio_context(&job->job);
> +    need_context_ops = bdrv_get_aio_context(bs) != job_aiocontext;
>   
> -    if (need_context_ops && job->job.aio_context != qemu_get_aio_context()) {
> -        aio_context_release(job->job.aio_context);
> +    if (need_context_ops && job_aiocontext != qemu_get_aio_context()) {
> +        aio_context_release(job_aiocontext);
>       }
>       c = bdrv_root_attach_child(bs, name, &child_job, 0, perm, shared_perm, job,
>                                  errp);
> -    if (need_context_ops && job->job.aio_context != qemu_get_aio_context()) {
> -        aio_context_acquire(job->job.aio_context);
> +    if (need_context_ops && job_aiocontext != qemu_get_aio_context()) {
> +        aio_context_acquire(job_aiocontext);
>       }
>       if (c == NULL) {
>           return -EPERM;

BQL held, too.

> diff --git a/job.c b/job.c
> index f16a4ef542..8a5b710d9b 100644
> --- a/job.c
> +++ b/job.c
> @@ -566,7 +566,7 @@ void job_enter_cond_locked(Job *job, bool(*fn)(Job *job))
>       job->busy = true;
>       real_job_unlock();
>       job_unlock();
> -    aio_co_enter(job->aio_context, job->co);
> +    aio_co_enter(job_get_aio_context(job), job->co);
>       job_lock();
>   }
>   

If you replace aio_co_enter with aio_co_schedule, you can call it
without dropping the lock.  The difference is that aio_co_schedule
always goes through a bottom half.
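
For example (my sketch, untested), the tail of job_enter_cond_locked()
could then stay entirely under the lock:

    assert(!job->deferred_to_main_loop);
    timer_del(&job->sleep_timer);
    job->busy = true;
    /*
     * aio_co_schedule() only queues a bottom half and is thread-safe,
     * so unlike aio_co_enter() it is safe to call with job_mutex held.
     */
    aio_co_schedule(job->aio_context, job->co);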

> @@ -1138,7 +1138,6 @@ static void coroutine_fn job_co_entry(void *opaque)
>       Job *job = opaque;
>       int ret;
>   
> -    assert(job->aio_context == qemu_get_current_aio_context());
>       assert(job && job->driver && job->driver->run);
>       job_pause_point(job);
>       ret = job->driver->run(job, &job->err);
> @@ -1177,7 +1176,7 @@ void job_start(Job *job)
>           job->paused = false;
>           job_state_transition_locked(job, JOB_STATUS_RUNNING);
>       }
> -    aio_co_enter(job->aio_context, job->co);
> +    aio_co_enter(job_get_aio_context(job), job->co);

Better to use aio_co_schedule here, too, and move it under the previous 
WITH_JOB_LOCK_GUARD.

>   }
>   
>   void job_cancel_locked(Job *job, bool force)
> @@ -1303,7 +1302,8 @@ int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),
>       }
>   
>       job_unlock();
> -    AIO_WAIT_WHILE(job->aio_context, (job_enter(job), !job_is_completed(job)));
> +    AIO_WAIT_WHILE(job_get_aio_context(job),
> +                   (job_enter(job), !job_is_completed(job)));
>       job_lock();

Here I think we are also holding the BQL, because this function is 
"sync", so it's safe to use job->aio_context.

Paolo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 16/16] block_job_query: remove atomic read
  2022-01-05 14:02 ` [PATCH v3 16/16] block_job_query: remove atomic read Emanuele Giuseppe Esposito
@ 2022-01-19 10:34   ` Paolo Bonzini
  0 siblings, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2022-01-19 10:34 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow

On 1/5/22 15:02, Emanuele Giuseppe Esposito wrote:
> +++ b/blockjob.c
> @@ -335,13 +335,13 @@ BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
>       info = g_new0(BlockJobInfo, 1);
>       info->type      = g_strdup(job_type_str(&job->job));
>       info->device    = g_strdup(job->job.id);
> -    info->busy      = qatomic_read(&job->job.busy);
> +    info->busy      = job->job.busy;
>       info->paused    = job->job.pause_count > 0;
>       info->offset    = progress_current;
>       info->len       = progress_total;
>       info->speed     = job->speed;
>       info->io_status = job->iostatus;
> -    info->ready     = job_is_ready(&job->job),
> +    info->ready     = job_is_ready_locked(&job->job),

This second part belongs earlier in the series, I think?

Paolo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 15/16] job.c: enable job lock/unlock and remove Aiocontext locks
  2022-01-05 14:02 ` [PATCH v3 15/16] job.c: enable job lock/unlock and remove Aiocontext locks Emanuele Giuseppe Esposito
@ 2022-01-19 10:35   ` Paolo Bonzini
  0 siblings, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2022-01-19 10:35 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow

On 1/5/22 15:02, Emanuele Giuseppe Esposito wrote:
> 
> Now that they are not nop anymore, remove the aiocontext
> to avoid deadlocks.

Ok, that should have been in patch 1, together with a description of the 
deadlocks. :)  Disregard that review.

> There is only one JobDriver callback, ->free(), that assumes that
> the AioContext lock is held (because it calls bdrv_unref), so for
> now keep that one under the AioContext lock.

This explains the issue with detach_child in the other series, too.

Paolo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 03/16] job.h: define locked functions
  2022-01-05 14:01 ` [PATCH v3 03/16] job.h: define locked functions Emanuele Giuseppe Esposito
@ 2022-01-19 10:44   ` Paolo Bonzini
  2022-01-21 15:25     ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2022-01-19 10:44 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow

On 1/5/22 15:01, Emanuele Giuseppe Esposito wrote:
> These functions assume that the job lock is held by the
> caller, to avoid TOC/TOU conditions. Therefore, their
> names must end with _locked.
> 
> Also introduce additional helpers that define _locked
> functions (useful when the job_mutex is globally applied).
> 
> Note: at this stage, job_{lock/unlock} and job lock guard macros
> are *nop*.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>

So, this is the only remaining issue: I am not sure about this rename.
The functions you are changing are

+void job_txn_unref_locked(JobTxn *txn);
+void job_txn_add_job_locked(JobTxn *txn, Job *job);
+void job_ref_locked(Job *job);
+void job_unref_locked(Job *job);
+void job_enter_cond_locked(Job *job, bool(*fn)(Job *job));
+bool job_is_completed_locked(Job *job);
+bool job_is_ready_locked(Job *job);
+void job_pause_locked(Job *job);
+void job_resume_locked(Job *job);
+void job_user_pause_locked(Job *job, Error **errp);
+bool job_user_paused_locked(Job *job);
+void job_user_resume_locked(Job *job, Error **errp);
+Job *job_next_locked(Job *job);
+Job *job_get_locked(const char *id);
+int job_apply_verb_locked(Job *job, JobVerb verb, Error **errp);
+void job_early_fail_locked(Job *job);
+void job_complete_locked(Job *job, Error **errp);
+void job_cancel_locked(Job *job, bool force);
+void job_user_cancel_locked(Job *job, bool force, Error **errp);
+int job_cancel_sync_locked(Job *job, bool force);
+int job_complete_sync_locked(Job *job, Error **errp);
+void job_finalize_locked(Job *job, Error **errp);
+void job_dismiss_locked(Job **job, Error **errp);
+int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),

and most of them (if not all?) will never be called by the job driver, only
by the monitor.  The two APIs (for driver / for monitor) are quite separate
and have different locking policies: the monitor needs to take the lock to
avoid TOC/TOU races; the driver can generally let the API take the lock.

The rename makes the monitor code heavier, but without it the
functions in job.c are named very inconsistently.  So I'm inclined to say
this patch is fine, but I'd like to hear from others as well.

I think the two APIs should be in two different header files, similar
to how you did the graph/IO split.
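
As a rough illustration of such a split (the header names are my
invention, mirroring the monitor/driver distinction above):

/* include/qemu/job-monitor.h: the caller must hold job_mutex */
void job_user_pause_locked(Job *job, Error **errp);
int job_cancel_sync_locked(Job *job, bool force);

/* include/qemu/job-driver.h: the implementation takes job_mutex itself */
void job_enter(Job *job);
void coroutine_fn job_pause_point(Job *job);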

Paolo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 10/16] jobs: protect jobs with job_lock/unlock
  2022-01-05 14:02 ` [PATCH v3 10/16] jobs: protect jobs with job_lock/unlock Emanuele Giuseppe Esposito
@ 2022-01-19 10:50   ` Paolo Bonzini
  0 siblings, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2022-01-19 10:50 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow

On 1/5/22 15:02, Emanuele Giuseppe Esposito wrote:
> Introduce the job locking mechanism through the whole job API,
> following the comments and requirements of job-monitor (assume
> lock is held) and job-driver (lock is not held).
> 
> job_{lock/unlock} is independent of real_job_{lock/unlock}.
> 
> Note: at this stage, job_{lock/unlock} and job lock guard macros
> are *nop*.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>   block.c             |  18 +++---
>   block/replication.c |   8 ++-
>   blockdev.c          |  17 +++++-
>   blockjob.c          |  64 ++++++++++++++-------
>   job-qmp.c           |   2 +
>   job.c               | 132 +++++++++++++++++++++++++++++++-------------
>   monitor/qmp-cmds.c  |   6 +-
>   qemu-img.c          |  41 ++++++++------
>   8 files changed, 199 insertions(+), 89 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 8fcd525fa0..fac0759422 100644
> --- a/block.c
> +++ b/block.c
> @@ -4976,7 +4976,9 @@ static void bdrv_close(BlockDriverState *bs)
>   
>   void bdrv_close_all(void)
>   {
> -    assert(job_next_locked(NULL) == NULL);
> +    WITH_JOB_LOCK_GUARD() {
> +        assert(job_next_locked(NULL) == NULL);
> +    }
>       assert(qemu_in_main_thread());
>   
>       /* Drop references from requests still in flight, such as canceled block
> @@ -6154,13 +6156,15 @@ XDbgBlockGraph *bdrv_get_xdbg_block_graph(Error **errp)
>           }
>       }
>   
> -    for (job = block_job_next(NULL); job; job = block_job_next(job)) {
> -        GSList *el;
> +    WITH_JOB_LOCK_GUARD() {
> +        for (job = block_job_next(NULL); job; job = block_job_next(job)) {
> +            GSList *el;
>   
> -        xdbg_graph_add_node(gr, job, X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_JOB,
> -                           job->job.id);
> -        for (el = job->nodes; el; el = el->next) {
> -            xdbg_graph_add_edge(gr, job, (BdrvChild *)el->data);
> +            xdbg_graph_add_node(gr, job, X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_JOB,
> +                                job->job.id);
> +            for (el = job->nodes; el; el = el->next) {
> +                xdbg_graph_add_edge(gr, job, (BdrvChild *)el->data);
> +            }
>           }
>       }
>   
> diff --git a/block/replication.c b/block/replication.c
> index 5215c328c1..50ea778937 100644
> --- a/block/replication.c
> +++ b/block/replication.c
> @@ -149,7 +149,9 @@ static void replication_close(BlockDriverState *bs)
>       if (s->stage == BLOCK_REPLICATION_FAILOVER) {
>           commit_job = &s->commit_job->job;
>           assert(commit_job->aio_context == qemu_get_current_aio_context());
> -        job_cancel_sync_locked(commit_job, false);
> +        WITH_JOB_LOCK_GUARD() {
> +            job_cancel_sync_locked(commit_job, false);
> +        }
>       }
>   
>       if (s->mode == REPLICATION_MODE_SECONDARY) {
> @@ -726,7 +728,9 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
>            * disk, secondary disk in backup_job_completed().
>            */
>           if (s->backup_job) {
> -            job_cancel_sync_locked(&s->backup_job->job, true);
> +            WITH_JOB_LOCK_GUARD() {
> +                job_cancel_sync_locked(&s->backup_job->job, true);
> +            }
>           }
>   
>           if (!failover) {
> diff --git a/blockdev.c b/blockdev.c
> index ee35aff13a..099d57e0d2 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -155,6 +155,8 @@ void blockdev_mark_auto_del(BlockBackend *blk)
>           return;
>       }
>   
> +    JOB_LOCK_GUARD();
> +
>       for (job = block_job_next(NULL); job; job = block_job_next(job)) {
>           if (block_job_has_bdrv(job, blk_bs(blk))) {
>               AioContext *aio_context = job->job.aio_context;
> @@ -1832,7 +1834,9 @@ static void drive_backup_abort(BlkActionState *common)
>           aio_context = bdrv_get_aio_context(state->bs);
>           aio_context_acquire(aio_context);
>   
> -        job_cancel_sync_locked(&state->job->job, true);
> +        WITH_JOB_LOCK_GUARD() {
> +            job_cancel_sync_locked(&state->job->job, true);
> +        }
>   
>           aio_context_release(aio_context);
>       }
> @@ -1933,7 +1937,9 @@ static void blockdev_backup_abort(BlkActionState *common)
>           aio_context = bdrv_get_aio_context(state->bs);
>           aio_context_acquire(aio_context);
>   
> -        job_cancel_sync_locked(&state->job->job, true);
> +        WITH_JOB_LOCK_GUARD() {
> +            job_cancel_sync_locked(&state->job->job, true);
> +        }
>   
>           aio_context_release(aio_context);
>       }
> @@ -2382,7 +2388,10 @@ exit:
>       if (!has_props) {
>           qapi_free_TransactionProperties(props);
>       }
> -    job_txn_unref_locked(block_job_txn);
> +
> +    WITH_JOB_LOCK_GUARD() {
> +        job_txn_unref_locked(block_job_txn);
> +    }
>   }
>   
>   BlockDirtyBitmapSha256 *qmp_x_debug_block_dirty_bitmap_sha256(const char *node,
> @@ -3705,6 +3714,8 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
>       BlockJobInfoList *head = NULL, **tail = &head;
>       BlockJob *job;
>   
> +    JOB_LOCK_GUARD();
> +
>       for (job = block_job_next(NULL); job; job = block_job_next(job)) {
>           BlockJobInfo *value;
>   
> diff --git a/blockjob.c b/blockjob.c
> index ce356be51e..e00c8d31d5 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -88,7 +88,9 @@ static char *child_job_get_parent_desc(BdrvChild *c)
>   static void child_job_drained_begin(BdrvChild *c)
>   {
>       BlockJob *job = c->opaque;
> -    job_pause_locked(&job->job);
> +    WITH_JOB_LOCK_GUARD() {
> +        job_pause_locked(&job->job);
> +    }
>   }
>   
>   static bool child_job_drained_poll(BdrvChild *c)
> @@ -100,8 +102,10 @@ static bool child_job_drained_poll(BdrvChild *c)
>       /* An inactive or completed job doesn't have any pending requests. Jobs
>        * with !job->busy are either already paused or have a pause point after
>        * being reentered, so no job driver code will run before they pause. */
> -    if (!job->busy || job_is_completed_locked(job)) {
> -        return false;
> +    WITH_JOB_LOCK_GUARD() {
> +        if (!job->busy || job_is_completed_locked(job)) {
> +            return false;
> +        }
>       }
>   
>       /* Otherwise, assume that it isn't fully stopped yet, but allow the job to
> @@ -116,7 +120,9 @@ static bool child_job_drained_poll(BdrvChild *c)
>   static void child_job_drained_end(BdrvChild *c, int *drained_end_counter)
>   {
>       BlockJob *job = c->opaque;
> -    job_resume_locked(&job->job);
> +    WITH_JOB_LOCK_GUARD() {
> +        job_resume_locked(&job->job);
> +    }
>   }
>   
>   static bool child_job_can_set_aio_ctx(BdrvChild *c, AioContext *ctx,
> @@ -238,7 +244,13 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
>   
>   static void block_job_on_idle(Notifier *n, void *opaque)
>   {
> +    /*
> +     * we can't kick with job_mutex held, but we also want
> +     * to protect the notifier list.
> +     */
> +    job_unlock();
>       aio_wait_kick();
> +    job_lock();
>   }
>   
>   bool block_job_is_internal(BlockJob *job)
> @@ -278,7 +290,9 @@ bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
>       job->speed = speed;
>   
>       if (drv->set_speed) {
> +        job_unlock();
>           drv->set_speed(job, speed);
> +        job_lock();
>       }
>   
>       if (speed && speed <= old_speed) {
> @@ -458,13 +472,15 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
>       job->ready_notifier.notify = block_job_event_ready;
>       job->idle_notifier.notify = block_job_on_idle;
>   
> -    notifier_list_add(&job->job.on_finalize_cancelled,
> -                      &job->finalize_cancelled_notifier);
> -    notifier_list_add(&job->job.on_finalize_completed,
> -                      &job->finalize_completed_notifier);
> -    notifier_list_add(&job->job.on_pending, &job->pending_notifier);
> -    notifier_list_add(&job->job.on_ready, &job->ready_notifier);
> -    notifier_list_add(&job->job.on_idle, &job->idle_notifier);
> +    WITH_JOB_LOCK_GUARD() {
> +        notifier_list_add(&job->job.on_finalize_cancelled,
> +                          &job->finalize_cancelled_notifier);
> +        notifier_list_add(&job->job.on_finalize_completed,
> +                          &job->finalize_completed_notifier);
> +        notifier_list_add(&job->job.on_pending, &job->pending_notifier);
> +        notifier_list_add(&job->job.on_ready, &job->ready_notifier);
> +        notifier_list_add(&job->job.on_idle, &job->idle_notifier);
> +    }
>   
>       error_setg(&job->blocker, "block device is in use by block job: %s",
>                  job_type_str(&job->job));
> @@ -477,11 +493,14 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
>       blk_set_disable_request_queuing(blk, true);
>       blk_set_allow_aio_context_change(blk, true);
>   
> -    if (!block_job_set_speed(job, speed, errp)) {
> -        job_early_fail(&job->job);
> -        return NULL;
> +    WITH_JOB_LOCK_GUARD() {
> +        if (!block_job_set_speed(job, speed, errp)) {
> +            job_early_fail_locked(&job->job);
> +            return NULL;
> +        }
>       }
>   
> +
>       return job;
>   }
>   
> @@ -499,7 +518,9 @@ void block_job_user_resume(Job *job)
>   {
>       BlockJob *bjob = container_of(job, BlockJob, job);
>       assert(qemu_in_main_thread());
> -    block_job_iostatus_reset(bjob);
> +    WITH_JOB_LOCK_GUARD() {
> +        block_job_iostatus_reset(bjob);
> +    }
>   }
>   
>   BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
> @@ -532,10 +553,15 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
>                                           action);
>       }
>       if (action == BLOCK_ERROR_ACTION_STOP) {
> -        if (!job->job.user_paused) {
> -            job_pause_locked(&job->job);
> -            /* make the pause user visible, which will be resumed from QMP. */
> -            job->job.user_paused = true;
> +        WITH_JOB_LOCK_GUARD() {
> +            if (!job->job.user_paused) {
> +                job_pause_locked(&job->job);
> +                /*
> +                 * make the pause user visible, which will be
> +                 * resumed from QMP.
> +                 */
> +                job->job.user_paused = true;
> +            }
>           }
>           block_job_iostatus_set_err(job, error);
>       }
> diff --git a/job-qmp.c b/job-qmp.c
> index f6f9840436..9fa14bf761 100644
> --- a/job-qmp.c
> +++ b/job-qmp.c
> @@ -171,6 +171,8 @@ JobInfoList *qmp_query_jobs(Error **errp)
>       JobInfoList *head = NULL, **tail = &head;
>       Job *job;
>   
> +    JOB_LOCK_GUARD();
> +
>       for (job = job_next_locked(NULL); job; job = job_next_locked(job)) {
>           JobInfo *value;
>   
> diff --git a/job.c b/job.c
> index 2ee7233763..56722a5043 100644
> --- a/job.c
> +++ b/job.c
> @@ -394,6 +394,8 @@ void *job_create(const char *job_id, const JobDriver *driver, JobTxn *txn,
>   {
>       Job *job;
>   
> +    JOB_LOCK_GUARD();
> +
>       if (job_id) {
>           if (flags & JOB_INTERNAL) {
>               error_setg(errp, "Cannot specify job ID for internal job");
> @@ -467,7 +469,9 @@ void job_unref_locked(Job *job)
>           assert(!job->txn);
>   
>           if (job->driver->free) {
> +            job_unlock();
>               job->driver->free(job);
> +            job_lock();
>           }
>   
>           QLIST_REMOVE(job, job_list);
> @@ -551,11 +555,14 @@ void job_enter_cond_locked(Job *job, bool(*fn)(Job *job))
>       timer_del(&job->sleep_timer);
>       job->busy = true;
>       real_job_unlock();
> +    job_unlock();
>       aio_co_enter(job->aio_context, job->co);
> +    job_lock();
>   }
>   
>   void job_enter(Job *job)
>   {
> +    JOB_LOCK_GUARD();
>       job_enter_cond_locked(job, NULL);
>   }
>   
> @@ -574,7 +581,9 @@ static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
>       job->busy = false;
>       job_event_idle(job);
>       real_job_unlock();
> +    job_unlock();
>       qemu_coroutine_yield();
> +    job_lock();
>   
>       /* Set by job_enter_cond_locked() before re-entering the coroutine.  */
>       assert(job->busy);
> @@ -584,18 +593,23 @@ void coroutine_fn job_pause_point(Job *job)
>   {
>       assert(job && job_started(job));
>   
> +    job_lock();
>       if (!job_should_pause(job)) {
> +        job_unlock();
>           return;
>       }
> -    if (job_is_cancelled(job)) {
> +    if (job_is_cancelled_locked(job)) {
> +        job_unlock();
>           return;
>       }
>   
>       if (job->driver->pause) {
> +        job_unlock();
>           job->driver->pause(job);
> +        job_lock();
>       }
>   
> -    if (job_should_pause(job) && !job_is_cancelled(job)) {
> +    if (job_should_pause(job) && !job_is_cancelled_locked(job)) {
>           JobStatus status = job->status;
>           job_state_transition(job, status == JOB_STATUS_READY
>                                     ? JOB_STATUS_STANDBY
> @@ -605,6 +619,7 @@ void coroutine_fn job_pause_point(Job *job)
>           job->paused = false;
>           job_state_transition(job, status);
>       }
> +    job_unlock();
>   
>       if (job->driver->resume) {
>           job->driver->resume(job);
> @@ -613,15 +628,17 @@ void coroutine_fn job_pause_point(Job *job)
>   
>   void job_yield(Job *job)
>   {
> -    assert(job->busy);
> +    WITH_JOB_LOCK_GUARD() {
> +        assert(job->busy);
>   
> -    /* Check cancellation *before* setting busy = false, too!  */
> -    if (job_is_cancelled(job)) {
> -        return;
> -    }
> +        /* Check cancellation *before* setting busy = false, too!  */
> +        if (job_is_cancelled_locked(job)) {
> +            return;
> +        }
>   
> -    if (!job_should_pause(job)) {
> -        job_do_yield(job, -1);
> +        if (!job_should_pause(job)) {
> +            job_do_yield(job, -1);
> +        }
>       }
>   
>       job_pause_point(job);
> @@ -629,21 +646,23 @@ void job_yield(Job *job)
>   
>   void coroutine_fn job_sleep_ns(Job *job, int64_t ns)
>   {
> -    assert(job->busy);
> +    WITH_JOB_LOCK_GUARD() {
> +        assert(job->busy);
>   
> -    /* Check cancellation *before* setting busy = false, too!  */
> -    if (job_is_cancelled(job)) {
> -        return;
> -    }
> +        /* Check cancellation *before* setting busy = false, too!  */
> +        if (job_is_cancelled_locked(job)) {
> +            return;
> +        }
>   
> -    if (!job_should_pause(job)) {
> -        job_do_yield(job, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + ns);
> +        if (!job_should_pause(job)) {
> +            job_do_yield(job, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + ns);
> +        }
>       }
>   
>       job_pause_point(job);
>   }
>   
> -/* Assumes the block_job_mutex is held */
> +/* Assumes the job_mutex is held */
>   static bool job_timer_not_pending(Job *job)
>   {
>       return !timer_pending(&job->sleep_timer);
> @@ -653,7 +672,7 @@ void job_pause_locked(Job *job)
>   {
>       job->pause_count++;
>       if (!job->paused) {
> -        job_enter(job);
> +        job_enter_cond_locked(job, NULL);
>       }
>   }
>   
> @@ -699,7 +718,9 @@ void job_user_resume_locked(Job *job, Error **errp)
>           return;
>       }
>       if (job->driver->user_resume) {
> +        job_unlock();
>           job->driver->user_resume(job);
> +        job_lock();
>       }
>       job->user_paused = false;
>       job_resume_locked(job);
> @@ -753,7 +774,7 @@ static void job_conclude(Job *job)
>   
>   static void job_update_rc(Job *job)
>   {
> -    if (!job->ret && job_is_cancelled(job)) {
> +    if (!job->ret && job_is_cancelled_locked(job)) {
>           job->ret = -ECANCELED;
>       }
>       if (job->ret) {
> @@ -769,7 +790,9 @@ static void job_commit(Job *job)
>       assert(!job->ret);
>       assert(qemu_in_main_thread());
>       if (job->driver->commit) {
> +        job_unlock();
>           job->driver->commit(job);
> +        job_lock();
>       }
>   }
>   
> @@ -778,7 +801,9 @@ static void job_abort(Job *job)
>       assert(job->ret);
>       assert(qemu_in_main_thread());
>       if (job->driver->abort) {
> +        job_unlock();
>           job->driver->abort(job);
> +        job_lock();
>       }
>   }
>   
> @@ -786,12 +811,15 @@ static void job_clean(Job *job)
>   {
>       assert(qemu_in_main_thread());
>       if (job->driver->clean) {
> +        job_unlock();
>           job->driver->clean(job);
> +        job_lock();
>       }
>   }
>   
>   static int job_finalize_single(Job *job)
>   {
> +    int job_ret;
>       AioContext *ctx = job->aio_context;
>   
>       assert(job_is_completed_locked(job));
> @@ -811,12 +839,15 @@ static int job_finalize_single(Job *job)
>       aio_context_release(ctx);
>   
>       if (job->cb) {
> -        job->cb(job->opaque, job->ret);
> +        job_ret = job->ret;
> +        job_unlock();
> +        job->cb(job->opaque, job_ret);
> +        job_lock();
>       }
>   
>       /* Emit events only if we actually started */
>       if (job_started(job)) {
> -        if (job_is_cancelled(job)) {
> +        if (job_is_cancelled_locked(job)) {
>               job_event_cancelled(job);
>           } else {
>               job_event_completed(job);
> @@ -832,7 +863,9 @@ static void job_cancel_async(Job *job, bool force)
>   {
>       assert(qemu_in_main_thread());
>       if (job->driver->cancel) {
> +        job_unlock();
>           force = job->driver->cancel(job, force);
> +        job_lock();
>       } else {
>           /* No .cancel() means the job will behave as if force-cancelled */
>           force = true;
> @@ -841,7 +874,9 @@ static void job_cancel_async(Job *job, bool force)
>       if (job->user_paused) {
>           /* Do not call job_enter here, the caller will handle it.  */
>           if (job->driver->user_resume) {
> +            job_unlock();
>               job->driver->user_resume(job);
> +            job_lock();
>           }
>           job->user_paused = false;
>           assert(job->pause_count > 0);
> @@ -911,7 +946,7 @@ static void job_completed_txn_abort(Job *job)
>           ctx = other_job->aio_context;
>           aio_context_acquire(ctx);
>           if (!job_is_completed_locked(other_job)) {
> -            assert(job_cancel_requested(other_job));
> +            assert(job_cancel_requested_locked(other_job));
>               job_finish_sync_locked(other_job, NULL, NULL);
>           }
>           job_finalize_single(other_job);
> @@ -930,13 +965,17 @@ static void job_completed_txn_abort(Job *job)
>   
>   static int job_prepare(Job *job)
>   {
> +    int ret;
>       AioContext *ctx = job->aio_context;
>       assert(qemu_in_main_thread());
>   
>       if (job->ret == 0 && job->driver->prepare) {
> +        job_unlock();
>           aio_context_acquire(ctx);
> -        job->ret = job->driver->prepare(job);
> +        ret = job->driver->prepare(job);
>           aio_context_release(ctx);
> +        job_lock();
> +        job->ret = ret;
>           job_update_rc(job);
>       }
>   
> @@ -982,6 +1021,7 @@ static int job_transition_to_pending(Job *job)
>   
>   void job_transition_to_ready(Job *job)
>   {
> +    JOB_LOCK_GUARD();
>       job_state_transition(job, JOB_STATUS_READY);
>       job_event_ready(job);
>   }
> @@ -1031,6 +1071,7 @@ static void job_exit(void *opaque)
>       Job *job = (Job *)opaque;
>       AioContext *ctx;
>   
> +    JOB_LOCK_GUARD();
>       job_ref_locked(job);
>       aio_context_acquire(job->aio_context);
>   
> @@ -1061,13 +1102,17 @@ static void job_exit(void *opaque)
>   static void coroutine_fn job_co_entry(void *opaque)
>   {
>       Job *job = opaque;
> +    int ret;
>   
>       assert(job->aio_context == qemu_get_current_aio_context());
>       assert(job && job->driver && job->driver->run);
>       job_pause_point(job);
> -    job->ret = job->driver->run(job, &job->err);
> -    job->deferred_to_main_loop = true;
> -    job->busy = true;
> +    ret = job->driver->run(job, &job->err);
> +    WITH_JOB_LOCK_GUARD() {
> +        job->ret = ret;
> +        job->deferred_to_main_loop = true;
> +        job->busy = true;
> +    }
>       aio_bh_schedule_oneshot(qemu_get_aio_context(), job_exit, job);
>   }
>   
> @@ -1083,16 +1128,20 @@ static int job_pre_run(Job *job)
>   
>   void job_start(Job *job)
>   {
> -    assert(job && !job_started(job) && job->paused &&
> -           job->driver && job->driver->run);
> -    job->co = qemu_coroutine_create(job_co_entry, job);
> +    WITH_JOB_LOCK_GUARD() {
> +        assert(job && !job_started(job) && job->paused &&
> +            job->driver && job->driver->run);
> +        job->co = qemu_coroutine_create(job_co_entry, job);
> +    }
>       if (job_pre_run(job)) {
>           return;
>       }
> -    job->pause_count--;
> -    job->busy = true;
> -    job->paused = false;
> -    job_state_transition(job, JOB_STATUS_RUNNING);
> +    WITH_JOB_LOCK_GUARD() {
> +        job->pause_count--;
> +        job->busy = true;
> +        job->paused = false;
> +        job_state_transition(job, JOB_STATUS_RUNNING);
> +    }
>       aio_co_enter(job->aio_context, job->co);
>   }
>   
> @@ -1116,11 +1165,11 @@ void job_cancel_locked(Job *job, bool force)
>            * choose to call job_is_cancelled() to show that we invoke
>            * job_completed_txn_abort() only for force-cancelled jobs.)
>            */
> -        if (job_is_cancelled(job)) {
> +        if (job_is_cancelled_locked(job)) {
>               job_completed_txn_abort(job);
>           }
>       } else {
> -        job_enter(job);
> +        job_enter_cond_locked(job, NULL);
>       }
>   }
>   
> @@ -1164,6 +1213,7 @@ void job_cancel_sync_all(void)
>       Job *job;
>       AioContext *aio_context;
>   
> +    JOB_LOCK_GUARD();
>       while ((job = job_next_locked(NULL))) {
>           aio_context = job->aio_context;
>           aio_context_acquire(aio_context);
> @@ -1185,13 +1235,15 @@ void job_complete_locked(Job *job, Error **errp)
>       if (job_apply_verb_locked(job, JOB_VERB_COMPLETE, errp)) {
>           return;
>       }
> -    if (job_cancel_requested(job) || !job->driver->complete) {
> +    if (job_cancel_requested_locked(job) || !job->driver->complete) {
>           error_setg(errp, "The active block job '%s' cannot be completed",
>                      job->id);
>           return;
>       }
>   
> +    job_unlock();
>       job->driver->complete(job, errp);
> +    job_lock();
>   }
>   
>   int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),
> @@ -1211,10 +1263,12 @@ int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),
>           return -EBUSY;
>       }
>   
> -    AIO_WAIT_WHILE(job->aio_context,
> -                   (job_enter(job), !job_is_completed_locked(job)));
> +    job_unlock();
> +    AIO_WAIT_WHILE(job->aio_context, (job_enter(job), !job_is_completed(job)));
> +    job_lock();
>   
> -    ret = (job_is_cancelled(job) && job->ret == 0) ? -ECANCELED : job->ret;
> +    ret = (job_is_cancelled_locked(job) && job->ret == 0) ?
> +           -ECANCELED : job->ret;
>       job_unref_locked(job);
>       return ret;
>   }
> diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
> index 343353e27a..2f11d086a6 100644
> --- a/monitor/qmp-cmds.c
> +++ b/monitor/qmp-cmds.c
> @@ -133,8 +133,10 @@ void qmp_cont(Error **errp)
>           blk_iostatus_reset(blk);
>       }
>   
> -    for (job = block_job_next(NULL); job; job = block_job_next(job)) {
> -        block_job_iostatus_reset(job);
> +    WITH_JOB_LOCK_GUARD() {
> +        for (job = block_job_next(NULL); job; job = block_job_next(job)) {
> +            block_job_iostatus_reset(job);
> +        }
>       }
>   
>       /* Continuing after completed migration. Images have been inactivated to
> diff --git a/qemu-img.c b/qemu-img.c
> index 09f3b11eab..95e2e33e61 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -906,25 +906,30 @@ static void run_block_job(BlockJob *job, Error **errp)
>       int ret = 0;
>   
>       aio_context_acquire(aio_context);
> -    job_ref_locked(&job->job);
> -    do {
> -        float progress = 0.0f;
> -        aio_poll(aio_context, true);
> +    WITH_JOB_LOCK_GUARD() {
> +        job_ref_locked(&job->job);
> +        do {
> +            float progress = 0.0f;
> +            job_unlock();
> +            aio_poll(aio_context, true);
> +
> +            progress_get_snapshot(&job->job.progress, &progress_current,
> +                                &progress_total);
> +            if (progress_total) {
> +                progress = (float)progress_current / progress_total * 100.f;
> +            }
> +            qemu_progress_print(progress, 0);
> +            job_lock();
> +        } while (!job_is_ready_locked(&job->job) &&
> +                !job_is_completed_locked(&job->job));
>   
> -        progress_get_snapshot(&job->job.progress, &progress_current,
> -                              &progress_total);
> -        if (progress_total) {
> -            progress = (float)progress_current / progress_total * 100.f;
> +        if (!job_is_completed_locked(&job->job)) {
> +            ret = job_complete_sync_locked(&job->job, errp);
> +        } else {
> +            ret = job->job.ret;
>           }
> -        qemu_progress_print(progress, 0);
> -    } while (!job_is_ready(&job->job) && !job_is_completed_locked(&job->job));
> -
> -    if (!job_is_completed_locked(&job->job)) {
> -        ret = job_complete_sync_locked(&job->job, errp);
> -    } else {
> -        ret = job->job.ret;
> +        job_unref_locked(&job->job);
>       }
> -    job_unref_locked(&job->job);
>       aio_context_release(aio_context);
>   
>       /* publish completion progress only when success */
> @@ -1077,7 +1082,9 @@ static int img_commit(int argc, char **argv)
>           bdrv_ref(bs);
>       }
>   
> -    job = block_job_get("commit");
> +    WITH_JOB_LOCK_GUARD() {
> +        job = block_job_get("commit");
> +    }
>       assert(job);
>       run_block_job(job, &local_err);
>       if (local_err) {

If meaningful, I'd either do this much earlier, or do the _locked rename 
much later.  Having such a large part of the series where _locked 
functions are *not* locked is confusing.

Paolo



* Re: [PATCH v3 05/16] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU
  2022-01-05 14:01 ` [PATCH v3 05/16] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU Emanuele Giuseppe Esposito
@ 2022-01-19 11:06   ` Paolo Bonzini
  0 siblings, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2022-01-19 11:06 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow

On 1/5/22 15:01, Emanuele Giuseppe Esposito wrote:
> 
> +    WITH_JOB_LOCK_GUARD() {
> +        abort = job->ret < 0;
> +    }
> +
>      if (s->prepared) {
>          return 0;
>      }

At this point I think job->ret is stable and can be accessed without a 
guard.  The question, however, is what serializes calls to job_prepare. 
Is it the BQL?  Can we say that job->ret is only written under the BQL, 
just like job->aio_context?
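
If the answer is yes, the guard could even become an assertion instead, 
something along these lines (an untested sketch, assuming job->ret 
really is BQL-protected at this point):

/* Sketch: job->ret is only written under the BQL, no job_mutex needed. */
assert(qemu_in_main_thread());
abort = job->ret < 0;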

> @@ -1161,8 +1165,10 @@ static void mirror_complete(Job *job, Error **errp)
>      s->should_complete = true;
>  
>      /* If the job is paused, it will be re-entered when it is resumed */
> -    if (!job->paused) {
> -        job_enter(job);
> +    WITH_JOB_LOCK_GUARD() {
> +        if (!job->paused) {
> +            job_enter_cond_locked(job, NULL);
> +        }
>      }

I don't want to open a can of worms, but does it ever make sense to call 
job_enter while the job is paused?  Should this condition be moved to 
job_enter_cond_locked?
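
Folded in, it could look roughly like this (only a sketch of the idea; 
the surrounding checks are paraphrased from job_enter_cond and may not 
match the code exactly):

void job_enter_cond_locked(Job *job, bool(*fn)(Job *job))
{
    if (!job_started(job) || job->busy) {
        return;
    }
    /* Sketch: a paused job will be re-entered when it is resumed. */
    if (job->paused) {
        return;
    }
    if (fn && !fn(job)) {
        return;
    }
    ...
}

Then callers such as mirror_complete would not need their own check.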

Paolo



* Re: [PATCH v3 09/16] jobs: remove aiocontext locks since the functions are under BQL
  2022-01-05 14:02 ` [PATCH v3 09/16] jobs: remove aiocontext locks since the functions are under BQL Emanuele Giuseppe Esposito
@ 2022-01-19 11:09   ` Paolo Bonzini
  2022-01-26 16:18     ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2022-01-19 11:09 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow

On 1/5/22 15:02, Emanuele Giuseppe Esposito wrote:
> In preparation to the job_lock/unlock patch, remove these
> aiocontext locks.
> The main reason these two locks are removed here is because
> they are inside a loop iterating on the jobs list. Once the
> job_lock is added, it will have to protect the whole loop,
> wrapping also the aiocontext acquire/release.
> 
> We don't want this, as job_lock can only be *wrapped by*
> the aiocontext lock, and not vice-versa, to avoid deadlocks.

Better to avoid the passive: "must be taken inside the AioContext lock, 
and taking it outside would cause deadlocks".  Also add a note about the 
lock hierarchy to patch 1.
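
E.g. next to the mutex declaration, something like (just a suggestion 
for the wording):

/*
 * Lock ordering: job_mutex nests inside the AioContext lock.  Take it
 * only while already holding the AioContext lock, never the other way
 * round, or the two locks can deadlock against each other.
 */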

> @@ -3707,15 +3707,11 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
>   
>       for (job = block_job_next(NULL); job; job = block_job_next(job)) {
>           BlockJobInfo *value;
> -        AioContext *aio_context;
>   
>           if (block_job_is_internal(job)) {
>               continue;
>           }

block_job_next, block_job_query, etc. do not have the _locked suffix. 
Is this because all block_job_ functions need the job_mutex held, or 
just laziness? :)

Paolo

> -        aio_context = blk_get_aio_context(job->blk);
> -        aio_context_acquire(aio_context);
>           value = block_job_query(job, errp);
> -        aio_context_release(aio_context);
>           if (!value) {
>               qapi_free_BlockJobInfoList(head);
>               return NULL;
> diff --git a/job-qmp.c b/job-qmp.c
> index de4120a1d4..f6f9840436 100644
> --- a/job-qmp.c
> +++ b/job-qmp.c
> @@ -173,15 +173,11 @@ JobInfoList *qmp_query_jobs(Error **errp)
>   
>       for (job = job_next_locked(NULL); job; job = job_next_locked(job)) {
>           JobInfo *value;
> -        AioContext *aio_context;
>   
>           if (job_is_internal(job)) {
>               continue;
>           }
> -        aio_context = job->aio_context;
> -        aio_context_acquire(aio_context);
>           value = job_query_single(job, errp);
> -        aio_context_release(aio_context);
>           if (!value) {
>               qapi_free_JobInfoList(head);
>               return NULL;




* Re: [PATCH v3 01/16] job.c: make job_mutex and job_lock/unlock() public
  2022-01-19  9:56   ` Paolo Bonzini
@ 2022-01-19 11:13     ` Paolo Bonzini
  0 siblings, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2022-01-19 11:13 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow

On 1/19/22 10:56, Paolo Bonzini wrote:
> On 1/5/22 15:01, Emanuele Giuseppe Esposito wrote:
>> job mutex will be used to protect the job struct elements and list,
>> replacing AioContext locks.
>>
>> Right now use a shared lock for all jobs, in order to keep things
>> simple. Once the AioContext lock is gone, we can introduce per-job
>> locks.
> 
> Not even needed in my opinion, this is not a fast path.  But we'll see.
> 
>> To simplify the switch from aiocontext to job lock, introduce
>> *nop* lock/unlock functions and macros. Once everything is protected
>> by jobs, we can add the mutex and remove the aiocontext.
>>
>> Since job_mutex is already being used, add static
>> real_job_{lock/unlock}.
> 
> Out of curiosity, what breaks if the real job lock is used from the 
> start?  (It probably should be mentioned in the commit message).
> 
> 
>> -static void job_lock(void)
>> +static void real_job_lock(void)
>>   {
>>       qemu_mutex_lock(&job_mutex);
>>   }
>> -static void job_unlock(void)
>> +static void real_job_unlock(void)
>>   {
>>       qemu_mutex_unlock(&job_mutex);
>>   }
> 
> Would it work to
> 
> #define job_lock real_job_lock
> #define job_unlock real_job_unlock
> 
> instead of having to do the changes below?

Ignore all this, please.

Paolo




* Re: [PATCH v3 00/16] job: replace AioContext lock with job_mutex
  2022-01-05 14:01 [PATCH v3 00/16] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (15 preceding siblings ...)
  2022-01-05 14:02 ` [PATCH v3 16/16] block_job_query: remove atomic read Emanuele Giuseppe Esposito
@ 2022-01-19 11:15 ` Paolo Bonzini
  16 siblings, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2022-01-19 11:15 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow

On 1/5/22 15:01, Emanuele Giuseppe Esposito wrote:
> In this series, we want to remove the AioContext lock and instead
> use the already existent job_mutex to protect the job structures
> and list. This is part of the work to get rid of AioContext lock
> usage in favour of smaller granularity locks.
> 
> In order to simplify reviewer's job, job lock/unlock functions and
> macros are added as empty prototypes (nop) in patch 1.
> They are converted to use the actual job mutex only in the last
> patch, 14. In this way we can freely create locking sections
> without worrying about deadlocks with the aiocontext lock.

Oops, sorry -- I missed this explanation when first reading the cover 
letter.  Good job, though it needs another iteration, especially for 
patch 14, and possibly a decision on the right placement of patches 
10-13.

Thanks,

Paolo

> Patch 2 defines what fields in the job structure need protection,
> and patches 3-4 categorize respectively locked and unlocked
> functions in the job API.
> 
> Patch 5-9 are in preparation to the job locks, they try to reduce
> the aiocontext critical sections and other minor fixes.
> 
> Patch 10-13 introduces the (nop) job lock into the job API and
> its users, following the comments and categorizations done in
> patch 2-3-4.
> 
> Patch 14 makes the prototypes in patch 1 use the job_mutex and
> removes all aiocontext lock at the same time.
> 
> Tested this series by running unit tests, qemu-iotests and qtests
> (x86_64).
> 
> This serie is based on my previous series "block layer: split
> block APIs in global state and I/O".
> 
> Based-on: <20211124064418.3120601-1-eesposit@redhat.com>
> ---
> v3:
> * add "_locked" suffix to the functions called under job_mutex lock
> * rename _job_lock in real_job_lock
> * job_mutex is now public, and drivers like monitor use it directly
> * introduce and protect job_get_aio_context
> * remove mirror-specific APIs and just use WITH_JOB_GUARD
> * more extensive use of WITH_JOB_GUARD and JOB_LOCK_GUARD
> 
> RFC v2:
> * use JOB_LOCK_GUARD and WITH_JOB_LOCK_GUARD
> * mu(u)ltiple typos in commit messages
> * job API split patches are sent separately in another series
> * use of empty job_{lock/unlock} and JOB_LOCK_GUARD/WITH_JOB_LOCK_GUARD
>    to avoid deadlocks and simplify the reviewer job
> * move patch 11 (block_job_query: remove atomic read) as last
> 
> Emanuele Giuseppe Esposito (16):
>    job.c: make job_mutex and job_lock/unlock() public
>    job.h: categorize fields in struct Job
>    job.h: define locked functions
>    job.h: define unlocked functions
>    block/mirror.c: use of job helpers in drivers to avoid TOC/TOU
>    job.c: make job_event_* functions static
>    job.c: move inner aiocontext lock in callbacks
>    aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED
>    jobs: remove aiocontext locks since the functions are under BQL
>    jobs: protect jobs with job_lock/unlock
>    jobs: document all static functions and add _locked() suffix
>    jobs: use job locks and helpers also in the unit tests
>    jobs: add job lock in find_* functions
>    job.c: use job_get_aio_context()
>    job.c: enable job lock/unlock and remove Aiocontext locks
>    block_job_query: remove atomic read
> 
>   block.c                          |  18 +-
>   block/commit.c                   |   4 +-
>   block/mirror.c                   |  21 +-
>   block/replication.c              |  10 +-
>   blockdev.c                       | 112 ++----
>   blockjob.c                       | 122 +++---
>   include/block/aio-wait.h         |  15 +-
>   include/qemu/job.h               | 317 +++++++++++----
>   job-qmp.c                        |  74 ++--
>   job.c                            | 656 +++++++++++++++++++------------
>   monitor/qmp-cmds.c               |   6 +-
>   qemu-img.c                       |  41 +-
>   tests/unit/test-bdrv-drain.c     |  46 +--
>   tests/unit/test-block-iothread.c |  14 +-
>   tests/unit/test-blockjob-txn.c   |  24 +-
>   tests/unit/test-blockjob.c       |  98 ++---
>   16 files changed, 947 insertions(+), 631 deletions(-)
> 




* Re: [PATCH v3 14/16] job.c: use job_get_aio_context()
  2022-01-19 10:31   ` Paolo Bonzini
@ 2022-01-21 12:33     ` Emanuele Giuseppe Esposito
  2022-01-21 17:43       ` Emanuele Giuseppe Esposito
  2022-01-21 15:18     ` Emanuele Giuseppe Esposito
  1 sibling, 1 reply; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-21 12:33 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow



On 19/01/2022 11:31, Paolo Bonzini wrote:
>> diff --git a/blockjob.c b/blockjob.c
>> index cf1f49f6c2..468ba735c5 100644
>> --- a/blockjob.c
>> +++ b/blockjob.c
>> @@ -155,14 +155,16 @@ static void child_job_set_aio_ctx(BdrvChild *c, 
>> AioContext *ctx,
>>           bdrv_set_aio_context_ignore(sibling->bs, ctx, ignore);
>>       }
>> -    job->job.aio_context = ctx;
>> +    WITH_JOB_LOCK_GUARD() {
>> +        job->job.aio_context = ctx;
>> +    }
>>   }
>>   static AioContext *child_job_get_parent_aio_context(BdrvChild *c)
>>   {
>>       BlockJob *job = c->opaque;
>> -    return job->job.aio_context;
>> +    return job_get_aio_context(&job->job);
>>   }
>>   static const BdrvChildClass child_job = {
> 
> Both called with BQL held, I think.

Yes, as their callbacks .get_parent_aio_context and .set_aio_context are 
defined as GS functions in block_int-common.h
> 
>> @@ -218,19 +220,21 @@ int block_job_add_bdrv(BlockJob *job, const char 
>> *name, BlockDriverState *bs,
>>   {
>>       BdrvChild *c;
>>       bool need_context_ops;
>> +    AioContext *job_aiocontext;
>>       assert(qemu_in_main_thread());
>>       bdrv_ref(bs);
>> -    need_context_ops = bdrv_get_aio_context(bs) != job->job.aio_context;
>> +    job_aiocontext = job_get_aio_context(&job->job);
>> +    need_context_ops = bdrv_get_aio_context(bs) != job_aiocontext;
>> -    if (need_context_ops && job->job.aio_context != 
>> qemu_get_aio_context()) {
>> -        aio_context_release(job->job.aio_context);
>> +    if (need_context_ops && job_aiocontext != qemu_get_aio_context()) {
>> +        aio_context_release(job_aiocontext);
>>       }
>>       c = bdrv_root_attach_child(bs, name, &child_job, 0, perm, 
>> shared_perm, job,
>>                                  errp);
>> -    if (need_context_ops && job->job.aio_context != 
>> qemu_get_aio_context()) {
>> -        aio_context_acquire(job->job.aio_context);
>> +    if (need_context_ops && job_aiocontext != qemu_get_aio_context()) {
>> +        aio_context_acquire(job_aiocontext);
>>       }
>>       if (c == NULL) {
>>           return -EPERM;
> 
> BQL held, too.

Wouldn't it be better to keep job_get_aio_context and implement it like this:

AioContext *job_get_aio_context(Job *job)
{
     /*
      * Job AioContext can be written under BQL+job_mutex,
      * but can be read with just the BQL held.
      */
     assert(qemu_in_main_thread());
     return job->aio_context;
}

and instead job_set_aio_context:

void job_set_aio_context(Job *job, AioContext *ctx)
{
     JOB_LOCK_GUARD();
     assert(qemu_in_main_thread());
     job->aio_context = ctx;
}

(obviously also implement a _locked version, if needed, and probably 
move the comment from job_get_aio_context into job.h).
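
The _locked counterpart would then be trivial, something like (sketch):

void job_set_aio_context_locked(Job *job, AioContext *ctx)
{
    /* Caller already holds job_mutex. */
    assert(qemu_in_main_thread());
    job->aio_context = ctx;
}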

Thank you,
Emanuele




* Re: [PATCH v3 14/16] job.c: use job_get_aio_context()
  2022-01-19 10:31   ` Paolo Bonzini
  2022-01-21 12:33     ` Emanuele Giuseppe Esposito
@ 2022-01-21 15:18     ` Emanuele Giuseppe Esposito
  2022-01-24 14:22       ` Paolo Bonzini
  1 sibling, 1 reply; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-21 15:18 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow



On 19/01/2022 11:31, Paolo Bonzini wrote:
> 
>> diff --git a/job.c b/job.c
>> index f16a4ef542..8a5b710d9b 100644
>> --- a/job.c
>> +++ b/job.c
>> @@ -566,7 +566,7 @@ void job_enter_cond_locked(Job *job, bool(*fn)(Job 
>> *job))
>>       job->busy = true;
>>       real_job_unlock();
>>       job_unlock();
>> -    aio_co_enter(job->aio_context, job->co);
>> +    aio_co_enter(job_get_aio_context(job), job->co);
>>       job_lock();
>>   }
> 
> If you replace aio_co_enter with aio_co_schedule, you can call it 
> without dropping the lock.  The difference being that aio_co_schedule 
> will always go through a bottom half.
> 
>> @@ -1138,7 +1138,6 @@ static void coroutine_fn job_co_entry(void *opaque)
>>       Job *job = opaque;
>>       int ret;
>> -    assert(job->aio_context == qemu_get_current_aio_context());
>>       assert(job && job->driver && job->driver->run);
>>       job_pause_point(job);
>>       ret = job->driver->run(job, &job->err);
>> @@ -1177,7 +1176,7 @@ void job_start(Job *job)
>>           job->paused = false;
>>           job_state_transition_locked(job, JOB_STATUS_RUNNING);
>>       }
>> -    aio_co_enter(job->aio_context, job->co);
>> +    aio_co_enter(job_get_aio_context(job), job->co);
> 
> Better to use aio_co_schedule here, too, and move it under the previous 
> WITH_JOB_LOCK_GUARD.

Unfortunately this is not straightforward: aio_co_enter invokes 
aio_co_schedule only if the target context is different from the 
current one; otherwise it enters the coroutine directly with 
qemu_aio_coroutine_enter. So always replacing it with aio_co_schedule 
breaks the unit tests' assumptions, as they expect that the job has 
already executed by the time control returns.

A possible solution is to aio_poll() on the condition we want to 
assert, waiting for the BH to be scheduled. But I don't know whether 
that still tests anything useful.
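
For example (a sketch against the unit tests; the job pointer and the 
condition to wait for are placeholders that would vary per test):

/* job_start() would now only schedule a BH, so wait for the job to run. */
job_start(&job->job);
while (!job_is_completed(&job->job)) {
    aio_poll(qemu_get_aio_context(), true);
}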

Thank you,
Emanuele




* Re: [PATCH v3 03/16] job.h: define locked functions
  2022-01-19 10:44   ` Paolo Bonzini
@ 2022-01-21 15:25     ` Emanuele Giuseppe Esposito
  2022-01-21 16:04       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-21 15:25 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow



On 19/01/2022 11:44, Paolo Bonzini wrote:
> On 1/5/22 15:01, Emanuele Giuseppe Esposito wrote:
>> These functions assume that the job lock is held by the
>> caller, to avoid TOC/TOU conditions. Therefore, their
>> name must end with _locked.
>>
>> Introduce also additional helpers that define _locked
>> functions (useful when the job_mutex is globally applied).
>>
>> Note: at this stage, job_{lock/unlock} and job lock guard macros
>> are*nop*.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito<eesposit@redhat.com>
> 
> So, this is the only remaining issue: I am not sure about this rename.
> The functions you are changing are
> 
> +void job_txn_unref_locked(JobTxn *txn);
> +void job_txn_add_job_locked(JobTxn *txn, Job *job);
> +void job_ref_locked(Job *job);
> +void job_unref_locked(Job *job);
> +void job_enter_cond_locked(Job *job, bool(*fn)(Job *job));
> +bool job_is_completed_locked(Job *job);
> +bool job_is_ready_locked(Job *job);
> +void job_pause_locked(Job *job);
> +void job_resume_locked(Job *job);
> +void job_user_pause_locked(Job *job, Error **errp);
> +bool job_user_paused_locked(Job *job);
> +void job_user_resume_locked(Job *job, Error **errp);
> +Job *job_next_locked(Job *job);
> +Job *job_get_locked(const char *id);
> +int job_apply_verb_locked(Job *job, JobVerb verb, Error **errp);
> +void job_early_fail_locked(Job *job);
> +void job_complete_locked(Job *job, Error **errp);
> +void job_cancel_locked(Job *job, bool force);
> +void job_user_cancel_locked(Job *job, bool force, Error **errp);
> +int job_cancel_sync_locked(Job *job, bool force);
> +int job_complete_sync_locked(Job *job, Error **errp);
> +void job_finalize_locked(Job *job, Error **errp);
> +void job_dismiss_locked(Job **job, Error **errp);
> +int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),
> 
> and most of them (if not all?) will never be called by the job driver, only
> by the monitor.  The two APIs (for driver / for monitor) are quite separate
> and have different locking policies: the monitor needs to take the lock to
> avoid TOC/TOU races, the driver generally can let the API take the lock.
> 
> The rename makes the monitor code heavier, but if you don't do the 
> rename the
> functions in job.c are named very inconsistently.  So I'm inclined to say
> this patch is fine---but I'd like to hear from others as well.
> 
> I think the two APIs should be in two different header files, similar
> to how you did the graph/IO split.

The split was proposed in previous versions, but Vladimir did not really 
like it and suggested sending it as a separate series:

https://patchew.org/QEMU/20211104153121.1362449-1-eesposit@redhat.com/


Vladimir's comment:
https://patchew.org/QEMU/20211104153121.1362449-1-eesposit@redhat.com/

Thank you,
Emanuele

> 
> Paolo
> 




* Re: [PATCH v3 03/16] job.h: define locked functions
  2022-01-21 15:25     ` Emanuele Giuseppe Esposito
@ 2022-01-21 16:04       ` Vladimir Sementsov-Ogievskiy
  2022-01-24 14:26         ` Paolo Bonzini
  0 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-01-21 16:04 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, Paolo Bonzini, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Wen Congyang, Xie Changlong,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	John Snow

21.01.2022 18:25, Emanuele Giuseppe Esposito wrote:
> 
> 
> On 19/01/2022 11:44, Paolo Bonzini wrote:
>> On 1/5/22 15:01, Emanuele Giuseppe Esposito wrote:
>>> These functions assume that the job lock is held by the
>>> caller, to avoid TOC/TOU conditions. Therefore, their
>>> name must end with _locked.
>>>
>>> Introduce also additional helpers that define _locked
>>> functions (useful when the job_mutex is globally applied).
>>>
>>> Note: at this stage, job_{lock/unlock} and job lock guard macros
>>> are*nop*.
>>>
>>> Signed-off-by: Emanuele Giuseppe Esposito<eesposit@redhat.com>
>>
>> So, this is the only remaining issue: I am not sure about this rename.
>> The functions you are changing are
>>
>> +void job_txn_unref_locked(JobTxn *txn);
>> +void job_txn_add_job_locked(JobTxn *txn, Job *job);
>> +void job_ref_locked(Job *job);
>> +void job_unref_locked(Job *job);
>> +void job_enter_cond_locked(Job *job, bool(*fn)(Job *job));
>> +bool job_is_completed_locked(Job *job);
>> +bool job_is_ready_locked(Job *job);
>> +void job_pause_locked(Job *job);
>> +void job_resume_locked(Job *job);
>> +void job_user_pause_locked(Job *job, Error **errp);
>> +bool job_user_paused_locked(Job *job);
>> +void job_user_resume_locked(Job *job, Error **errp);
>> +Job *job_next_locked(Job *job);
>> +Job *job_get_locked(const char *id);
>> +int job_apply_verb_locked(Job *job, JobVerb verb, Error **errp);
>> +void job_early_fail_locked(Job *job);
>> +void job_complete_locked(Job *job, Error **errp);
>> +void job_cancel_locked(Job *job, bool force);
>> +void job_user_cancel_locked(Job *job, bool force, Error **errp);
>> +int job_cancel_sync_locked(Job *job, bool force);
>> +int job_complete_sync_locked(Job *job, Error **errp);
>> +void job_finalize_locked(Job *job, Error **errp);
>> +void job_dismiss_locked(Job **job, Error **errp);
>> +int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),
>>
>> and most of them (if not all?) will never be called by the job driver, only
>> by the monitor.  The two APIs (for driver / for monitor) are quite separate
>> and have different locking policies: the monitor needs to take the lock to
>> avoid TOC/TOU races, the driver generally can let the API take the lock.
>>
>> The rename makes the monitor code heavier, but if you don't do the rename the
>> functions in job.c are named very inconsistently.  So I'm inclined to say
>> this patch is fine---but I'd like to hear from others as well.
>>
>> I think the two APIs should be in two different header files, similar
>> to how you did the graph/IO split.
> 
> The split was proposed in previous versions, but Vladimir did not really like it and suggested sending it as a separate series:

I didn't really like it, as it seemed unusual and unobvious to me. But if we have already accepted a similar split for the generic block layer, there is no way for me to resist :) And if we follow the new logic of the generic block layer in jobs, it's not "unusual" any more.

> 
> https://patchew.org/QEMU/20211104153121.1362449-1-eesposit@redhat.com/
> 
> 
> Vladimir's comment:
> https://patchew.org/QEMU/20211104153121.1362449-1-eesposit@redhat.com/
> 
> Thank you,
> Emanuele
> 
>>
>> Paolo
>>
> 


-- 
Best regards,
Vladimir



* Re: [PATCH v3 14/16] job.c: use job_get_aio_context()
  2022-01-21 12:33     ` Emanuele Giuseppe Esposito
@ 2022-01-21 17:43       ` Emanuele Giuseppe Esposito
  0 siblings, 0 replies; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-21 17:43 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow



On 21/01/2022 13:33, Emanuele Giuseppe Esposito wrote:
> 
> 
> On 19/01/2022 11:31, Paolo Bonzini wrote:
>>> diff --git a/blockjob.c b/blockjob.c
>>> index cf1f49f6c2..468ba735c5 100644
>>> --- a/blockjob.c
>>> +++ b/blockjob.c
>>> @@ -155,14 +155,16 @@ static void child_job_set_aio_ctx(BdrvChild *c, 
>>> AioContext *ctx,
>>>           bdrv_set_aio_context_ignore(sibling->bs, ctx, ignore);
>>>       }
>>> -    job->job.aio_context = ctx;
>>> +    WITH_JOB_LOCK_GUARD() {
>>> +        job->job.aio_context = ctx;
>>> +    }
>>>   }
>>>   static AioContext *child_job_get_parent_aio_context(BdrvChild *c)
>>>   {
>>>       BlockJob *job = c->opaque;
>>> -    return job->job.aio_context;
>>> +    return job_get_aio_context(&job->job);
>>>   }
>>>   static const BdrvChildClass child_job = {
>>
>> Both called with BQL held, I think.
> 
> Yes, as their callbacks .get_parent_aio_context and .set_aio_context are 
> defined as GS functions in block_int-common.h
>>
>>> @@ -218,19 +220,21 @@ int block_job_add_bdrv(BlockJob *job, const 
>>> char *name, BlockDriverState *bs,
>>>   {
>>>       BdrvChild *c;
>>>       bool need_context_ops;
>>> +    AioContext *job_aiocontext;
>>>       assert(qemu_in_main_thread());
>>>       bdrv_ref(bs);
>>> -    need_context_ops = bdrv_get_aio_context(bs) != 
>>> job->job.aio_context;
>>> +    job_aiocontext = job_get_aio_context(&job->job);
>>> +    need_context_ops = bdrv_get_aio_context(bs) != job_aiocontext;
>>> -    if (need_context_ops && job->job.aio_context != 
>>> qemu_get_aio_context()) {
>>> -        aio_context_release(job->job.aio_context);
>>> +    if (need_context_ops && job_aiocontext != qemu_get_aio_context()) {
>>> +        aio_context_release(job_aiocontext);
>>>       }
>>>       c = bdrv_root_attach_child(bs, name, &child_job, 0, perm, 
>>> shared_perm, job,
>>>                                  errp);
>>> -    if (need_context_ops && job->job.aio_context != 
>>> qemu_get_aio_context()) {
>>> -        aio_context_acquire(job->job.aio_context);
>>> +    if (need_context_ops && job_aiocontext != qemu_get_aio_context()) {
>>> +        aio_context_acquire(job_aiocontext);
>>>       }
>>>       if (c == NULL) {
>>>           return -EPERM;
>>
>> BQL held, too.
> 
> Wouldn't it be better to keep job_get_aio_context and implement it like this:
> 
> AioContext *job_get_aio_context(Job *job)
> {
>      /*
>       * Job AioContext can be written under BQL+job_mutex,
>       * but can be read with just the BQL held.
>       */
>      assert(qemu_in_main_thread());
>      return job->aio_context;
> }

Uhm, ok, this one doesn't really work, because it's fine to read the 
field under either the BQL or the job lock. So I will get rid of 
job_get_aio_context, but add job_set_aio_context (and use it in 
child_job_set_aio_ctx).
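
So the rule for the field could be documented like this in job.h (just 
a sketch of the comment):

/*
 * AioContext to run the job coroutine in.
 * Written under BQL + job_mutex; read under either the BQL or
 * job_mutex, so holding one of the two is enough for readers.
 */
AioContext *aio_context;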

Emanuele
> 
> and instead job_set_aio_context:
> 
> void job_set_aio_context(Job *job, AioContext *ctx)
> {
>      JOB_LOCK_GUARD();
>      assert(qemu_in_main_thread());
>      job->aio_context = ctx;
> }
> 
> (obviously also implement a _locked version, if needed, and probably 
> move the comment from job_get_aio_context into job.h).
> 
> Thank you,
> Emanuele




* Re: [PATCH v3 14/16] job.c: use job_get_aio_context()
  2022-01-21 15:18     ` Emanuele Giuseppe Esposito
@ 2022-01-24 14:22       ` Paolo Bonzini
  2022-01-26 15:58         ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2022-01-24 14:22 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow

On 1/21/22 16:18, Emanuele Giuseppe Esposito wrote:
>>>
>>
>> Better to use aio_co_schedule here, too, and move it under the 
>> previous WITH_JOB_LOCK_GUARD.
> 
> Unfortunately this is not straightforward: aio_co_enter invokes 
> aio_co_schedule only if the target context is different from the 
> current one; otherwise it enters the coroutine directly with 
> qemu_aio_coroutine_enter. So always replacing it with aio_co_schedule 
> breaks the unit tests' assumptions, as they expect that the job has 
> already executed by the time control returns.
> 
> A possible solution is to aio_poll() on the condition we want to 
> assert, waiting for the BH to be scheduled. But I don't know whether 
> that still tests anything useful.

I think you sorted that out, based on the IRC conversation?

Paolo



* Re: [PATCH v3 03/16] job.h: define locked functions
  2022-01-21 16:04       ` Vladimir Sementsov-Ogievskiy
@ 2022-01-24 14:26         ` Paolo Bonzini
  2022-01-26 15:58           ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2022-01-24 14:26 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Wen Congyang, Xie Changlong, qemu-devel,
	Markus Armbruster, Hanna Reitz, Stefan Hajnoczi, John Snow

On 1/21/22 17:04, Vladimir Sementsov-Ogievskiy wrote:
>>
>> The split was proposed in previous versions, but Vladimir did not 
>> really like it and suggested sending it as a separate series:
> 
> I didn't really like it, as it seemed unusual and unobvious to me. But 
> if we have already accepted a similar split for the generic block 
> layer, there is no way for me to resist :) And if we follow the new 
> logic of the generic block layer in jobs, it's not "unusual" any more.

Either way I think it's okay to have it as a follow-up.  The explicit 
naming in the API is a bit verbose but definitely clearer, so it's okay 
to order things differently than in the graph/IO split.  In that case 
we weren't even sure, until you went through all the testcase failures, 
that a _locked or rather "_drained" API was possible.

Paolo



* Re: [PATCH v3 14/16] job.c: use job_get_aio_context()
  2022-01-24 14:22       ` Paolo Bonzini
@ 2022-01-26 15:58         ` Emanuele Giuseppe Esposito
  0 siblings, 0 replies; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-26 15:58 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow



On 24/01/2022 15:22, Paolo Bonzini wrote:
> On 1/21/22 16:18, Emanuele Giuseppe Esposito wrote:
>>>>
>>>
>>> Better to use aio_co_schedule here, too, and move it under the
>>> previous WITH_JOB_LOCK_GUARD.
>>
>> Unfortunately this is not straightforward: aio_co_enter invokes
>> aio_co_schedule only if the target context is different from the
>> current one; otherwise it enters the coroutine directly with
>> qemu_aio_coroutine_enter. So always replacing it with aio_co_schedule
>> breaks the unit tests' assumptions, as they expect that the job has
>> already executed by the time control returns.
>>
>> A possible solution is to aio_poll() on the condition we want to
>> assert, waiting for the BH to be scheduled. But I don't know whether
>> that still tests anything useful.
> 
> I think you sorted that out, based on the IRC conversation?
> 

Yes.

Thank you,
Emanuele




* Re: [PATCH v3 03/16] job.h: define locked functions
  2022-01-24 14:26         ` Paolo Bonzini
@ 2022-01-26 15:58           ` Emanuele Giuseppe Esposito
  0 siblings, 0 replies; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-26 15:58 UTC (permalink / raw)
  To: Paolo Bonzini, Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Wen Congyang, Xie Changlong, qemu-devel,
	Markus Armbruster, Hanna Reitz, Stefan Hajnoczi, John Snow



On 24/01/2022 15:26, Paolo Bonzini wrote:
> On 1/21/22 17:04, Vladimir Sementsov-Ogievskiy wrote:
>>>
>>> The split was proposed in previous versions, but Vladimir did not
>>> really like it and suggested sending it as a separate series:
>>
>> I didn't really like it, as it seemed unusual and unobvious to me. But
>> if we have already accepted a similar split for the generic block
>> layer, there is no way for me to resist :) And if we follow the new
>> logic of the generic block layer in jobs, it's not "unusual" any more.
> 
> Either way I think it's okay to have it as a follow-up.  The explicit
> naming in the API is a bit verbose but definitely clearer, so it's okay
> to order things differently than in the graph/IO split.  In that case
> we weren't even sure, until you went through all the testcase failures,
> that a _locked or rather "_drained" API was possible.
> 
> Paolo
> 

Ok, I will send the split in a separate series.

Thank you,
Emanuele




* Re: [PATCH v3 09/16] jobs: remove aiocontext locks since the functions are under BQL
  2022-01-19 11:09   ` Paolo Bonzini
@ 2022-01-26 16:18     ` Emanuele Giuseppe Esposito
  0 siblings, 0 replies; 38+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-01-26 16:18 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, John Snow



On 19/01/2022 12:09, Paolo Bonzini wrote:
>> @@ -3707,15 +3707,11 @@ BlockJobInfoList *qmp_query_block_jobs(Error
>> **errp)
>>         for (job = block_job_next(NULL); job; job =
>> block_job_next(job)) {
>>           BlockJobInfo *value;
>> -        AioContext *aio_context;
>>             if (block_job_is_internal(job)) {
>>               continue;
>>           }
> 
> block_job_next, block_job_query, etc. do not have the _locked suffix. Is
> this because all block_job_ functions need the job_mutex held, or just
> laziness? :)
> 

I wasn't really sure whether to touch that API naming or not (+ laziness
:D )

But it makes sense to add _locked there as well. Will do.
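
For example, in qmp_query_block_jobs the rename would look roughly like 
this (a sketch of the planned change):

-    for (job = block_job_next(NULL); job; job = block_job_next(job)) {
+    for (job = block_job_next_locked(NULL); job;
+         job = block_job_next_locked(job)) {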

Thank you,
Emanuele





Thread overview: 38+ messages
2022-01-05 14:01 [PATCH v3 00/16] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
2022-01-05 14:01 ` [PATCH v3 01/16] job.c: make job_mutex and job_lock/unlock() public Emanuele Giuseppe Esposito
2022-01-19  9:56   ` Paolo Bonzini
2022-01-19 11:13     ` Paolo Bonzini
2022-01-05 14:01 ` [PATCH v3 02/16] job.h: categorize fields in struct Job Emanuele Giuseppe Esposito
2022-01-19  9:57   ` Paolo Bonzini
2022-01-05 14:01 ` [PATCH v3 03/16] job.h: define locked functions Emanuele Giuseppe Esposito
2022-01-19 10:44   ` Paolo Bonzini
2022-01-21 15:25     ` Emanuele Giuseppe Esposito
2022-01-21 16:04       ` Vladimir Sementsov-Ogievskiy
2022-01-24 14:26         ` Paolo Bonzini
2022-01-26 15:58           ` Emanuele Giuseppe Esposito
2022-01-05 14:01 ` [PATCH v3 04/16] job.h: define unlocked functions Emanuele Giuseppe Esposito
2022-01-05 14:01 ` [PATCH v3 05/16] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU Emanuele Giuseppe Esposito
2022-01-19 11:06   ` Paolo Bonzini
2022-01-05 14:01 ` [PATCH v3 06/16] job.c: make job_event_* functions static Emanuele Giuseppe Esposito
2022-01-05 14:01 ` [PATCH v3 07/16] job.c: move inner aiocontext lock in callbacks Emanuele Giuseppe Esposito
2022-01-05 14:02 ` [PATCH v3 08/16] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED Emanuele Giuseppe Esposito
2022-01-05 14:02 ` [PATCH v3 09/16] jobs: remove aiocontext locks since the functions are under BQL Emanuele Giuseppe Esposito
2022-01-19 11:09   ` Paolo Bonzini
2022-01-26 16:18     ` Emanuele Giuseppe Esposito
2022-01-05 14:02 ` [PATCH v3 10/16] jobs: protect jobs with job_lock/unlock Emanuele Giuseppe Esposito
2022-01-19 10:50   ` Paolo Bonzini
2022-01-05 14:02 ` [PATCH v3 11/16] jobs: document all static functions and add _locked() suffix Emanuele Giuseppe Esposito
2022-01-05 14:02 ` [PATCH v3 12/16] jobs: use job locks and helpers also in the unit tests Emanuele Giuseppe Esposito
2022-01-05 14:02 ` [PATCH v3 13/16] jobs: add job lock in find_* functions Emanuele Giuseppe Esposito
2022-01-05 14:02 ` [PATCH v3 14/16] job.c: use job_get_aio_context() Emanuele Giuseppe Esposito
2022-01-19 10:31   ` Paolo Bonzini
2022-01-21 12:33     ` Emanuele Giuseppe Esposito
2022-01-21 17:43       ` Emanuele Giuseppe Esposito
2022-01-21 15:18     ` Emanuele Giuseppe Esposito
2022-01-24 14:22       ` Paolo Bonzini
2022-01-26 15:58         ` Emanuele Giuseppe Esposito
2022-01-05 14:02 ` [PATCH v3 15/16] job.c: enable job lock/unlock and remove Aiocontext locks Emanuele Giuseppe Esposito
2022-01-19 10:35   ` Paolo Bonzini
2022-01-05 14:02 ` [PATCH v3 16/16] block_job_query: remove atomic read Emanuele Giuseppe Esposito
2022-01-19 10:34   ` Paolo Bonzini
2022-01-19 11:15 ` [PATCH v3 00/16] job: replace AioContext lock with job_mutex Paolo Bonzini
