* [PATCH v7 00/18] job: replace AioContext lock with job_mutex
@ 2022-06-16 13:18 Emanuele Giuseppe Esposito
  2022-06-16 13:18 ` [PATCH v7 01/18] job.c: make job_mutex and job_lock/unlock() public Emanuele Giuseppe Esposito
                   ` (17 more replies)
  0 siblings, 18 replies; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-16 13:18 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel,
	Emanuele Giuseppe Esposito

In this series, we want to remove the AioContext lock and instead
use the already existing job_mutex to protect the job structures
and list. This is part of the work to get rid of the AioContext
lock in favour of finer-grained locks.

In order to simplify the reviewers' job, the job lock/unlock
functions and macros are added as empty prototypes (nops) in patch 1.
They are converted to use the actual job mutex only in the last
patch. In this way we can freely create locking sections
without worrying about deadlocks with the AioContext lock.

Patch 2 defines what fields in the job structure need protection.
Patches 3-6 are in preparation for the job locks, moving functions
from global to static and introducing helpers.

Patches 7-9 introduce the (nop) job lock into the job API and
its users, and patches 10-13 categorize the locked and unlocked
functions in the job API, respectively.

Patches 14-17 take care of protecting job->aio_context, and
finally patch 18 makes the prototypes in patch 1 use the
job_mutex and removes all AioContext locks at the same time.

This series was tested by running unit tests, qemu-iotests and
qtests (x86_64).

---
v7:
* s/temporarly/temporary
* duplicate the identical locking comment for both variants of the same
  function
* patch 2: add "Protected by AioContext lock" to better categorize fields in
  job.h
* use same comment style in all function headers ("Just like {funct}, but
  called between job_lock and job_unlock")

v6:
* patch 4 and 6 squashed with patch 19 (enable job lock and
  reduce/remove AioContext lock)
* patch 19: job_unref_locked reads the AioContext inside the
  job lock.

v5:
* just restructured patches a little bit better, as there were
  functions used before they were defined.
* rebased on kwolf/block branch and the API split series

v4:
* move "protected by job_mutex" from patch 2 to 15, where the job_mutex is
  actually added.
* s/aio_co_enter/aio_co_schedule in job.c, and adjust tests accordingly.
* remove job_get_aio_context, add job_set_aio_context. Use "fake rwlock"
  to protect job->aio_context.
* get rid of useless getter methods, namely:
  job_get_status
  job_get_pause_count
  job_get_paused
  job_get_busy
  They are all used only by tests, and such getters are pretty useless.
  Replace with job_lock(); assert(); job_unlock();
* use job lock macros instead of job lock/unlock in unit tests.
* convert also blockjob functions to have _locked
* put the job_lock/unlock patches before the _locked ones
* replace aio_co_enter in job.c and detect change of context

v3:
* add "_locked" suffix to the functions called under job_mutex lock
* rename _job_lock in real_job_lock
* job_mutex is now public, and drivers like monitor use it directly
* introduce and protect job_get_aio_context
* remove mirror-specific APIs and just use WITH_JOB_GUARD
* more extensive use of WITH_JOB_GUARD and JOB_LOCK_GUARD

RFC v2:
* use JOB_LOCK_GUARD and WITH_JOB_LOCK_GUARD
* mu(u)ltiple typos in commit messages
* job API split patches are sent separately in another series
* use of empty job_{lock/unlock} and JOB_LOCK_GUARD/WITH_JOB_LOCK_GUARD
  to avoid deadlocks and simplify the reviewers' job
* move patch 11 (block_job_query: remove atomic read) as last

Emanuele Giuseppe Esposito (17):
  job.c: make job_mutex and job_lock/unlock() public
  job.h: categorize fields in struct Job
  job.c: API functions not used outside should be static
  aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED
  job.h: add _locked duplicates for job API functions called with and
    without job_mutex
  jobs: protect jobs with job_lock/unlock
  jobs: add job lock in find_* functions
  jobs: use job locks also in the unit tests
  block/mirror.c: use of job helpers in drivers to avoid TOC/TOU
  jobs: rename static functions called with job_mutex held
  job.h: rename job API functions called with job_mutex held
  block_job: rename block_job functions called with job_mutex held
  job.h: define unlocked functions
  commit and mirror: create new nodes using bdrv_get_aio_context, and
    not the job aiocontext
  jobs: protect job.aio_context with BQL and job_mutex
  job.c: enable job lock/unlock and remove Aiocontext locks
  block_job_query: remove atomic read

Paolo Bonzini (1):
  job: detect change of aiocontext within job coroutine

 block.c                          |  19 +-
 block/backup.c                   |   4 +-
 block/commit.c                   |   4 +-
 block/mirror.c                   |  21 +-
 block/replication.c              |  10 +-
 blockdev.c                       | 143 +++----
 blockjob.c                       | 126 +++---
 include/block/aio-wait.h         |  17 +-
 include/block/blockjob.h         |  29 +-
 include/qemu/job.h               | 303 +++++++++-----
 job-qmp.c                        |  87 ++--
 job.c                            | 658 +++++++++++++++++++------------
 monitor/qmp-cmds.c               |   7 +-
 qemu-img.c                       |  41 +-
 tests/unit/test-bdrv-drain.c     |  80 ++--
 tests/unit/test-block-iothread.c |   8 +-
 tests/unit/test-blockjob-txn.c   |  32 +-
 tests/unit/test-blockjob.c       | 113 ++++--
 18 files changed, 1034 insertions(+), 668 deletions(-)

-- 
2.31.1



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 01/18] job.c: make job_mutex and job_lock/unlock() public
  2022-06-16 13:18 [PATCH v7 00/18] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
@ 2022-06-16 13:18 ` Emanuele Giuseppe Esposito
  2022-06-21 13:47   ` Vladimir Sementsov-Ogievskiy
  2022-06-24 18:22   ` Vladimir Sementsov-Ogievskiy
  2022-06-16 13:18 ` [PATCH v7 02/18] job.h: categorize fields in struct Job Emanuele Giuseppe Esposito
                   ` (16 subsequent siblings)
  17 siblings, 2 replies; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-16 13:18 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel,
	Emanuele Giuseppe Esposito

The job mutex will be used to protect the job struct elements and
list, replacing the AioContext locks.

Right now use a shared lock for all jobs, in order to keep things
simple. Once the AioContext lock is gone, we can introduce per-job
locks.

To simplify the switch from the AioContext lock to the job lock,
introduce *nop* lock/unlock functions and macros.
We want to always call job_lock/unlock outside the AioContext locks,
and not vice versa; otherwise we might get a deadlock. This is not
straightforward to do, and that's why we start with nop functions.
Once everything is protected by job_lock/unlock, we can change the
nops into an actual mutex and remove the AioContext lock.

Since job_mutex is already being used, add static
real_job_{lock/unlock} for the existing usage.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/qemu/job.h | 24 ++++++++++++++++++++++++
 job.c              | 35 +++++++++++++++++++++++------------
 2 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index c105b31076..d1192ffd61 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -303,6 +303,30 @@ typedef enum JobCreateFlags {
     JOB_MANUAL_DISMISS = 0x04,
 } JobCreateFlags;
 
+extern QemuMutex job_mutex;
+
+#define JOB_LOCK_GUARD() /* QEMU_LOCK_GUARD(&job_mutex) */
+
+#define WITH_JOB_LOCK_GUARD() /* WITH_QEMU_LOCK_GUARD(&job_mutex) */
+
+/**
+ * job_lock:
+ *
+ * Take the mutex protecting the list of jobs and their status.
+ * Most functions called by the monitor need to call job_lock
+ * and job_unlock manually.  On the other hand, function called
+ * by the block jobs themselves and by the block layer will take the
+ * lock for you.
+ */
+void job_lock(void);
+
+/**
+ * job_unlock:
+ *
+ * Release the mutex protecting the list of jobs and their status.
+ */
+void job_unlock(void);
+
 /**
  * Allocate and return a new job transaction. Jobs can be added to the
  * transaction using job_txn_add_job().
diff --git a/job.c b/job.c
index 075c6f3a20..2b4ffca9d4 100644
--- a/job.c
+++ b/job.c
@@ -32,6 +32,12 @@
 #include "trace/trace-root.h"
 #include "qapi/qapi-events-job.h"
 
+/*
+ * job_mutex protects the jobs list, but also makes the
+ * struct job fields thread-safe.
+ */
+QemuMutex job_mutex;
+
 static QLIST_HEAD(, Job) jobs = QLIST_HEAD_INITIALIZER(jobs);
 
 /* Job State Transition Table */
@@ -74,17 +80,22 @@ struct JobTxn {
     int refcnt;
 };
 
-/* Right now, this mutex is only needed to synchronize accesses to job->busy
- * and job->sleep_timer, such as concurrent calls to job_do_yield and
- * job_enter. */
-static QemuMutex job_mutex;
+void job_lock(void)
+{
+    /* nop */
+}
+
+void job_unlock(void)
+{
+    /* nop */
+}
 
-static void job_lock(void)
+static void real_job_lock(void)
 {
     qemu_mutex_lock(&job_mutex);
 }
 
-static void job_unlock(void)
+static void real_job_unlock(void)
 {
     qemu_mutex_unlock(&job_mutex);
 }
@@ -450,21 +461,21 @@ void job_enter_cond(Job *job, bool(*fn)(Job *job))
         return;
     }
 
-    job_lock();
+    real_job_lock();
     if (job->busy) {
-        job_unlock();
+        real_job_unlock();
         return;
     }
 
     if (fn && !fn(job)) {
-        job_unlock();
+        real_job_unlock();
         return;
     }
 
     assert(!job->deferred_to_main_loop);
     timer_del(&job->sleep_timer);
     job->busy = true;
-    job_unlock();
+    real_job_unlock();
     aio_co_enter(job->aio_context, job->co);
 }
 
@@ -481,13 +492,13 @@ void job_enter(Job *job)
  * called explicitly. */
 static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
 {
-    job_lock();
+    real_job_lock();
     if (ns != -1) {
         timer_mod(&job->sleep_timer, ns);
     }
     job->busy = false;
     job_event_idle(job);
-    job_unlock();
+    real_job_unlock();
     qemu_coroutine_yield();
 
     /* Set by job_enter_cond() before re-entering the coroutine.  */
-- 
2.31.1




* [PATCH v7 02/18] job.h: categorize fields in struct Job
  2022-06-16 13:18 [PATCH v7 00/18] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
  2022-06-16 13:18 ` [PATCH v7 01/18] job.c: make job_mutex and job_lock/unlock() public Emanuele Giuseppe Esposito
@ 2022-06-16 13:18 ` Emanuele Giuseppe Esposito
  2022-06-21 14:29   ` Vladimir Sementsov-Ogievskiy
  2022-06-16 13:18 ` [PATCH v7 03/18] job.c: API functions not used outside should be static Emanuele Giuseppe Esposito
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-16 13:18 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel,
	Emanuele Giuseppe Esposito

Categorize the fields in struct Job to clarify which ones need to
be protected by the job mutex and which don't.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 include/qemu/job.h | 61 +++++++++++++++++++++++++++-------------------
 1 file changed, 36 insertions(+), 25 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index d1192ffd61..876e13d549 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -40,27 +40,52 @@ typedef struct JobTxn JobTxn;
  * Long-running operation.
  */
 typedef struct Job {
+
+    /* Fields set at initialization (job_create), and never modified */
+
     /** The ID of the job. May be NULL for internal jobs. */
     char *id;
 
-    /** The type of this job. */
+    /**
+     * The type of this job.
+     * All callbacks are called with job_mutex *not* held.
+     */
     const JobDriver *driver;
 
-    /** Reference count of the block job */
-    int refcnt;
-
-    /** Current state; See @JobStatus for details. */
-    JobStatus status;
-
-    /** AioContext to run the job coroutine in */
-    AioContext *aio_context;
-
     /**
      * The coroutine that executes the job.  If not NULL, it is reentered when
      * busy is false and the job is cancelled.
+     * Initialized in job_start()
      */
     Coroutine *co;
 
+    /** True if this job should automatically finalize itself */
+    bool auto_finalize;
+
+    /** True if this job should automatically dismiss itself */
+    bool auto_dismiss;
+
+    /** The completion function that will be called when the job completes.  */
+    BlockCompletionFunc *cb;
+
+    /** The opaque value that is passed to the completion function.  */
+    void *opaque;
+
+    /* ProgressMeter API is thread-safe */
+    ProgressMeter progress;
+
+
+    /** Protected by AioContext lock */
+
+    /** AioContext to run the job coroutine in */
+    AioContext *aio_context;
+
+    /** Reference count of the block job */
+    int refcnt;
+
+    /** Current state; See @JobStatus for details. */
+    JobStatus status;
+
     /**
      * Timer that is used by @job_sleep_ns. Accessed under job_mutex (in
      * job.c).
@@ -112,14 +137,6 @@ typedef struct Job {
     /** Set to true when the job has deferred work to the main loop. */
     bool deferred_to_main_loop;
 
-    /** True if this job should automatically finalize itself */
-    bool auto_finalize;
-
-    /** True if this job should automatically dismiss itself */
-    bool auto_dismiss;
-
-    ProgressMeter progress;
-
     /**
      * Return code from @run and/or @prepare callback(s).
      * Not final until the job has reached the CONCLUDED status.
@@ -134,12 +151,6 @@ typedef struct Job {
      */
     Error *err;
 
-    /** The completion function that will be called when the job completes.  */
-    BlockCompletionFunc *cb;
-
-    /** The opaque value that is passed to the completion function.  */
-    void *opaque;
-
     /** Notifiers called when a cancelled job is finalised */
     NotifierList on_finalize_cancelled;
 
@@ -167,6 +178,7 @@ typedef struct Job {
 
 /**
  * Callbacks and other information about a Job driver.
+ * All callbacks are invoked with job_mutex *not* held.
  */
 struct JobDriver {
 
@@ -472,7 +484,6 @@ void job_yield(Job *job);
  */
 void coroutine_fn job_sleep_ns(Job *job, int64_t ns);
 
-
 /** Returns the JobType of a given Job. */
 JobType job_type(const Job *job);
 
-- 
2.31.1




* [PATCH v7 03/18] job.c: API functions not used outside should be static
  2022-06-16 13:18 [PATCH v7 00/18] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
  2022-06-16 13:18 ` [PATCH v7 01/18] job.c: make job_mutex and job_lock/unlock() public Emanuele Giuseppe Esposito
  2022-06-16 13:18 ` [PATCH v7 02/18] job.h: categorize fields in struct Job Emanuele Giuseppe Esposito
@ 2022-06-16 13:18 ` Emanuele Giuseppe Esposito
  2022-06-21 14:34   ` Vladimir Sementsov-Ogievskiy
  2022-06-16 13:18 ` [PATCH v7 04/18] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED Emanuele Giuseppe Esposito
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-16 13:18 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel,
	Emanuele Giuseppe Esposito

The job_event_* functions can all be static, as they are not used
outside job.c.

The same applies to job_txn_add_job().

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 include/qemu/job.h | 18 ------------------
 job.c              | 22 +++++++++++++++++++---
 2 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 876e13d549..4b64eb15f7 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -358,18 +358,6 @@ JobTxn *job_txn_new(void);
  */
 void job_txn_unref(JobTxn *txn);
 
-/**
- * @txn: The transaction (may be NULL)
- * @job: Job to add to the transaction
- *
- * Add @job to the transaction.  The @job must not already be in a transaction.
- * The caller must call either job_txn_unref() or job_completed() to release
- * the reference that is automatically grabbed here.
- *
- * If @txn is NULL, the function does nothing.
- */
-void job_txn_add_job(JobTxn *txn, Job *job);
-
 /**
  * Create a new long-running job and return it.
  *
@@ -431,12 +419,6 @@ void job_progress_set_remaining(Job *job, uint64_t remaining);
  */
 void job_progress_increase_remaining(Job *job, uint64_t delta);
 
-/** To be called when a cancelled job is finalised. */
-void job_event_cancelled(Job *job);
-
-/** To be called when a successfully completed job is finalised. */
-void job_event_completed(Job *job);
-
 /**
  * Conditionally enter the job coroutine if the job is ready to run, not
  * already busy and fn() returns true. fn() is called while under the job_lock
diff --git a/job.c b/job.c
index 2b4ffca9d4..cafd597ba4 100644
--- a/job.c
+++ b/job.c
@@ -125,7 +125,17 @@ void job_txn_unref(JobTxn *txn)
     }
 }
 
-void job_txn_add_job(JobTxn *txn, Job *job)
+/**
+ * @txn: The transaction (may be NULL)
+ * @job: Job to add to the transaction
+ *
+ * Add @job to the transaction.  The @job must not already be in a transaction.
+ * The caller must call either job_txn_unref() or job_completed() to release
+ * the reference that is automatically grabbed here.
+ *
+ * If @txn is NULL, the function does nothing.
+ */
+static void job_txn_add_job(JobTxn *txn, Job *job)
 {
     if (!txn) {
         return;
@@ -427,12 +437,18 @@ void job_progress_increase_remaining(Job *job, uint64_t delta)
     progress_increase_remaining(&job->progress, delta);
 }
 
-void job_event_cancelled(Job *job)
+/**
+ * To be called when a cancelled job is finalised.
+ */
+static void job_event_cancelled(Job *job)
 {
     notifier_list_notify(&job->on_finalize_cancelled, job);
 }
 
-void job_event_completed(Job *job)
+/**
+ * To be called when a successfully completed job is finalised.
+ */
+static void job_event_completed(Job *job)
 {
     notifier_list_notify(&job->on_finalize_completed, job);
 }
-- 
2.31.1




* [PATCH v7 04/18] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED
  2022-06-16 13:18 [PATCH v7 00/18] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (2 preceding siblings ...)
  2022-06-16 13:18 ` [PATCH v7 03/18] job.c: API functions not used outside should be static Emanuele Giuseppe Esposito
@ 2022-06-16 13:18 ` Emanuele Giuseppe Esposito
  2022-06-21 14:40   ` Vladimir Sementsov-Ogievskiy
  2022-06-16 13:18 ` [PATCH v7 05/18] job.h: add _locked duplicates for job API functions called with and without job_mutex Emanuele Giuseppe Esposito
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-16 13:18 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel,
	Emanuele Giuseppe Esposito

Same as the AIO_WAIT_WHILE macro, but if we are in the main loop,
do not release and then re-acquire ctx_'s AioContext.

Once all AioContext locks go away, this macro will replace
AIO_WAIT_WHILE.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 include/block/aio-wait.h | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index 54840f8622..a61f82c617 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -59,10 +59,13 @@ typedef struct {
 extern AioWait global_aio_wait;
 
 /**
- * AIO_WAIT_WHILE:
+ * _AIO_WAIT_WHILE:
  * @ctx: the aio context, or NULL if multiple aio contexts (for which the
  *       caller does not hold a lock) are involved in the polling condition.
  * @cond: wait while this conditional expression is true
+ * @unlock: whether to unlock and then lock again @ctx. This applies
+ * only when waiting for another AioContext from the main loop.
+ * Otherwise it's ignored.
  *
  * Wait while a condition is true.  Use this to implement synchronous
  * operations that require event loop activity.
@@ -75,7 +78,7 @@ extern AioWait global_aio_wait;
  * wait on conditions between two IOThreads since that could lead to deadlock,
  * go via the main loop instead.
  */
-#define AIO_WAIT_WHILE(ctx, cond) ({                               \
+#define _AIO_WAIT_WHILE(ctx, cond, unlock) ({                      \
     bool waited_ = false;                                          \
     AioWait *wait_ = &global_aio_wait;                             \
     AioContext *ctx_ = (ctx);                                      \
@@ -92,11 +95,11 @@ extern AioWait global_aio_wait;
         assert(qemu_get_current_aio_context() ==                   \
                qemu_get_aio_context());                            \
         while ((cond)) {                                           \
-            if (ctx_) {                                            \
+            if (unlock && ctx_) {                                  \
                 aio_context_release(ctx_);                         \
             }                                                      \
             aio_poll(qemu_get_aio_context(), true);                \
-            if (ctx_) {                                            \
+            if (unlock && ctx_) {                                  \
                 aio_context_acquire(ctx_);                         \
             }                                                      \
             waited_ = true;                                        \
@@ -105,6 +108,12 @@ extern AioWait global_aio_wait;
     qatomic_dec(&wait_->num_waiters);                              \
     waited_; })
 
+#define AIO_WAIT_WHILE(ctx, cond)                                  \
+    _AIO_WAIT_WHILE(ctx, cond, true)
+
+#define AIO_WAIT_WHILE_UNLOCKED(ctx, cond)                         \
+    _AIO_WAIT_WHILE(ctx, cond, false)
+
 /**
  * aio_wait_kick:
  * Wake up the main thread if it is waiting on AIO_WAIT_WHILE().  During
-- 
2.31.1




* [PATCH v7 05/18] job.h: add _locked duplicates for job API functions called with and without job_mutex
  2022-06-16 13:18 [PATCH v7 00/18] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (3 preceding siblings ...)
  2022-06-16 13:18 ` [PATCH v7 04/18] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED Emanuele Giuseppe Esposito
@ 2022-06-16 13:18 ` Emanuele Giuseppe Esposito
  2022-06-21 15:03   ` Vladimir Sementsov-Ogievskiy
  2022-06-16 13:18 ` [PATCH v7 06/18] jobs: protect jobs with job_lock/unlock Emanuele Giuseppe Esposito
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-16 13:18 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel,
	Emanuele Giuseppe Esposito

In preparation for the job_lock/unlock usage, create _locked
duplicates of some functions, since they will sometimes be called
with job_mutex held (mostly within job.c), and sometimes without
(mostly from JobDrivers using the job API).

A _locked version of each such function allows it to be used in
both cases.

List of functions duplicated as _locked:
job_is_ready (both versions are public)
job_is_completed (both versions are public)
job_is_cancelled (_locked version is public, needed by mirror.c)
job_pause_point (_locked version is static, purely done to simplify the code)
job_cancel_requested (_locked version is static)

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 include/qemu/job.h | 25 +++++++++++++++++++++---
 job.c              | 48 ++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 64 insertions(+), 9 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 4b64eb15f7..275d593715 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -475,21 +475,40 @@ const char *job_type_str(const Job *job);
 /** Returns true if the job should not be visible to the management layer. */
 bool job_is_internal(Job *job);
 
-/** Returns whether the job is being cancelled. */
+/**
+ * Returns whether the job is being cancelled.
+ * Called with job_mutex *not* held.
+ */
 bool job_is_cancelled(Job *job);
 
+/** Just like job_is_cancelled, but called between job_lock and job_unlock */
+bool job_is_cancelled_locked(Job *job);
+
 /**
  * Returns whether the job is scheduled for cancellation (at an
  * indefinite point).
+ * Called with job_mutex *not* held.
  */
 bool job_cancel_requested(Job *job);
 
-/** Returns whether the job is in a completed state. */
+/**
+ * Returns whether the job is in a completed state.
+ * Called with job_mutex *not* held.
+ */
 bool job_is_completed(Job *job);
 
-/** Returns whether the job is ready to be completed. */
+/** Just like job_is_completed, but called between job_lock and job_unlock */
+bool job_is_completed_locked(Job *job);
+
+/**
+ * Returns whether the job is ready to be completed.
+ * Called with job_mutex *not* held.
+ */
 bool job_is_ready(Job *job);
 
+/** Just like job_is_ready, but called between job_lock and job_unlock */
+bool job_is_ready_locked(Job *job);
+
 /**
  * Request @job to pause at the next pause point. Must be paired with
  * job_resume(). If the job is supposed to be resumed by user action, call
diff --git a/job.c b/job.c
index cafd597ba4..c4776985c4 100644
--- a/job.c
+++ b/job.c
@@ -236,19 +236,32 @@ const char *job_type_str(const Job *job)
     return JobType_str(job_type(job));
 }
 
-bool job_is_cancelled(Job *job)
+bool job_is_cancelled_locked(Job *job)
 {
     /* force_cancel may be true only if cancelled is true, too */
     assert(job->cancelled || !job->force_cancel);
     return job->force_cancel;
 }
 
-bool job_cancel_requested(Job *job)
+bool job_is_cancelled(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job_is_cancelled_locked(job);
+}
+
+/* Called with job_mutex held. */
+static bool job_cancel_requested_locked(Job *job)
 {
     return job->cancelled;
 }
 
-bool job_is_ready(Job *job)
+bool job_cancel_requested(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job_cancel_requested_locked(job);
+}
+
+bool job_is_ready_locked(Job *job)
 {
     switch (job->status) {
     case JOB_STATUS_UNDEFINED:
@@ -270,7 +283,13 @@ bool job_is_ready(Job *job)
     return false;
 }
 
-bool job_is_completed(Job *job)
+bool job_is_ready(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job_is_ready_locked(job);
+}
+
+bool job_is_completed_locked(Job *job)
 {
     switch (job->status) {
     case JOB_STATUS_UNDEFINED:
@@ -292,6 +311,12 @@ bool job_is_completed(Job *job)
     return false;
 }
 
+bool job_is_completed(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job_is_completed_locked(job);
+}
+
 static bool job_started(Job *job)
 {
     return job->co;
@@ -521,7 +546,8 @@ static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
     assert(job->busy);
 }
 
-void coroutine_fn job_pause_point(Job *job)
+/* Called with job_mutex held, but releases it temporarily. */
+static void coroutine_fn job_pause_point_locked(Job *job)
 {
     assert(job && job_started(job));
 
@@ -552,6 +578,12 @@ void coroutine_fn job_pause_point(Job *job)
     }
 }
 
+void coroutine_fn job_pause_point(Job *job)
+{
+    JOB_LOCK_GUARD();
+    job_pause_point_locked(job);
+}
+
 void job_yield(Job *job)
 {
     assert(job->busy);
@@ -949,11 +981,15 @@ static void job_completed(Job *job)
     }
 }
 
-/** Useful only as a type shim for aio_bh_schedule_oneshot. */
+/**
+ * Useful only as a type shim for aio_bh_schedule_oneshot.
+ * Called with job_mutex *not* held.
+ */
 static void job_exit(void *opaque)
 {
     Job *job = (Job *)opaque;
     AioContext *ctx;
+    JOB_LOCK_GUARD();
 
     job_ref(job);
     aio_context_acquire(job->aio_context);
-- 
2.31.1




* [PATCH v7 06/18] jobs: protect jobs with job_lock/unlock
  2022-06-16 13:18 [PATCH v7 00/18] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (4 preceding siblings ...)
  2022-06-16 13:18 ` [PATCH v7 05/18] job.h: add _locked duplicates for job API functions called with and without job_mutex Emanuele Giuseppe Esposito
@ 2022-06-16 13:18 ` Emanuele Giuseppe Esposito
  2022-06-21 16:47   ` Vladimir Sementsov-Ogievskiy
  2022-06-21 17:09   ` Vladimir Sementsov-Ogievskiy
  2022-06-16 13:18 ` [PATCH v7 07/18] jobs: add job lock in find_* functions Emanuele Giuseppe Esposito
                   ` (11 subsequent siblings)
  17 siblings, 2 replies; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-16 13:18 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel,
	Emanuele Giuseppe Esposito

Introduce the job locking mechanism across the whole job API,
following the comments in job.h and the requirements of job-monitor
(e.g. the functions in job-qmp.c, which assume the lock is held) and
job-driver (e.g. in mirror.c and all other JobDrivers, where the
lock is not held).

Use the _locked helpers introduced before to differentiate
between functions called with and without job_mutex held.
This only applies to functions that are called in both cases;
all the others will be renamed later.

job_{lock/unlock} is independent of real_job_{lock/unlock}.

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block.c             |  18 ++++---
 block/replication.c |   8 ++-
 blockdev.c          |  17 ++++--
 blockjob.c          |  56 +++++++++++++-------
 job-qmp.c           |   2 +
 job.c               | 125 +++++++++++++++++++++++++++++++-------------
 monitor/qmp-cmds.c  |   6 ++-
 qemu-img.c          |  41 +++++++++------
 8 files changed, 187 insertions(+), 86 deletions(-)

diff --git a/block.c b/block.c
index 2c00dddd80..b6f0d860d2 100644
--- a/block.c
+++ b/block.c
@@ -4978,7 +4978,9 @@ static void bdrv_close(BlockDriverState *bs)
 
 void bdrv_close_all(void)
 {
-    assert(job_next(NULL) == NULL);
+    WITH_JOB_LOCK_GUARD() {
+        assert(job_next(NULL) == NULL);
+    }
     GLOBAL_STATE_CODE();
 
     /* Drop references from requests still in flight, such as canceled block
@@ -6165,13 +6167,15 @@ XDbgBlockGraph *bdrv_get_xdbg_block_graph(Error **errp)
         }
     }
 
-    for (job = block_job_next(NULL); job; job = block_job_next(job)) {
-        GSList *el;
+    WITH_JOB_LOCK_GUARD() {
+        for (job = block_job_next(NULL); job; job = block_job_next(job)) {
+            GSList *el;
 
-        xdbg_graph_add_node(gr, job, X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_JOB,
-                           job->job.id);
-        for (el = job->nodes; el; el = el->next) {
-            xdbg_graph_add_edge(gr, job, (BdrvChild *)el->data);
+            xdbg_graph_add_node(gr, job, X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_JOB,
+                                job->job.id);
+            for (el = job->nodes; el; el = el->next) {
+                xdbg_graph_add_edge(gr, job, (BdrvChild *)el->data);
+            }
         }
     }
 
diff --git a/block/replication.c b/block/replication.c
index 55c8f894aa..a03b28726e 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -149,7 +149,9 @@ static void replication_close(BlockDriverState *bs)
     if (s->stage == BLOCK_REPLICATION_FAILOVER) {
         commit_job = &s->commit_job->job;
         assert(commit_job->aio_context == qemu_get_current_aio_context());
-        job_cancel_sync(commit_job, false);
+        WITH_JOB_LOCK_GUARD() {
+            job_cancel_sync(commit_job, false);
+        }
     }
 
     if (s->mode == REPLICATION_MODE_SECONDARY) {
@@ -726,7 +728,9 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
          * disk, secondary disk in backup_job_completed().
          */
         if (s->backup_job) {
-            job_cancel_sync(&s->backup_job->job, true);
+            WITH_JOB_LOCK_GUARD() {
+                job_cancel_sync(&s->backup_job->job, true);
+            }
         }
 
         if (!failover) {
diff --git a/blockdev.c b/blockdev.c
index 9230888e34..b1099e678c 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -150,6 +150,8 @@ void blockdev_mark_auto_del(BlockBackend *blk)
         return;
     }
 
+    JOB_LOCK_GUARD();
+
     for (job = block_job_next(NULL); job; job = block_job_next(job)) {
         if (block_job_has_bdrv(job, blk_bs(blk))) {
             AioContext *aio_context = job->job.aio_context;
@@ -1838,7 +1840,9 @@ static void drive_backup_abort(BlkActionState *common)
         aio_context = bdrv_get_aio_context(state->bs);
         aio_context_acquire(aio_context);
 
-        job_cancel_sync(&state->job->job, true);
+        WITH_JOB_LOCK_GUARD() {
+            job_cancel_sync(&state->job->job, true);
+        }
 
         aio_context_release(aio_context);
     }
@@ -1939,7 +1943,9 @@ static void blockdev_backup_abort(BlkActionState *common)
         aio_context = bdrv_get_aio_context(state->bs);
         aio_context_acquire(aio_context);
 
-        job_cancel_sync(&state->job->job, true);
+        WITH_JOB_LOCK_GUARD() {
+            job_cancel_sync(&state->job->job, true);
+        }
 
         aio_context_release(aio_context);
     }
@@ -2388,7 +2394,10 @@ exit:
     if (!has_props) {
         qapi_free_TransactionProperties(props);
     }
-    job_txn_unref(block_job_txn);
+
+    WITH_JOB_LOCK_GUARD() {
+        job_txn_unref(block_job_txn);
+    }
 }
 
 BlockDirtyBitmapSha256 *qmp_x_debug_block_dirty_bitmap_sha256(const char *node,
@@ -3720,6 +3729,8 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
     BlockJobInfoList *head = NULL, **tail = &head;
     BlockJob *job;
 
+    JOB_LOCK_GUARD();
+
     for (job = block_job_next(NULL); job; job = block_job_next(job)) {
         BlockJobInfo *value;
         AioContext *aio_context;
diff --git a/blockjob.c b/blockjob.c
index 4868453d74..d726efe679 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -102,7 +102,9 @@ static char *child_job_get_parent_desc(BdrvChild *c)
 static void child_job_drained_begin(BdrvChild *c)
 {
     BlockJob *job = c->opaque;
-    job_pause(&job->job);
+    WITH_JOB_LOCK_GUARD() {
+        job_pause(&job->job);
+    }
 }
 
 static bool child_job_drained_poll(BdrvChild *c)
@@ -114,8 +116,10 @@ static bool child_job_drained_poll(BdrvChild *c)
     /* An inactive or completed job doesn't have any pending requests. Jobs
      * with !job->busy are either already paused or have a pause point after
      * being reentered, so no job driver code will run before they pause. */
-    if (!job->busy || job_is_completed(job)) {
-        return false;
+    WITH_JOB_LOCK_GUARD() {
+        if (!job->busy || job_is_completed_locked(job)) {
+            return false;
+        }
     }
 
     /* Otherwise, assume that it isn't fully stopped yet, but allow the job to
@@ -130,7 +134,9 @@ static bool child_job_drained_poll(BdrvChild *c)
 static void child_job_drained_end(BdrvChild *c, int *drained_end_counter)
 {
     BlockJob *job = c->opaque;
-    job_resume(&job->job);
+    WITH_JOB_LOCK_GUARD() {
+        job_resume(&job->job);
+    }
 }
 
 static bool child_job_can_set_aio_ctx(BdrvChild *c, AioContext *ctx,
@@ -292,7 +298,9 @@ bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
     job->speed = speed;
 
     if (drv->set_speed) {
+        job_unlock();
         drv->set_speed(job, speed);
+        job_lock();
     }
 
     if (speed && speed <= old_speed) {
@@ -335,7 +343,7 @@ BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
     info->len       = progress_total;
     info->speed     = job->speed;
     info->io_status = job->iostatus;
-    info->ready     = job_is_ready(&job->job),
+    info->ready     = job_is_ready_locked(&job->job),
     info->status    = job->job.status;
     info->auto_finalize = job->job.auto_finalize;
     info->auto_dismiss  = job->job.auto_dismiss;
@@ -469,13 +477,15 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     job->ready_notifier.notify = block_job_event_ready;
     job->idle_notifier.notify = block_job_on_idle;
 
-    notifier_list_add(&job->job.on_finalize_cancelled,
-                      &job->finalize_cancelled_notifier);
-    notifier_list_add(&job->job.on_finalize_completed,
-                      &job->finalize_completed_notifier);
-    notifier_list_add(&job->job.on_pending, &job->pending_notifier);
-    notifier_list_add(&job->job.on_ready, &job->ready_notifier);
-    notifier_list_add(&job->job.on_idle, &job->idle_notifier);
+    WITH_JOB_LOCK_GUARD() {
+        notifier_list_add(&job->job.on_finalize_cancelled,
+                          &job->finalize_cancelled_notifier);
+        notifier_list_add(&job->job.on_finalize_completed,
+                          &job->finalize_completed_notifier);
+        notifier_list_add(&job->job.on_pending, &job->pending_notifier);
+        notifier_list_add(&job->job.on_ready, &job->ready_notifier);
+        notifier_list_add(&job->job.on_idle, &job->idle_notifier);
+    }
 
     error_setg(&job->blocker, "block device is in use by block job: %s",
                job_type_str(&job->job));
@@ -487,7 +497,10 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
 
     bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
 
-    if (!block_job_set_speed(job, speed, errp)) {
+    WITH_JOB_LOCK_GUARD() {
+        ret = block_job_set_speed(job, speed, errp);
+    }
+    if (!ret) {
         goto fail;
     }
 
@@ -512,7 +525,9 @@ void block_job_user_resume(Job *job)
 {
     BlockJob *bjob = container_of(job, BlockJob, job);
     GLOBAL_STATE_CODE();
-    block_job_iostatus_reset(bjob);
+    WITH_JOB_LOCK_GUARD() {
+        block_job_iostatus_reset(bjob);
+    }
 }
 
 BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
@@ -546,10 +561,15 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
                                         action);
     }
     if (action == BLOCK_ERROR_ACTION_STOP) {
-        if (!job->job.user_paused) {
-            job_pause(&job->job);
-            /* make the pause user visible, which will be resumed from QMP. */
-            job->job.user_paused = true;
+        WITH_JOB_LOCK_GUARD() {
+            if (!job->job.user_paused) {
+                job_pause(&job->job);
+                /*
+                 * make the pause user visible, which will be
+                 * resumed from QMP.
+                 */
+                job->job.user_paused = true;
+            }
         }
         block_job_iostatus_set_err(job, error);
     }
diff --git a/job-qmp.c b/job-qmp.c
index 829a28aa70..270df1eb7e 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -171,6 +171,8 @@ JobInfoList *qmp_query_jobs(Error **errp)
     JobInfoList *head = NULL, **tail = &head;
     Job *job;
 
+    JOB_LOCK_GUARD();
+
     for (job = job_next(NULL); job; job = job_next(job)) {
         JobInfo *value;
         AioContext *aio_context;
diff --git a/job.c b/job.c
index c4776985c4..55b92b2332 100644
--- a/job.c
+++ b/job.c
@@ -361,6 +361,8 @@ void *job_create(const char *job_id, const JobDriver *driver, JobTxn *txn,
 {
     Job *job;
 
+    JOB_LOCK_GUARD();
+
     if (job_id) {
         if (flags & JOB_INTERNAL) {
             error_setg(errp, "Cannot specify job ID for internal job");
@@ -435,7 +437,9 @@ void job_unref(Job *job)
         assert(!job->txn);
 
         if (job->driver->free) {
+            job_unlock();
             job->driver->free(job);
+            job_lock();
         }
 
         QLIST_REMOVE(job, job_list);
@@ -522,6 +526,7 @@ void job_enter_cond(Job *job, bool(*fn)(Job *job))
 
 void job_enter(Job *job)
 {
+    JOB_LOCK_GUARD();
     job_enter_cond(job, NULL);
 }
 
@@ -540,7 +545,9 @@ static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
     job->busy = false;
     job_event_idle(job);
     real_job_unlock();
+    job_unlock();
     qemu_coroutine_yield();
+    job_lock();
 
     /* Set by job_enter_cond() before re-entering the coroutine.  */
     assert(job->busy);
@@ -554,15 +561,17 @@ static void coroutine_fn job_pause_point_locked(Job *job)
     if (!job_should_pause(job)) {
         return;
     }
-    if (job_is_cancelled(job)) {
+    if (job_is_cancelled_locked(job)) {
         return;
     }
 
     if (job->driver->pause) {
+        job_unlock();
         job->driver->pause(job);
+        job_lock();
     }
 
-    if (job_should_pause(job) && !job_is_cancelled(job)) {
+    if (job_should_pause(job) && !job_is_cancelled_locked(job)) {
         JobStatus status = job->status;
         job_state_transition(job, status == JOB_STATUS_READY
                                   ? JOB_STATUS_STANDBY
@@ -574,7 +583,9 @@ static void coroutine_fn job_pause_point_locked(Job *job)
     }
 
     if (job->driver->resume) {
+        job_unlock();
         job->driver->resume(job);
+        job_lock();
     }
 }
 
@@ -586,10 +597,11 @@ void coroutine_fn job_pause_point(Job *job)
 
 void job_yield(Job *job)
 {
+    JOB_LOCK_GUARD();
     assert(job->busy);
 
     /* Check cancellation *before* setting busy = false, too!  */
-    if (job_is_cancelled(job)) {
+    if (job_is_cancelled_locked(job)) {
         return;
     }
 
@@ -597,15 +609,16 @@ void job_yield(Job *job)
         job_do_yield(job, -1);
     }
 
-    job_pause_point(job);
+    job_pause_point_locked(job);
 }
 
 void coroutine_fn job_sleep_ns(Job *job, int64_t ns)
 {
+    JOB_LOCK_GUARD();
     assert(job->busy);
 
     /* Check cancellation *before* setting busy = false, too!  */
-    if (job_is_cancelled(job)) {
+    if (job_is_cancelled_locked(job)) {
         return;
     }
 
@@ -613,10 +626,10 @@ void coroutine_fn job_sleep_ns(Job *job, int64_t ns)
         job_do_yield(job, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + ns);
     }
 
-    job_pause_point(job);
+    job_pause_point_locked(job);
 }
 
-/* Assumes the block_job_mutex is held */
+/* Assumes the job_mutex is held */
 static bool job_timer_not_pending(Job *job)
 {
     return !timer_pending(&job->sleep_timer);
@@ -626,7 +639,7 @@ void job_pause(Job *job)
 {
     job->pause_count++;
     if (!job->paused) {
-        job_enter(job);
+        job_enter_cond(job, NULL);
     }
 }
 
@@ -672,7 +685,9 @@ void job_user_resume(Job *job, Error **errp)
         return;
     }
     if (job->driver->user_resume) {
+        job_unlock();
         job->driver->user_resume(job);
+        job_lock();
     }
     job->user_paused = false;
     job_resume(job);
@@ -706,6 +721,7 @@ void job_dismiss(Job **jobptr, Error **errp)
 
 void job_early_fail(Job *job)
 {
+    JOB_LOCK_GUARD();
     assert(job->status == JOB_STATUS_CREATED);
     job_do_dismiss(job);
 }
@@ -720,7 +736,7 @@ static void job_conclude(Job *job)
 
 static void job_update_rc(Job *job)
 {
-    if (!job->ret && job_is_cancelled(job)) {
+    if (!job->ret && job_is_cancelled_locked(job)) {
         job->ret = -ECANCELED;
     }
     if (job->ret) {
@@ -736,7 +752,9 @@ static void job_commit(Job *job)
     assert(!job->ret);
     GLOBAL_STATE_CODE();
     if (job->driver->commit) {
+        job_unlock();
         job->driver->commit(job);
+        job_lock();
     }
 }
 
@@ -745,7 +763,9 @@ static void job_abort(Job *job)
     assert(job->ret);
     GLOBAL_STATE_CODE();
     if (job->driver->abort) {
+        job_unlock();
         job->driver->abort(job);
+        job_lock();
     }
 }
 
@@ -753,13 +773,17 @@ static void job_clean(Job *job)
 {
     GLOBAL_STATE_CODE();
     if (job->driver->clean) {
+        job_unlock();
         job->driver->clean(job);
+        job_lock();
     }
 }
 
 static int job_finalize_single(Job *job)
 {
-    assert(job_is_completed(job));
+    int job_ret;
+
+    assert(job_is_completed_locked(job));
 
     /* Ensure abort is called for late-transactional failures */
     job_update_rc(job);
@@ -772,12 +796,15 @@ static int job_finalize_single(Job *job)
     job_clean(job);
 
     if (job->cb) {
-        job->cb(job->opaque, job->ret);
+        job_ret = job->ret;
+        job_unlock();
+        job->cb(job->opaque, job_ret);
+        job_lock();
     }
 
     /* Emit events only if we actually started */
     if (job_started(job)) {
-        if (job_is_cancelled(job)) {
+        if (job_is_cancelled_locked(job)) {
             job_event_cancelled(job);
         } else {
             job_event_completed(job);
@@ -793,7 +820,9 @@ static void job_cancel_async(Job *job, bool force)
 {
     GLOBAL_STATE_CODE();
     if (job->driver->cancel) {
+        job_unlock();
         force = job->driver->cancel(job, force);
+        job_lock();
     } else {
         /* No .cancel() means the job will behave as if force-cancelled */
         force = true;
@@ -802,7 +831,9 @@ static void job_cancel_async(Job *job, bool force)
     if (job->user_paused) {
         /* Do not call job_enter here, the caller will handle it.  */
         if (job->driver->user_resume) {
+            job_unlock();
             job->driver->user_resume(job);
+            job_lock();
         }
         job->user_paused = false;
         assert(job->pause_count > 0);
@@ -871,8 +902,8 @@ static void job_completed_txn_abort(Job *job)
          */
         ctx = other_job->aio_context;
         aio_context_acquire(ctx);
-        if (!job_is_completed(other_job)) {
-            assert(job_cancel_requested(other_job));
+        if (!job_is_completed_locked(other_job)) {
+            assert(job_cancel_requested_locked(other_job));
             job_finish_sync(other_job, NULL, NULL);
         }
         job_finalize_single(other_job);
@@ -891,9 +922,14 @@ static void job_completed_txn_abort(Job *job)
 
 static int job_prepare(Job *job)
 {
+    int ret;
+
     GLOBAL_STATE_CODE();
     if (job->ret == 0 && job->driver->prepare) {
-        job->ret = job->driver->prepare(job);
+        job_unlock();
+        ret = job->driver->prepare(job);
+        job_lock();
+        job->ret = ret;
         job_update_rc(job);
     }
     return job->ret;
@@ -938,6 +974,7 @@ static int job_transition_to_pending(Job *job)
 
 void job_transition_to_ready(Job *job)
 {
+    JOB_LOCK_GUARD();
     job_state_transition(job, JOB_STATUS_READY);
     job_event_ready(job);
 }
@@ -954,7 +991,7 @@ static void job_completed_txn_success(Job *job)
      * txn.
      */
     QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
-        if (!job_is_completed(other_job)) {
+        if (!job_is_completed_locked(other_job)) {
             return;
         }
         assert(other_job->ret == 0);
@@ -970,7 +1007,7 @@ static void job_completed_txn_success(Job *job)
 
 static void job_completed(Job *job)
 {
-    assert(job && job->txn && !job_is_completed(job));
+    assert(job && job->txn && !job_is_completed_locked(job));
 
     job_update_rc(job);
     trace_job_completed(job, job->ret);
@@ -1021,25 +1058,33 @@ static void job_exit(void *opaque)
 static void coroutine_fn job_co_entry(void *opaque)
 {
     Job *job = opaque;
+    int ret;
 
     assert(job && job->driver && job->driver->run);
-    assert(job->aio_context == qemu_get_current_aio_context());
-    job_pause_point(job);
-    job->ret = job->driver->run(job, &job->err);
-    job->deferred_to_main_loop = true;
-    job->busy = true;
+    WITH_JOB_LOCK_GUARD() {
+        assert(job->aio_context == qemu_get_current_aio_context());
+        job_pause_point_locked(job);
+    }
+    ret = job->driver->run(job, &job->err);
+    WITH_JOB_LOCK_GUARD() {
+        job->ret = ret;
+        job->deferred_to_main_loop = true;
+        job->busy = true;
+    }
     aio_bh_schedule_oneshot(qemu_get_aio_context(), job_exit, job);
 }
 
 void job_start(Job *job)
 {
-    assert(job && !job_started(job) && job->paused &&
-           job->driver && job->driver->run);
-    job->co = qemu_coroutine_create(job_co_entry, job);
-    job->pause_count--;
-    job->busy = true;
-    job->paused = false;
-    job_state_transition(job, JOB_STATUS_RUNNING);
+    WITH_JOB_LOCK_GUARD() {
+        assert(job && !job_started(job) && job->paused &&
+            job->driver && job->driver->run);
+        job->co = qemu_coroutine_create(job_co_entry, job);
+        job->pause_count--;
+        job->busy = true;
+        job->paused = false;
+        job_state_transition(job, JOB_STATUS_RUNNING);
+    }
     aio_co_enter(job->aio_context, job->co);
 }
 
@@ -1057,17 +1102,17 @@ void job_cancel(Job *job, bool force)
          * job_cancel_async() ignores soft-cancel requests for jobs
          * that are already done (i.e. deferred to the main loop).  We
          * have to check again whether the job is really cancelled.
-         * (job_cancel_requested() and job_is_cancelled() are equivalent
-         * here, because job_cancel_async() will make soft-cancel
-         * requests no-ops when deferred_to_main_loop is true.  We
-         * choose to call job_is_cancelled() to show that we invoke
+         * (job_cancel_requested_locked() and job_is_cancelled_locked()
+         * are equivalent here, because job_cancel_async() will
+         * make soft-cancel requests no-ops when deferred_to_main_loop is true.
+         * We choose to call job_is_cancelled_locked() to show that we invoke
          * job_completed_txn_abort() only for force-cancelled jobs.)
          */
-        if (job_is_cancelled(job)) {
+        if (job_is_cancelled_locked(job)) {
             job_completed_txn_abort(job);
         }
     } else {
-        job_enter(job);
+        job_enter_cond(job, NULL);
     }
 }
 
@@ -1109,6 +1154,7 @@ void job_cancel_sync_all(void)
     Job *job;
     AioContext *aio_context;
 
+    JOB_LOCK_GUARD();
     while ((job = job_next(NULL))) {
         aio_context = job->aio_context;
         aio_context_acquire(aio_context);
@@ -1130,13 +1176,15 @@ void job_complete(Job *job, Error **errp)
     if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) {
         return;
     }
-    if (job_cancel_requested(job) || !job->driver->complete) {
+    if (job_cancel_requested_locked(job) || !job->driver->complete) {
         error_setg(errp, "The active block job '%s' cannot be completed",
                    job->id);
         return;
     }
 
+    job_unlock();
     job->driver->complete(job, errp);
+    job_lock();
 }
 
 int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
@@ -1155,10 +1203,13 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
         return -EBUSY;
     }
 
+    job_unlock();
     AIO_WAIT_WHILE(job->aio_context,
                    (job_enter(job), !job_is_completed(job)));
+    job_lock();
 
-    ret = (job_is_cancelled(job) && job->ret == 0) ? -ECANCELED : job->ret;
+    ret = (job_is_cancelled_locked(job) && job->ret == 0)
+          ? -ECANCELED : job->ret;
     job_unref(job);
     return ret;
 }
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index 1ebb89f46c..39d9d06a81 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -133,8 +133,10 @@ void qmp_cont(Error **errp)
         blk_iostatus_reset(blk);
     }
 
-    for (job = block_job_next(NULL); job; job = block_job_next(job)) {
-        block_job_iostatus_reset(job);
+    WITH_JOB_LOCK_GUARD() {
+        for (job = block_job_next(NULL); job; job = block_job_next(job)) {
+            block_job_iostatus_reset(job);
+        }
     }
 
     /* Continuing after completed migration. Images have been inactivated to
diff --git a/qemu-img.c b/qemu-img.c
index 4cf4d2423d..d1f5eda687 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -912,25 +912,30 @@ static void run_block_job(BlockJob *job, Error **errp)
     int ret = 0;
 
     aio_context_acquire(aio_context);
-    job_ref(&job->job);
-    do {
-        float progress = 0.0f;
-        aio_poll(aio_context, true);
+    WITH_JOB_LOCK_GUARD() {
+        job_ref(&job->job);
+        do {
+            float progress = 0.0f;
+            job_unlock();
+            aio_poll(aio_context, true);
+
+            progress_get_snapshot(&job->job.progress, &progress_current,
+                                &progress_total);
+            if (progress_total) {
+                progress = (float)progress_current / progress_total * 100.f;
+            }
+            qemu_progress_print(progress, 0);
+            job_lock();
+        } while (!job_is_ready_locked(&job->job) &&
+                 !job_is_completed_locked(&job->job));
 
-        progress_get_snapshot(&job->job.progress, &progress_current,
-                              &progress_total);
-        if (progress_total) {
-            progress = (float)progress_current / progress_total * 100.f;
+        if (!job_is_completed_locked(&job->job)) {
+            ret = job_complete_sync(&job->job, errp);
+        } else {
+            ret = job->job.ret;
         }
-        qemu_progress_print(progress, 0);
-    } while (!job_is_ready(&job->job) && !job_is_completed(&job->job));
-
-    if (!job_is_completed(&job->job)) {
-        ret = job_complete_sync(&job->job, errp);
-    } else {
-        ret = job->job.ret;
+        job_unref(&job->job);
     }
-    job_unref(&job->job);
     aio_context_release(aio_context);
 
     /* publish completion progress only when success */
@@ -1083,7 +1088,9 @@ static int img_commit(int argc, char **argv)
         bdrv_ref(bs);
     }
 
-    job = block_job_get("commit");
+    WITH_JOB_LOCK_GUARD() {
+        job = block_job_get("commit");
+    }
     assert(job);
     run_block_job(job, &local_err);
     if (local_err) {
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 07/18] jobs: add job lock in find_* functions
  2022-06-16 13:18 [PATCH v7 00/18] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (5 preceding siblings ...)
  2022-06-16 13:18 ` [PATCH v7 06/18] jobs: protect jobs with job_lock/unlock Emanuele Giuseppe Esposito
@ 2022-06-16 13:18 ` Emanuele Giuseppe Esposito
  2022-06-16 13:18 ` [PATCH v7 08/18] jobs: use job locks also in the unit tests Emanuele Giuseppe Esposito
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-16 13:18 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel,
	Emanuele Giuseppe Esposito

Both blockdev.c and job-qmp.c have TOCTOU (time-of-check to
time-of-use) conditions, because they first search for the job and
then perform an action on it. Therefore, we need to do the search +
action under the same job_mutex critical section.

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.
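
The race being closed can be sketched in isolation (a simplified
model using pthreads; the Job list, find_job_locked() and pause_job()
below are illustrative stand-ins, not the QEMU implementation): if the
lookup and the action each took the lock separately, another thread
could remove the job in between, so both must share one critical
section.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

static pthread_mutex_t job_mutex = PTHREAD_MUTEX_INITIALIZER;

typedef struct Job {
    const char *id;
    int pause_count;
    struct Job *next;
} Job;

static Job *jobs;  /* job list, protected by job_mutex */

/* Must be called with job_mutex held. */
static Job *find_job_locked(const char *id)
{
    for (Job *j = jobs; j; j = j->next) {
        if (!strcmp(j->id, id)) {
            return j;
        }
    }
    return NULL;
}

/*
 * Search + action in one critical section: the job cannot be
 * concurrently removed between the lookup and its use.
 */
static int pause_job(const char *id)
{
    int ret = -1;

    pthread_mutex_lock(&job_mutex);
    Job *j = find_job_locked(id);
    if (j) {
        j->pause_count++;
        ret = 0;
    }
    pthread_mutex_unlock(&job_mutex);
    return ret;
}
```

In the patch itself, the QMP handlers play the role of pause_job():
they now take JOB_LOCK_GUARD() before calling the find_*_locked()
helper and keep holding the lock for the subsequent operation.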

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 blockdev.c | 46 +++++++++++++++++++++++++++++++++++-----------
 job-qmp.c  | 37 +++++++++++++++++++++++++++++--------
 2 files changed, 64 insertions(+), 19 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index b1099e678c..6f83783f10 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3311,9 +3311,13 @@ out:
     aio_context_release(aio_context);
 }
 
-/* Get a block job using its ID and acquire its AioContext */
-static BlockJob *find_block_job(const char *id, AioContext **aio_context,
-                                Error **errp)
+/*
+ * Get a block job using its ID and acquire its AioContext.
+ * Called with job_mutex held.
+ */
+static BlockJob *find_block_job_locked(const char *id,
+                                       AioContext **aio_context,
+                                       Error **errp)
 {
     BlockJob *job;
 
@@ -3322,7 +3326,6 @@ static BlockJob *find_block_job(const char *id, AioContext **aio_context,
     *aio_context = NULL;
 
     job = block_job_get(id);
-
     if (!job) {
         error_set(errp, ERROR_CLASS_DEVICE_NOT_ACTIVE,
                   "Block job '%s' not found", id);
@@ -3338,7 +3341,10 @@ static BlockJob *find_block_job(const char *id, AioContext **aio_context,
 void qmp_block_job_set_speed(const char *device, int64_t speed, Error **errp)
 {
     AioContext *aio_context;
-    BlockJob *job = find_block_job(device, &aio_context, errp);
+    BlockJob *job;
+
+    JOB_LOCK_GUARD();
+    job = find_block_job_locked(device, &aio_context, errp);
 
     if (!job) {
         return;
@@ -3352,7 +3358,10 @@ void qmp_block_job_cancel(const char *device,
                           bool has_force, bool force, Error **errp)
 {
     AioContext *aio_context;
-    BlockJob *job = find_block_job(device, &aio_context, errp);
+    BlockJob *job;
+
+    JOB_LOCK_GUARD();
+    job = find_block_job_locked(device, &aio_context, errp);
 
     if (!job) {
         return;
@@ -3377,7 +3386,10 @@ out:
 void qmp_block_job_pause(const char *device, Error **errp)
 {
     AioContext *aio_context;
-    BlockJob *job = find_block_job(device, &aio_context, errp);
+    BlockJob *job;
+
+    JOB_LOCK_GUARD();
+    job = find_block_job_locked(device, &aio_context, errp);
 
     if (!job) {
         return;
@@ -3391,7 +3403,10 @@ void qmp_block_job_pause(const char *device, Error **errp)
 void qmp_block_job_resume(const char *device, Error **errp)
 {
     AioContext *aio_context;
-    BlockJob *job = find_block_job(device, &aio_context, errp);
+    BlockJob *job;
+
+    JOB_LOCK_GUARD();
+    job = find_block_job_locked(device, &aio_context, errp);
 
     if (!job) {
         return;
@@ -3405,7 +3420,10 @@ void qmp_block_job_resume(const char *device, Error **errp)
 void qmp_block_job_complete(const char *device, Error **errp)
 {
     AioContext *aio_context;
-    BlockJob *job = find_block_job(device, &aio_context, errp);
+    BlockJob *job;
+
+    JOB_LOCK_GUARD();
+    job = find_block_job_locked(device, &aio_context, errp);
 
     if (!job) {
         return;
@@ -3419,7 +3437,10 @@ void qmp_block_job_complete(const char *device, Error **errp)
 void qmp_block_job_finalize(const char *id, Error **errp)
 {
     AioContext *aio_context;
-    BlockJob *job = find_block_job(id, &aio_context, errp);
+    BlockJob *job;
+
+    JOB_LOCK_GUARD();
+    job = find_block_job_locked(id, &aio_context, errp);
 
     if (!job) {
         return;
@@ -3442,9 +3463,12 @@ void qmp_block_job_finalize(const char *id, Error **errp)
 void qmp_block_job_dismiss(const char *id, Error **errp)
 {
     AioContext *aio_context;
-    BlockJob *bjob = find_block_job(id, &aio_context, errp);
+    BlockJob *bjob;
     Job *job;
 
+    JOB_LOCK_GUARD();
+    bjob = find_block_job_locked(id, &aio_context, errp);
+
     if (!bjob) {
         return;
     }
diff --git a/job-qmp.c b/job-qmp.c
index 270df1eb7e..58ca9b6632 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -29,8 +29,11 @@
 #include "qapi/error.h"
 #include "trace/trace-root.h"
 
-/* Get a job using its ID and acquire its AioContext */
-static Job *find_job(const char *id, AioContext **aio_context, Error **errp)
+/*
+ * Get a job using its ID and acquire its AioContext.
+ * Called with job_mutex held.
+ */
+static Job *find_job_locked(const char *id, AioContext **aio_context, Error **errp)
 {
     Job *job;
 
@@ -51,7 +54,10 @@ static Job *find_job(const char *id, AioContext **aio_context, Error **errp)
 void qmp_job_cancel(const char *id, Error **errp)
 {
     AioContext *aio_context;
-    Job *job = find_job(id, &aio_context, errp);
+    Job *job;
+
+    JOB_LOCK_GUARD();
+    job = find_job_locked(id, &aio_context, errp);
 
     if (!job) {
         return;
@@ -65,7 +71,10 @@ void qmp_job_cancel(const char *id, Error **errp)
 void qmp_job_pause(const char *id, Error **errp)
 {
     AioContext *aio_context;
-    Job *job = find_job(id, &aio_context, errp);
+    Job *job;
+
+    JOB_LOCK_GUARD();
+    job = find_job_locked(id, &aio_context, errp);
 
     if (!job) {
         return;
@@ -79,7 +88,10 @@ void qmp_job_pause(const char *id, Error **errp)
 void qmp_job_resume(const char *id, Error **errp)
 {
     AioContext *aio_context;
-    Job *job = find_job(id, &aio_context, errp);
+    Job *job;
+
+    JOB_LOCK_GUARD();
+    job = find_job_locked(id, &aio_context, errp);
 
     if (!job) {
         return;
@@ -93,7 +105,10 @@ void qmp_job_resume(const char *id, Error **errp)
 void qmp_job_complete(const char *id, Error **errp)
 {
     AioContext *aio_context;
-    Job *job = find_job(id, &aio_context, errp);
+    Job *job;
+
+    JOB_LOCK_GUARD();
+    job = find_job_locked(id, &aio_context, errp);
 
     if (!job) {
         return;
@@ -107,7 +122,10 @@ void qmp_job_complete(const char *id, Error **errp)
 void qmp_job_finalize(const char *id, Error **errp)
 {
     AioContext *aio_context;
-    Job *job = find_job(id, &aio_context, errp);
+    Job *job;
+
+    JOB_LOCK_GUARD();
+    job = find_job_locked(id, &aio_context, errp);
 
     if (!job) {
         return;
@@ -130,7 +148,10 @@ void qmp_job_finalize(const char *id, Error **errp)
 void qmp_job_dismiss(const char *id, Error **errp)
 {
     AioContext *aio_context;
-    Job *job = find_job(id, &aio_context, errp);
+    Job *job;
+
+    JOB_LOCK_GUARD();
+    job = find_job_locked(id, &aio_context, errp);
 
     if (!job) {
         return;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 08/18] jobs: use job locks also in the unit tests
  2022-06-16 13:18 [PATCH v7 00/18] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (6 preceding siblings ...)
  2022-06-16 13:18 ` [PATCH v7 07/18] jobs: add job lock in find_* functions Emanuele Giuseppe Esposito
@ 2022-06-16 13:18 ` Emanuele Giuseppe Esposito
  2022-06-16 13:18 ` [PATCH v7 09/18] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU Emanuele Giuseppe Esposito
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-16 13:18 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel,
	Emanuele Giuseppe Esposito

Add missing job synchronization in the unit tests, with
explicit locks.

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.
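
The WITH_JOB_LOCK_GUARD() blocks used throughout the tests are
scope-based critical sections. A minimal sketch of how such a guard
macro can be built (this uses the plain for-loop trick with pthreads;
QEMU's real lock guards in include/qemu/lockable.h are implemented
differently, via cleanup attributes):

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t job_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * for-loop trick: lock on entry, run the body exactly once,
 * unlock when the body falls out of the loop.
 */
#define WITH_JOB_LOCK_GUARD() \
    for (int _guard = (pthread_mutex_lock(&job_mutex), 1); _guard; \
         _guard = (pthread_mutex_unlock(&job_mutex), 0))

static int counter;

static void bump(void)
{
    WITH_JOB_LOCK_GUARD() {
        counter++;
    }   /* job_mutex released here */
}
```

Note this naive version leaks the lock if the body exits via break or
return, which is one reason the real macros rely on compiler cleanup
support instead of a bare for loop.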

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 tests/unit/test-bdrv-drain.c     | 76 ++++++++++++++++---------
 tests/unit/test-block-iothread.c |  8 ++-
 tests/unit/test-blockjob-txn.c   | 32 +++++++----
 tests/unit/test-blockjob.c       | 96 ++++++++++++++++++++++++--------
 4 files changed, 148 insertions(+), 64 deletions(-)

diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 36be84ae55..181458eecb 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -943,61 +943,83 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
         }
     }
 
-    g_assert_cmpint(job->job.pause_count, ==, 0);
-    g_assert_false(job->job.paused);
-    g_assert_true(tjob->running);
-    g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
+    WITH_JOB_LOCK_GUARD() {
+        g_assert_cmpint(job->job.pause_count, ==, 0);
+        g_assert_false(job->job.paused);
+        g_assert_true(tjob->running);
+        g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
+    }
 
     do_drain_begin_unlocked(drain_type, drain_bs);
 
-    if (drain_type == BDRV_DRAIN_ALL) {
-        /* bdrv_drain_all() drains both src and target */
-        g_assert_cmpint(job->job.pause_count, ==, 2);
-    } else {
-        g_assert_cmpint(job->job.pause_count, ==, 1);
+    WITH_JOB_LOCK_GUARD() {
+        if (drain_type == BDRV_DRAIN_ALL) {
+            /* bdrv_drain_all() drains both src and target */
+            g_assert_cmpint(job->job.pause_count, ==, 2);
+        } else {
+            g_assert_cmpint(job->job.pause_count, ==, 1);
+        }
+        g_assert_true(job->job.paused);
+        g_assert_false(job->job.busy); /* The job is paused */
     }
-    g_assert_true(job->job.paused);
-    g_assert_false(job->job.busy); /* The job is paused */
 
     do_drain_end_unlocked(drain_type, drain_bs);
 
     if (use_iothread) {
-        /* paused is reset in the I/O thread, wait for it */
+        /*
+         * Here we are waiting for the paused status to change,
+         * so don't bother protecting the read every time.
+         *
+         * paused is reset in the I/O thread, wait for it
+         */
         while (job->job.paused) {
             aio_poll(qemu_get_aio_context(), false);
         }
     }
 
-    g_assert_cmpint(job->job.pause_count, ==, 0);
-    g_assert_false(job->job.paused);
-    g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
+    WITH_JOB_LOCK_GUARD() {
+        g_assert_cmpint(job->job.pause_count, ==, 0);
+        g_assert_false(job->job.paused);
+        g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
+    }
 
     do_drain_begin_unlocked(drain_type, target);
 
-    if (drain_type == BDRV_DRAIN_ALL) {
-        /* bdrv_drain_all() drains both src and target */
-        g_assert_cmpint(job->job.pause_count, ==, 2);
-    } else {
-        g_assert_cmpint(job->job.pause_count, ==, 1);
+    WITH_JOB_LOCK_GUARD() {
+        if (drain_type == BDRV_DRAIN_ALL) {
+            /* bdrv_drain_all() drains both src and target */
+            g_assert_cmpint(job->job.pause_count, ==, 2);
+        } else {
+            g_assert_cmpint(job->job.pause_count, ==, 1);
+        }
+        g_assert_true(job->job.paused);
+        g_assert_false(job->job.busy); /* The job is paused */
     }
-    g_assert_true(job->job.paused);
-    g_assert_false(job->job.busy); /* The job is paused */
 
     do_drain_end_unlocked(drain_type, target);
 
     if (use_iothread) {
-        /* paused is reset in the I/O thread, wait for it */
+        /*
+         * Here we are waiting for the paused status to change,
+         * so don't bother protecting the read every time.
+         *
+         * paused is reset in the I/O thread, wait for it
+         */
         while (job->job.paused) {
             aio_poll(qemu_get_aio_context(), false);
         }
     }
 
-    g_assert_cmpint(job->job.pause_count, ==, 0);
-    g_assert_false(job->job.paused);
-    g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
+    WITH_JOB_LOCK_GUARD() {
+        g_assert_cmpint(job->job.pause_count, ==, 0);
+        g_assert_false(job->job.paused);
+        g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
+    }
 
     aio_context_acquire(ctx);
-    ret = job_complete_sync(&job->job, &error_abort);
+    WITH_JOB_LOCK_GUARD() {
+        ret = job_complete_sync(&job->job, &error_abort);
+    }
     g_assert_cmpint(ret, ==, (result == TEST_JOB_SUCCESS ? 0 : -EIO));
 
     if (use_iothread) {
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index 94718c9319..9866262f79 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -456,7 +456,9 @@ static void test_attach_blockjob(void)
     }
 
     aio_context_acquire(ctx);
-    job_complete_sync(&tjob->common.job, &error_abort);
+    WITH_JOB_LOCK_GUARD() {
+        job_complete_sync(&tjob->common.job, &error_abort);
+    }
     blk_set_aio_context(blk, qemu_get_aio_context(), &error_abort);
     aio_context_release(ctx);
 
@@ -630,7 +632,9 @@ static void test_propagate_mirror(void)
                  BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
                  false, "filter_node", MIRROR_COPY_MODE_BACKGROUND,
                  &error_abort);
-    job = job_get("job0");
+    WITH_JOB_LOCK_GUARD() {
+        job = job_get("job0");
+    }
     filter = bdrv_find_node("filter_node");
 
     /* Change the AioContext of src */
diff --git a/tests/unit/test-blockjob-txn.c b/tests/unit/test-blockjob-txn.c
index c69028b450..0355e54001 100644
--- a/tests/unit/test-blockjob-txn.c
+++ b/tests/unit/test-blockjob-txn.c
@@ -116,8 +116,10 @@ static void test_single_job(int expected)
     job = test_block_job_start(1, true, expected, &result, txn);
     job_start(&job->job);
 
-    if (expected == -ECANCELED) {
-        job_cancel(&job->job, false);
+    WITH_JOB_LOCK_GUARD() {
+        if (expected == -ECANCELED) {
+            job_cancel(&job->job, false);
+        }
     }
 
     while (result == -EINPROGRESS) {
@@ -125,7 +127,9 @@ static void test_single_job(int expected)
     }
     g_assert_cmpint(result, ==, expected);
 
-    job_txn_unref(txn);
+    WITH_JOB_LOCK_GUARD() {
+        job_txn_unref(txn);
+    }
 }
 
 static void test_single_job_success(void)
@@ -160,13 +164,15 @@ static void test_pair_jobs(int expected1, int expected2)
     /* Release our reference now to trigger as many nice
      * use-after-free bugs as possible.
      */
-    job_txn_unref(txn);
+    WITH_JOB_LOCK_GUARD() {
+        job_txn_unref(txn);
 
-    if (expected1 == -ECANCELED) {
-        job_cancel(&job1->job, false);
-    }
-    if (expected2 == -ECANCELED) {
-        job_cancel(&job2->job, false);
+        if (expected1 == -ECANCELED) {
+            job_cancel(&job1->job, false);
+        }
+        if (expected2 == -ECANCELED) {
+            job_cancel(&job2->job, false);
+        }
     }
 
     while (result1 == -EINPROGRESS || result2 == -EINPROGRESS) {
@@ -219,7 +225,9 @@ static void test_pair_jobs_fail_cancel_race(void)
     job_start(&job1->job);
     job_start(&job2->job);
 
-    job_cancel(&job1->job, false);
+    WITH_JOB_LOCK_GUARD() {
+        job_cancel(&job1->job, false);
+    }
 
     /* Now make job2 finish before the main loop kicks jobs.  This simulates
      * the race between a pending kick and another job completing.
@@ -234,7 +242,9 @@ static void test_pair_jobs_fail_cancel_race(void)
     g_assert_cmpint(result1, ==, -ECANCELED);
     g_assert_cmpint(result2, ==, -ECANCELED);
 
-    job_txn_unref(txn);
+    WITH_JOB_LOCK_GUARD() {
+        job_txn_unref(txn);
+    }
 }
 
 int main(int argc, char **argv)
diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c
index 4c9e1bf1e5..ab7958dad5 100644
--- a/tests/unit/test-blockjob.c
+++ b/tests/unit/test-blockjob.c
@@ -211,8 +211,11 @@ static CancelJob *create_common(Job **pjob)
     bjob = mk_job(blk, "Steve", &test_cancel_driver, true,
                   JOB_MANUAL_FINALIZE | JOB_MANUAL_DISMISS);
     job = &bjob->job;
-    job_ref(job);
-    assert(job->status == JOB_STATUS_CREATED);
+    WITH_JOB_LOCK_GUARD() {
+        job_ref(job);
+        assert(job->status == JOB_STATUS_CREATED);
+    }
+
     s = container_of(bjob, CancelJob, common);
     s->blk = blk;
 
@@ -230,13 +233,15 @@ static void cancel_common(CancelJob *s)
     ctx = job->job.aio_context;
     aio_context_acquire(ctx);
 
-    job_cancel_sync(&job->job, true);
-    if (sts != JOB_STATUS_CREATED && sts != JOB_STATUS_CONCLUDED) {
-        Job *dummy = &job->job;
-        job_dismiss(&dummy, &error_abort);
+    WITH_JOB_LOCK_GUARD() {
+        job_cancel_sync(&job->job, true);
+        if (sts != JOB_STATUS_CREATED && sts != JOB_STATUS_CONCLUDED) {
+            Job *dummy = &job->job;
+            job_dismiss(&dummy, &error_abort);
+        }
+        assert(job->job.status == JOB_STATUS_NULL);
+        job_unref(&job->job);
     }
-    assert(job->job.status == JOB_STATUS_NULL);
-    job_unref(&job->job);
     destroy_blk(blk);
 
     aio_context_release(ctx);
@@ -251,6 +256,10 @@ static void test_cancel_created(void)
     cancel_common(s);
 }
 
+/*
+ * This test always runs in the main loop, so there is no
+ * need to protect job->status.
+ */
 static void test_cancel_running(void)
 {
     Job *job;
@@ -264,6 +273,10 @@ static void test_cancel_running(void)
     cancel_common(s);
 }
 
+/*
+ * This test always runs in the main loop, so there is no
+ * need to protect job->status.
+ */
 static void test_cancel_paused(void)
 {
     Job *job;
@@ -274,13 +287,19 @@ static void test_cancel_paused(void)
     job_start(job);
     assert(job->status == JOB_STATUS_RUNNING);
 
-    job_user_pause(job, &error_abort);
+    WITH_JOB_LOCK_GUARD() {
+        job_user_pause(job, &error_abort);
+    }
     job_enter(job);
     assert(job->status == JOB_STATUS_PAUSED);
 
     cancel_common(s);
 }
 
+/*
+ * This test always runs in the main loop, so there is no
+ * need to protect job->status.
+ */
 static void test_cancel_ready(void)
 {
     Job *job;
@@ -298,6 +317,10 @@ static void test_cancel_ready(void)
     cancel_common(s);
 }
 
+/*
+ * This test always runs in the main loop, so there is no
+ * need to protect job->status.
+ */
 static void test_cancel_standby(void)
 {
     Job *job;
@@ -312,13 +335,19 @@ static void test_cancel_standby(void)
     job_enter(job);
     assert(job->status == JOB_STATUS_READY);
 
-    job_user_pause(job, &error_abort);
+    WITH_JOB_LOCK_GUARD() {
+        job_user_pause(job, &error_abort);
+    }
     job_enter(job);
     assert(job->status == JOB_STATUS_STANDBY);
 
     cancel_common(s);
 }
 
+/*
+ * This test always runs in the main loop, so there is no
+ * need to protect job->status.
+ */
 static void test_cancel_pending(void)
 {
     Job *job;
@@ -333,7 +362,9 @@ static void test_cancel_pending(void)
     job_enter(job);
     assert(job->status == JOB_STATUS_READY);
 
-    job_complete(job, &error_abort);
+    WITH_JOB_LOCK_GUARD() {
+        job_complete(job, &error_abort);
+    }
     job_enter(job);
     while (!job->deferred_to_main_loop) {
         aio_poll(qemu_get_aio_context(), true);
@@ -345,6 +376,10 @@ static void test_cancel_pending(void)
     cancel_common(s);
 }
 
+/*
+ * This test always runs in the main loop, so there is no
+ * need to protect job->status.
+ */
 static void test_cancel_concluded(void)
 {
     Job *job;
@@ -359,7 +394,9 @@ static void test_cancel_concluded(void)
     job_enter(job);
     assert(job->status == JOB_STATUS_READY);
 
-    job_complete(job, &error_abort);
+    WITH_JOB_LOCK_GUARD() {
+        job_complete(job, &error_abort);
+    }
     job_enter(job);
     while (!job->deferred_to_main_loop) {
         aio_poll(qemu_get_aio_context(), true);
@@ -369,7 +406,9 @@ static void test_cancel_concluded(void)
     assert(job->status == JOB_STATUS_PENDING);
 
     aio_context_acquire(job->aio_context);
-    job_finalize(job, &error_abort);
+    WITH_JOB_LOCK_GUARD() {
+        job_finalize(job, &error_abort);
+    }
     aio_context_release(job->aio_context);
     assert(job->status == JOB_STATUS_CONCLUDED);
 
@@ -459,36 +498,45 @@ static void test_complete_in_standby(void)
     bjob = mk_job(blk, "job", &test_yielding_driver, true,
                   JOB_MANUAL_FINALIZE | JOB_MANUAL_DISMISS);
     job = &bjob->job;
+    /* Job did not start, so status is safe to read */
     assert(job->status == JOB_STATUS_CREATED);
 
     /* Wait for the job to become READY */
     job_start(job);
     aio_context_acquire(ctx);
+    /*
+     * Here we are waiting for the status to change, so don't bother
+     * protecting the read every time.
+     */
     AIO_WAIT_WHILE(ctx, job->status != JOB_STATUS_READY);
     aio_context_release(ctx);
 
     /* Begin the drained section, pausing the job */
     bdrv_drain_all_begin();
-    assert(job->status == JOB_STATUS_STANDBY);
+    WITH_JOB_LOCK_GUARD() {
+        assert(job->status == JOB_STATUS_STANDBY);
+    }
     /* Lock the IO thread to prevent the job from being run */
     aio_context_acquire(ctx);
     /* This will schedule the job to resume it */
     bdrv_drain_all_end();
 
-    /* But the job cannot run, so it will remain on standby */
-    assert(job->status == JOB_STATUS_STANDBY);
+    WITH_JOB_LOCK_GUARD() {
+        /* But the job cannot run, so it will remain on standby */
+        assert(job->status == JOB_STATUS_STANDBY);
 
-    /* Even though the job is on standby, this should work */
-    job_complete(job, &error_abort);
+        /* Even though the job is on standby, this should work */
+        job_complete(job, &error_abort);
 
-    /* The test is done now, clean up. */
-    job_finish_sync(job, NULL, &error_abort);
-    assert(job->status == JOB_STATUS_PENDING);
+        /* The test is done now, clean up. */
+        job_finish_sync(job, NULL, &error_abort);
+        assert(job->status == JOB_STATUS_PENDING);
 
-    job_finalize(job, &error_abort);
-    assert(job->status == JOB_STATUS_CONCLUDED);
+        job_finalize(job, &error_abort);
+        assert(job->status == JOB_STATUS_CONCLUDED);
 
-    job_dismiss(&job, &error_abort);
+        job_dismiss(&job, &error_abort);
+    }
 
     destroy_blk(blk);
     aio_context_release(ctx);
-- 
2.31.1




* [PATCH v7 09/18] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU
  2022-06-16 13:18 [PATCH v7 00/18] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (7 preceding siblings ...)
  2022-06-16 13:18 ` [PATCH v7 08/18] jobs: use job locks also in the unit tests Emanuele Giuseppe Esposito
@ 2022-06-16 13:18 ` Emanuele Giuseppe Esposito
  2022-06-16 13:18 ` [PATCH v7 10/18] jobs: rename static functions called with job_mutex held Emanuele Giuseppe Esposito
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-16 13:18 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel,
	Emanuele Giuseppe Esposito

Once the job lock is used and the AioContext lock is removed,
mirror has to perform job operations within a single critical
section, using the helpers prepared in the previous commit.

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/mirror.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index d8ecb9efa2..f5c6bac24f 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -654,9 +654,13 @@ static int mirror_exit_common(Job *job)
     BlockDriverState *target_bs;
     BlockDriverState *mirror_top_bs;
     Error *local_err = NULL;
-    bool abort = job->ret < 0;
+    bool abort;
     int ret = 0;
 
+    WITH_JOB_LOCK_GUARD() {
+        abort = job->ret < 0;
+    }
+
     if (s->prepared) {
         return 0;
     }
@@ -1152,8 +1156,10 @@ static void mirror_complete(Job *job, Error **errp)
     s->should_complete = true;
 
     /* If the job is paused, it will be re-entered when it is resumed */
-    if (!job->paused) {
-        job_enter(job);
+    WITH_JOB_LOCK_GUARD() {
+        if (!job->paused) {
+            job_enter_cond(job, NULL);
+        }
     }
 }
 
@@ -1173,8 +1179,11 @@ static bool mirror_drained_poll(BlockJob *job)
      * from one of our own drain sections, to avoid a deadlock waiting for
      * ourselves.
      */
-    if (!s->common.job.paused && !job_is_cancelled(&job->job) && !s->in_drain) {
-        return true;
+    WITH_JOB_LOCK_GUARD() {
+        if (!s->common.job.paused && !job_is_cancelled_locked(&job->job)
+            && !s->in_drain) {
+            return true;
+        }
     }
 
     return !!s->in_flight;
-- 
2.31.1




* [PATCH v7 10/18] jobs: rename static functions called with job_mutex held
  2022-06-16 13:18 [PATCH v7 00/18] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (8 preceding siblings ...)
  2022-06-16 13:18 ` [PATCH v7 09/18] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU Emanuele Giuseppe Esposito
@ 2022-06-16 13:18 ` Emanuele Giuseppe Esposito
  2022-06-21 17:26   ` Vladimir Sementsov-Ogievskiy
  2022-06-16 13:18 ` [PATCH v7 11/18] job.h: rename job API " Emanuele Giuseppe Esposito
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-16 13:18 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel,
	Emanuele Giuseppe Esposito

With the *nop* job_lock/unlock in place, rename the static
functions that always run with job_mutex held, adding the "_locked" suffix.

List of functions that get this suffix:
job_txn_ref		   job_txn_del_job
job_txn_apply		   job_state_transition
job_should_pause	   job_event_cancelled
job_event_completed	   job_event_pending
job_event_ready		   job_event_idle
job_do_yield		   job_timer_not_pending
job_do_dismiss		   job_conclude
job_update_rc		   job_commit
job_abort		   job_clean
job_finalize_single	   job_cancel_async
job_completed_txn_abort	   job_prepare
job_needs_finalize	   job_do_finalize
job_transition_to_pending  job_completed_txn_success
job_completed		   job_cancel_err
job_force_cancel_err

Note that "locked" refers to the *nop* job_lock/unlock, and not
real_job_lock/unlock.

No functional change intended.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 job.c | 247 +++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 141 insertions(+), 106 deletions(-)

diff --git a/job.c b/job.c
index 55b92b2332..4f4b387625 100644
--- a/job.c
+++ b/job.c
@@ -113,7 +113,8 @@ JobTxn *job_txn_new(void)
     return txn;
 }
 
-static void job_txn_ref(JobTxn *txn)
+/* Called with job_mutex held. */
+static void job_txn_ref_locked(JobTxn *txn)
 {
     txn->refcnt++;
 }
@@ -145,10 +146,11 @@ static void job_txn_add_job(JobTxn *txn, Job *job)
     job->txn = txn;
 
     QLIST_INSERT_HEAD(&txn->jobs, job, txn_list);
-    job_txn_ref(txn);
+    job_txn_ref_locked(txn);
 }
 
-static void job_txn_del_job(Job *job)
+/* Called with job_mutex held. */
+static void job_txn_del_job_locked(Job *job)
 {
     if (job->txn) {
         QLIST_REMOVE(job, txn_list);
@@ -157,7 +159,8 @@ static void job_txn_del_job(Job *job)
     }
 }
 
-static int job_txn_apply(Job *job, int fn(Job *))
+/* Called with job_mutex held. */
+static int job_txn_apply_locked(Job *job, int fn(Job *))
 {
     AioContext *inner_ctx;
     Job *other_job, *next;
@@ -165,10 +168,10 @@ static int job_txn_apply(Job *job, int fn(Job *))
     int rc = 0;
 
     /*
-     * Similar to job_completed_txn_abort, we take each job's lock before
-     * applying fn, but since we assume that outer_ctx is held by the caller,
-     * we need to release it here to avoid holding the lock twice - which would
-     * break AIO_WAIT_WHILE from within fn.
+     * Similar to job_completed_txn_abort_locked, we take each job's lock
+     * before applying fn, but since we assume that outer_ctx is held by
+     * the caller, we need to release it here to avoid holding the lock
+     * twice - which would break AIO_WAIT_WHILE from within fn.
      */
     job_ref(job);
     aio_context_release(job->aio_context);
@@ -197,7 +200,8 @@ bool job_is_internal(Job *job)
     return (job->id == NULL);
 }
 
-static void job_state_transition(Job *job, JobStatus s1)
+/* Called with job_mutex held. */
+static void job_state_transition_locked(Job *job, JobStatus s1)
 {
     JobStatus s0 = job->status;
     assert(s1 >= 0 && s1 < JOB_STATUS__MAX);
@@ -322,7 +326,8 @@ static bool job_started(Job *job)
     return job->co;
 }
 
-static bool job_should_pause(Job *job)
+/* Called with job_mutex held. */
+static bool job_should_pause_locked(Job *job)
 {
     return job->pause_count > 0;
 }
@@ -402,7 +407,7 @@ void *job_create(const char *job_id, const JobDriver *driver, JobTxn *txn,
     notifier_list_init(&job->on_ready);
     notifier_list_init(&job->on_idle);
 
-    job_state_transition(job, JOB_STATUS_CREATED);
+    job_state_transition_locked(job, JOB_STATUS_CREATED);
     aio_timer_init(qemu_get_aio_context(), &job->sleep_timer,
                    QEMU_CLOCK_REALTIME, SCALE_NS,
                    job_sleep_timer_cb, job);
@@ -468,31 +473,36 @@ void job_progress_increase_remaining(Job *job, uint64_t delta)
 
 /**
  * To be called when a cancelled job is finalised.
+ * Called with job_mutex held.
  */
-static void job_event_cancelled(Job *job)
+static void job_event_cancelled_locked(Job *job)
 {
     notifier_list_notify(&job->on_finalize_cancelled, job);
 }
 
 /**
  * To be called when a successfully completed job is finalised.
+ * Called with job_mutex held.
  */
-static void job_event_completed(Job *job)
+static void job_event_completed_locked(Job *job)
 {
     notifier_list_notify(&job->on_finalize_completed, job);
 }
 
-static void job_event_pending(Job *job)
+/* Called with job_mutex held. */
+static void job_event_pending_locked(Job *job)
 {
     notifier_list_notify(&job->on_pending, job);
 }
 
-static void job_event_ready(Job *job)
+/* Called with job_mutex held. */
+static void job_event_ready_locked(Job *job)
 {
     notifier_list_notify(&job->on_ready, job);
 }
 
-static void job_event_idle(Job *job)
+/* Called with job_mutex held. */
+static void job_event_idle_locked(Job *job)
 {
     notifier_list_notify(&job->on_idle, job);
 }
@@ -530,20 +540,24 @@ void job_enter(Job *job)
     job_enter_cond(job, NULL);
 }
 
-/* Yield, and schedule a timer to reenter the coroutine after @ns nanoseconds.
+/*
+ * Yield, and schedule a timer to reenter the coroutine after @ns nanoseconds.
  * Reentering the job coroutine with job_enter() before the timer has expired
  * is allowed and cancels the timer.
  *
  * If @ns is (uint64_t) -1, no timer is scheduled and job_enter() must be
- * called explicitly. */
-static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
+ * called explicitly.
+ *
+ * Called with job_mutex held, but releases it temporarily.
+ */
+static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns)
 {
     real_job_lock();
     if (ns != -1) {
         timer_mod(&job->sleep_timer, ns);
     }
     job->busy = false;
-    job_event_idle(job);
+    job_event_idle_locked(job);
     real_job_unlock();
     job_unlock();
     qemu_coroutine_yield();
@@ -558,7 +572,7 @@ static void coroutine_fn job_pause_point_locked(Job *job)
 {
     assert(job && job_started(job));
 
-    if (!job_should_pause(job)) {
+    if (!job_should_pause_locked(job)) {
         return;
     }
     if (job_is_cancelled_locked(job)) {
@@ -571,15 +585,15 @@ static void coroutine_fn job_pause_point_locked(Job *job)
         job_lock();
     }
 
-    if (job_should_pause(job) && !job_is_cancelled_locked(job)) {
+    if (job_should_pause_locked(job) && !job_is_cancelled_locked(job)) {
         JobStatus status = job->status;
-        job_state_transition(job, status == JOB_STATUS_READY
-                                  ? JOB_STATUS_STANDBY
-                                  : JOB_STATUS_PAUSED);
+        job_state_transition_locked(job, status == JOB_STATUS_READY
+                                    ? JOB_STATUS_STANDBY
+                                    : JOB_STATUS_PAUSED);
         job->paused = true;
-        job_do_yield(job, -1);
+        job_do_yield_locked(job, -1);
         job->paused = false;
-        job_state_transition(job, status);
+        job_state_transition_locked(job, status);
     }
 
     if (job->driver->resume) {
@@ -605,8 +619,8 @@ void job_yield(Job *job)
         return;
     }
 
-    if (!job_should_pause(job)) {
-        job_do_yield(job, -1);
+    if (!job_should_pause_locked(job)) {
+        job_do_yield_locked(job, -1);
     }
 
     job_pause_point_locked(job);
@@ -622,15 +636,15 @@ void coroutine_fn job_sleep_ns(Job *job, int64_t ns)
         return;
     }
 
-    if (!job_should_pause(job)) {
-        job_do_yield(job, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + ns);
+    if (!job_should_pause_locked(job)) {
+        job_do_yield_locked(job, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + ns);
     }
 
     job_pause_point_locked(job);
 }
 
 /* Assumes the job_mutex is held */
-static bool job_timer_not_pending(Job *job)
+static bool job_timer_not_pending_locked(Job *job)
 {
     return !timer_pending(&job->sleep_timer);
 }
@@ -652,7 +666,7 @@ void job_resume(Job *job)
     }
 
     /* kick only if no timer is pending */
-    job_enter_cond(job, job_timer_not_pending);
+    job_enter_cond(job, job_timer_not_pending_locked);
 }
 
 void job_user_pause(Job *job, Error **errp)
@@ -693,16 +707,17 @@ void job_user_resume(Job *job, Error **errp)
     job_resume(job);
 }
 
-static void job_do_dismiss(Job *job)
+/* Called with job_mutex held. */
+static void job_do_dismiss_locked(Job *job)
 {
     assert(job);
     job->busy = false;
     job->paused = false;
     job->deferred_to_main_loop = true;
 
-    job_txn_del_job(job);
+    job_txn_del_job_locked(job);
 
-    job_state_transition(job, JOB_STATUS_NULL);
+    job_state_transition_locked(job, JOB_STATUS_NULL);
     job_unref(job);
 }
 
@@ -715,7 +730,7 @@ void job_dismiss(Job **jobptr, Error **errp)
         return;
     }
 
-    job_do_dismiss(job);
+    job_do_dismiss_locked(job);
     *jobptr = NULL;
 }
 
@@ -723,18 +738,20 @@ void job_early_fail(Job *job)
 {
     JOB_LOCK_GUARD();
     assert(job->status == JOB_STATUS_CREATED);
-    job_do_dismiss(job);
+    job_do_dismiss_locked(job);
 }
 
-static void job_conclude(Job *job)
+/* Called with job_mutex held. */
+static void job_conclude_locked(Job *job)
 {
-    job_state_transition(job, JOB_STATUS_CONCLUDED);
+    job_state_transition_locked(job, JOB_STATUS_CONCLUDED);
     if (job->auto_dismiss || !job_started(job)) {
-        job_do_dismiss(job);
+        job_do_dismiss_locked(job);
     }
 }
 
-static void job_update_rc(Job *job)
+/* Called with job_mutex held. */
+static void job_update_rc_locked(Job *job)
 {
     if (!job->ret && job_is_cancelled_locked(job)) {
         job->ret = -ECANCELED;
@@ -743,11 +760,12 @@ static void job_update_rc(Job *job)
         if (!job->err) {
             error_setg(&job->err, "%s", strerror(-job->ret));
         }
-        job_state_transition(job, JOB_STATUS_ABORTING);
+        job_state_transition_locked(job, JOB_STATUS_ABORTING);
     }
 }
 
-static void job_commit(Job *job)
+/* Called with job_mutex held, but releases it temporarily. */
+static void job_commit_locked(Job *job)
 {
     assert(!job->ret);
     GLOBAL_STATE_CODE();
@@ -758,7 +776,8 @@ static void job_commit(Job *job)
     }
 }
 
-static void job_abort(Job *job)
+/* Called with job_mutex held, but releases it temporarily. */
+static void job_abort_locked(Job *job)
 {
     assert(job->ret);
     GLOBAL_STATE_CODE();
@@ -769,7 +788,8 @@ static void job_abort(Job *job)
     }
 }
 
-static void job_clean(Job *job)
+/* Called with job_mutex held, but releases it temporarily. */
+static void job_clean_locked(Job *job)
 {
     GLOBAL_STATE_CODE();
     if (job->driver->clean) {
@@ -779,21 +799,22 @@ static void job_clean(Job *job)
     }
 }
 
-static int job_finalize_single(Job *job)
+/* Called with job_mutex held, but releases it temporarily. */
+static int job_finalize_single_locked(Job *job)
 {
     int job_ret;
 
     assert(job_is_completed_locked(job));
 
     /* Ensure abort is called for late-transactional failures */
-    job_update_rc(job);
+    job_update_rc_locked(job);
 
     if (!job->ret) {
-        job_commit(job);
+        job_commit_locked(job);
     } else {
-        job_abort(job);
+        job_abort_locked(job);
     }
-    job_clean(job);
+    job_clean_locked(job);
 
     if (job->cb) {
         job_ret = job->ret;
@@ -805,18 +826,19 @@ static int job_finalize_single(Job *job)
     /* Emit events only if we actually started */
     if (job_started(job)) {
         if (job_is_cancelled_locked(job)) {
-            job_event_cancelled(job);
+            job_event_cancelled_locked(job);
         } else {
-            job_event_completed(job);
+            job_event_completed_locked(job);
         }
     }
 
-    job_txn_del_job(job);
-    job_conclude(job);
+    job_txn_del_job_locked(job);
+    job_conclude_locked(job);
     return 0;
 }
 
-static void job_cancel_async(Job *job, bool force)
+/* Called with job_mutex held, but releases it temporarily. */
+static void job_cancel_async_locked(Job *job, bool force)
 {
     GLOBAL_STATE_CODE();
     if (job->driver->cancel) {
@@ -854,7 +876,8 @@ static void job_cancel_async(Job *job, bool force)
     }
 }
 
-static void job_completed_txn_abort(Job *job)
+/* Called with job_mutex held. */
+static void job_completed_txn_abort_locked(Job *job)
 {
     AioContext *ctx;
     JobTxn *txn = job->txn;
@@ -867,12 +890,12 @@ static void job_completed_txn_abort(Job *job)
         return;
     }
     txn->aborting = true;
-    job_txn_ref(txn);
+    job_txn_ref_locked(txn);
 
     /*
      * We can only hold the single job's AioContext lock while calling
-     * job_finalize_single() because the finalization callbacks can involve
-     * calls of AIO_WAIT_WHILE(), which could deadlock otherwise.
+     * job_finalize_single_locked() because the finalization callbacks can
+     * involve calls of AIO_WAIT_WHILE(), which could deadlock otherwise.
      * Note that the job's AioContext may change when it is finalized.
      */
     job_ref(job);
@@ -890,7 +913,7 @@ static void job_completed_txn_abort(Job *job)
              * Therefore, pass force=true to terminate all other jobs as quickly
              * as possible.
              */
-            job_cancel_async(other_job, true);
+            job_cancel_async_locked(other_job, true);
             aio_context_release(ctx);
         }
     }
@@ -906,13 +929,13 @@ static void job_completed_txn_abort(Job *job)
             assert(job_cancel_requested_locked(other_job));
             job_finish_sync(other_job, NULL, NULL);
         }
-        job_finalize_single(other_job);
+        job_finalize_single_locked(other_job);
         aio_context_release(ctx);
     }
 
     /*
      * Use job_ref()/job_unref() so we can read the AioContext here
-     * even if the job went away during job_finalize_single().
+     * even if the job went away during job_finalize_single_locked().
      */
     aio_context_acquire(job->aio_context);
     job_unref(job);
@@ -920,7 +943,8 @@ static void job_completed_txn_abort(Job *job)
     job_txn_unref(txn);
 }
 
-static int job_prepare(Job *job)
+/* Called with job_mutex held, but releases it temporarily. */
+static int job_prepare_locked(Job *job)
 {
     int ret;
 
@@ -930,27 +954,29 @@ static int job_prepare(Job *job)
         ret = job->driver->prepare(job);
         job_lock();
         job->ret = ret;
-        job_update_rc(job);
+        job_update_rc_locked(job);
     }
     return job->ret;
 }
 
-static int job_needs_finalize(Job *job)
+/* Called with job_mutex held. */
+static int job_needs_finalize_locked(Job *job)
 {
     return !job->auto_finalize;
 }
 
-static void job_do_finalize(Job *job)
+/* Called with job_mutex held. */
+static void job_do_finalize_locked(Job *job)
 {
     int rc;
     assert(job && job->txn);
 
     /* prepare the transaction to complete */
-    rc = job_txn_apply(job, job_prepare);
+    rc = job_txn_apply_locked(job, job_prepare_locked);
     if (rc) {
-        job_completed_txn_abort(job);
+        job_completed_txn_abort_locked(job);
     } else {
-        job_txn_apply(job, job_finalize_single);
+        job_txn_apply_locked(job, job_finalize_single_locked);
     }
 }
 
@@ -960,14 +986,15 @@ void job_finalize(Job *job, Error **errp)
     if (job_apply_verb(job, JOB_VERB_FINALIZE, errp)) {
         return;
     }
-    job_do_finalize(job);
+    job_do_finalize_locked(job);
 }
 
-static int job_transition_to_pending(Job *job)
+/* Called with job_mutex held. */
+static int job_transition_to_pending_locked(Job *job)
 {
-    job_state_transition(job, JOB_STATUS_PENDING);
+    job_state_transition_locked(job, JOB_STATUS_PENDING);
     if (!job->auto_finalize) {
-        job_event_pending(job);
+        job_event_pending_locked(job);
     }
     return 0;
 }
@@ -975,16 +1002,17 @@ static int job_transition_to_pending(Job *job)
 void job_transition_to_ready(Job *job)
 {
     JOB_LOCK_GUARD();
-    job_state_transition(job, JOB_STATUS_READY);
-    job_event_ready(job);
+    job_state_transition_locked(job, JOB_STATUS_READY);
+    job_event_ready_locked(job);
 }
 
-static void job_completed_txn_success(Job *job)
+/* Called with job_mutex held. */
+static void job_completed_txn_success_locked(Job *job)
 {
     JobTxn *txn = job->txn;
     Job *other_job;
 
-    job_state_transition(job, JOB_STATUS_WAITING);
+    job_state_transition_locked(job, JOB_STATUS_WAITING);
 
     /*
      * Successful completion, see if there are other running jobs in this
@@ -997,24 +1025,25 @@ static void job_completed_txn_success(Job *job)
         assert(other_job->ret == 0);
     }
 
-    job_txn_apply(job, job_transition_to_pending);
+    job_txn_apply_locked(job, job_transition_to_pending_locked);
 
     /* If no jobs need manual finalization, automatically do so */
-    if (job_txn_apply(job, job_needs_finalize) == 0) {
-        job_do_finalize(job);
+    if (job_txn_apply_locked(job, job_needs_finalize_locked) == 0) {
+        job_do_finalize_locked(job);
     }
 }
 
-static void job_completed(Job *job)
+/* Called with job_mutex held. */
+static void job_completed_locked(Job *job)
 {
     assert(job && job->txn && !job_is_completed_locked(job));
 
-    job_update_rc(job);
+    job_update_rc_locked(job);
     trace_job_completed(job, job->ret);
     if (job->ret) {
-        job_completed_txn_abort(job);
+        job_completed_txn_abort_locked(job);
     } else {
-        job_completed_txn_success(job);
+        job_completed_txn_success_locked(job);
     }
 }
 
@@ -1036,15 +1065,16 @@ static void job_exit(void *opaque)
      * drain block nodes, and if .drained_poll still returned true, we would
      * deadlock. */
     job->busy = false;
-    job_event_idle(job);
+    job_event_idle_locked(job);
 
-    job_completed(job);
+    job_completed_locked(job);
 
     /*
-     * Note that calling job_completed can move the job to a different
-     * aio_context, so we cannot cache from above. job_txn_apply takes care of
-     * acquiring the new lock, and we ref/unref to avoid job_completed freeing
-     * the job underneath us.
+     * Note that calling job_completed_locked can move the job to a different
+     * aio_context, so we cannot cache from above.
+     * job_txn_apply_locked takes care of
+     * acquiring the new lock, and we ref/unref to avoid job_completed_locked
+     * freeing the job underneath us.
      */
     ctx = job->aio_context;
     job_unref(job);
@@ -1083,7 +1113,7 @@ void job_start(Job *job)
         job->pause_count--;
         job->busy = true;
         job->paused = false;
-        job_state_transition(job, JOB_STATUS_RUNNING);
+        job_state_transition_locked(job, JOB_STATUS_RUNNING);
     }
     aio_co_enter(job->aio_context, job->co);
 }
@@ -1091,25 +1121,25 @@ void job_start(Job *job)
 void job_cancel(Job *job, bool force)
 {
     if (job->status == JOB_STATUS_CONCLUDED) {
-        job_do_dismiss(job);
+        job_do_dismiss_locked(job);
         return;
     }
-    job_cancel_async(job, force);
+    job_cancel_async_locked(job, force);
     if (!job_started(job)) {
-        job_completed(job);
+        job_completed_locked(job);
     } else if (job->deferred_to_main_loop) {
         /*
-         * job_cancel_async() ignores soft-cancel requests for jobs
+         * job_cancel_async_locked() ignores soft-cancel requests for jobs
          * that are already done (i.e. deferred to the main loop).  We
          * have to check again whether the job is really cancelled.
          * (job_cancel_requested_locked() and job_is_cancelled_locked()
-         * are equivalent here, because job_cancel_async() will
+         * are equivalent here, because job_cancel_async_locked() will
          * make soft-cancel requests no-ops when deferred_to_main_loop is true.
          * We choose to call job_is_cancelled_locked() to show that we invoke
-         * job_completed_txn_abort() only for force-cancelled jobs.)
+         * job_completed_txn_abort_locked() only for force-cancelled jobs.)
          */
         if (job_is_cancelled_locked(job)) {
-            job_completed_txn_abort(job);
+            job_completed_txn_abort_locked(job);
         }
     } else {
         job_enter_cond(job, NULL);
@@ -1124,18 +1154,23 @@ void job_user_cancel(Job *job, bool force, Error **errp)
     job_cancel(job, force);
 }
 
-/* A wrapper around job_cancel() taking an Error ** parameter so it may be
+/*
+ * A wrapper around job_cancel() taking an Error ** parameter so it may be
  * used with job_finish_sync() without the need for (rather nasty) function
- * pointer casts there. */
-static void job_cancel_err(Job *job, Error **errp)
+ * pointer casts there.
+ *
+ * Called with job_mutex held.
+ */
+static void job_cancel_err_locked(Job *job, Error **errp)
 {
     job_cancel(job, false);
 }
 
 /**
- * Same as job_cancel_err(), but force-cancel.
+ * Same as job_cancel_err_locked(), but force-cancel.
+ * Called with job_mutex held.
  */
-static void job_force_cancel_err(Job *job, Error **errp)
+static void job_force_cancel_err_locked(Job *job, Error **errp)
 {
     job_cancel(job, true);
 }
@@ -1143,9 +1178,9 @@ static void job_force_cancel_err(Job *job, Error **errp)
 int job_cancel_sync(Job *job, bool force)
 {
     if (force) {
-        return job_finish_sync(job, &job_force_cancel_err, NULL);
+        return job_finish_sync(job, &job_force_cancel_err_locked, NULL);
     } else {
-        return job_finish_sync(job, &job_cancel_err, NULL);
+        return job_finish_sync(job, &job_cancel_err_locked, NULL);
     }
 }
 
-- 
2.31.1




* [PATCH v7 11/18] job.h: rename job API functions called with job_mutex held
  2022-06-16 13:18 [PATCH v7 00/18] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (9 preceding siblings ...)
  2022-06-16 13:18 ` [PATCH v7 10/18] jobs: rename static functions called with job_mutex held Emanuele Giuseppe Esposito
@ 2022-06-16 13:18 ` Emanuele Giuseppe Esposito
  2022-06-16 13:18 ` [PATCH v7 12/18] block_job: rename block_job " Emanuele Giuseppe Esposito
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-16 13:18 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel,
	Emanuele Giuseppe Esposito

With the *nop* job_lock/unlock in place, rename the job API
functions that are always called with job_mutex held, adding the
"_locked" suffix.

List of functions that get this suffix:
job_txn_unref		job_txn_add_job
job_ref			job_unref
job_enter_cond		job_finish_sync
job_is_ready		job_pause
job_resume		job_user_pause
job_user_paused		job_user_resume
job_next		job_get
job_apply_verb		job_complete
job_cancel		job_user_cancel
job_cancel_sync		job_complete_sync
job_finalize		job_dismiss

Note that "locked" refers to the *nop* job_lock/unlock, and not
real_job_lock/unlock.

No functional change intended.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block.c                          |   2 +-
 block/mirror.c                   |   2 +-
 block/replication.c              |   4 +-
 blockdev.c                       |  32 ++++----
 blockjob.c                       |  14 ++--
 include/qemu/job.h               | 119 ++++++++++++++++++----------
 job-qmp.c                        |  26 +++----
 job.c                            | 129 ++++++++++++++++---------------
 qemu-img.c                       |   6 +-
 tests/unit/test-bdrv-drain.c     |   2 +-
 tests/unit/test-block-iothread.c |   4 +-
 tests/unit/test-blockjob-txn.c   |  14 ++--
 tests/unit/test-blockjob.c       |  30 +++----
 13 files changed, 213 insertions(+), 171 deletions(-)

diff --git a/block.c b/block.c
index b6f0d860d2..36ee0090c6 100644
--- a/block.c
+++ b/block.c
@@ -4979,7 +4979,7 @@ static void bdrv_close(BlockDriverState *bs)
 void bdrv_close_all(void)
 {
     WITH_JOB_LOCK_GUARD() {
-        assert(job_next(NULL) == NULL);
+        assert(job_next_locked(NULL) == NULL);
     }
     GLOBAL_STATE_CODE();
 
diff --git a/block/mirror.c b/block/mirror.c
index f5c6bac24f..b38676e19d 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1158,7 +1158,7 @@ static void mirror_complete(Job *job, Error **errp)
     /* If the job is paused, it will be re-entered when it is resumed */
     WITH_JOB_LOCK_GUARD() {
         if (!job->paused) {
-            job_enter_cond(job, NULL);
+            job_enter_cond_locked(job, NULL);
         }
     }
 }
diff --git a/block/replication.c b/block/replication.c
index a03b28726e..50ea778937 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -150,7 +150,7 @@ static void replication_close(BlockDriverState *bs)
         commit_job = &s->commit_job->job;
         assert(commit_job->aio_context == qemu_get_current_aio_context());
         WITH_JOB_LOCK_GUARD() {
-            job_cancel_sync(commit_job, false);
+            job_cancel_sync_locked(commit_job, false);
         }
     }
 
@@ -729,7 +729,7 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
          */
         if (s->backup_job) {
             WITH_JOB_LOCK_GUARD() {
-                job_cancel_sync(&s->backup_job->job, true);
+                job_cancel_sync_locked(&s->backup_job->job, true);
             }
         }
 
diff --git a/blockdev.c b/blockdev.c
index 6f83783f10..deb33b8f1e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -157,7 +157,7 @@ void blockdev_mark_auto_del(BlockBackend *blk)
             AioContext *aio_context = job->job.aio_context;
             aio_context_acquire(aio_context);
 
-            job_cancel(&job->job, false);
+            job_cancel_locked(&job->job, false);
 
             aio_context_release(aio_context);
         }
@@ -1841,7 +1841,7 @@ static void drive_backup_abort(BlkActionState *common)
         aio_context_acquire(aio_context);
 
         WITH_JOB_LOCK_GUARD() {
-            job_cancel_sync(&state->job->job, true);
+            job_cancel_sync_locked(&state->job->job, true);
         }
 
         aio_context_release(aio_context);
@@ -1944,7 +1944,7 @@ static void blockdev_backup_abort(BlkActionState *common)
         aio_context_acquire(aio_context);
 
         WITH_JOB_LOCK_GUARD() {
-            job_cancel_sync(&state->job->job, true);
+            job_cancel_sync_locked(&state->job->job, true);
         }
 
         aio_context_release(aio_context);
@@ -2396,7 +2396,7 @@ exit:
     }
 
     WITH_JOB_LOCK_GUARD() {
-        job_txn_unref(block_job_txn);
+        job_txn_unref_locked(block_job_txn);
     }
 }
 
@@ -3371,14 +3371,14 @@ void qmp_block_job_cancel(const char *device,
         force = false;
     }
 
-    if (job_user_paused(&job->job) && !force) {
+    if (job_user_paused_locked(&job->job) && !force) {
         error_setg(errp, "The block job for device '%s' is currently paused",
                    device);
         goto out;
     }
 
     trace_qmp_block_job_cancel(job);
-    job_user_cancel(&job->job, force, errp);
+    job_user_cancel_locked(&job->job, force, errp);
 out:
     aio_context_release(aio_context);
 }
@@ -3396,7 +3396,7 @@ void qmp_block_job_pause(const char *device, Error **errp)
     }
 
     trace_qmp_block_job_pause(job);
-    job_user_pause(&job->job, errp);
+    job_user_pause_locked(&job->job, errp);
     aio_context_release(aio_context);
 }
 
@@ -3413,7 +3413,7 @@ void qmp_block_job_resume(const char *device, Error **errp)
     }
 
     trace_qmp_block_job_resume(job);
-    job_user_resume(&job->job, errp);
+    job_user_resume_locked(&job->job, errp);
     aio_context_release(aio_context);
 }
 
@@ -3430,7 +3430,7 @@ void qmp_block_job_complete(const char *device, Error **errp)
     }
 
     trace_qmp_block_job_complete(job);
-    job_complete(&job->job, errp);
+    job_complete_locked(&job->job, errp);
     aio_context_release(aio_context);
 }
 
@@ -3447,16 +3447,16 @@ void qmp_block_job_finalize(const char *id, Error **errp)
     }
 
     trace_qmp_block_job_finalize(job);
-    job_ref(&job->job);
-    job_finalize(&job->job, errp);
+    job_ref_locked(&job->job);
+    job_finalize_locked(&job->job, errp);
 
     /*
-     * Job's context might have changed via job_finalize (and job_txn_apply
-     * automatically acquires the new one), so make sure we release the correct
-     * one.
+     * Job's context might have changed via job_finalize_locked
+     * (and job_txn_apply automatically acquires the new one),
+     * so make sure we release the correct one.
      */
     aio_context = block_job_get_aio_context(job);
-    job_unref(&job->job);
+    job_unref_locked(&job->job);
     aio_context_release(aio_context);
 }
 
@@ -3475,7 +3475,7 @@ void qmp_block_job_dismiss(const char *id, Error **errp)
 
     trace_qmp_block_job_dismiss(bjob);
     job = &bjob->job;
-    job_dismiss(&job, errp);
+    job_dismiss_locked(&job, errp);
     aio_context_release(aio_context);
 }
 
diff --git a/blockjob.c b/blockjob.c
index d726efe679..02a98630c9 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -65,7 +65,7 @@ BlockJob *block_job_next(BlockJob *bjob)
     GLOBAL_STATE_CODE();
 
     do {
-        job = job_next(job);
+        job = job_next_locked(job);
     } while (job && !is_block_job(job));
 
     return job ? container_of(job, BlockJob, job) : NULL;
@@ -73,7 +73,7 @@ BlockJob *block_job_next(BlockJob *bjob)
 
 BlockJob *block_job_get(const char *id)
 {
-    Job *job = job_get(id);
+    Job *job = job_get_locked(id);
     GLOBAL_STATE_CODE();
 
     if (job && is_block_job(job)) {
@@ -103,7 +103,7 @@ static void child_job_drained_begin(BdrvChild *c)
 {
     BlockJob *job = c->opaque;
     WITH_JOB_LOCK_GUARD() {
-        job_pause(&job->job);
+        job_pause_locked(&job->job);
     }
 }
 
@@ -135,7 +135,7 @@ static void child_job_drained_end(BdrvChild *c, int *drained_end_counter)
 {
     BlockJob *job = c->opaque;
     WITH_JOB_LOCK_GUARD() {
-        job_resume(&job->job);
+        job_resume_locked(&job->job);
     }
 }
 
@@ -284,7 +284,7 @@ bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
 
     GLOBAL_STATE_CODE();
 
-    if (job_apply_verb(&job->job, JOB_VERB_SET_SPEED, errp) < 0) {
+    if (job_apply_verb_locked(&job->job, JOB_VERB_SET_SPEED, errp) < 0) {
         return false;
     }
     if (speed < 0) {
@@ -308,7 +308,7 @@ bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
     }
 
     /* kick only if a timer is pending */
-    job_enter_cond(&job->job, job_timer_pending);
+    job_enter_cond_locked(&job->job, job_timer_pending);
 
     return true;
 }
@@ -563,7 +563,7 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
     if (action == BLOCK_ERROR_ACTION_STOP) {
         WITH_JOB_LOCK_GUARD() {
             if (!job->job.user_paused) {
-                job_pause(&job->job);
+                job_pause_locked(&job->job);
                 /*
                  * make the pause user visible, which will be
                  * resumed from QMP.
diff --git a/include/qemu/job.h b/include/qemu/job.h
index 275d593715..246af068a1 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -341,7 +341,7 @@ void job_unlock(void);
 
 /**
  * Allocate and return a new job transaction. Jobs can be added to the
- * transaction using job_txn_add_job().
+ * transaction using job_txn_add_job_locked().
  *
  * The transaction is automatically freed when the last job completes or is
  * cancelled.
@@ -353,10 +353,12 @@ void job_unlock(void);
 JobTxn *job_txn_new(void);
 
 /**
- * Release a reference that was previously acquired with job_txn_add_job or
- * job_txn_new. If it's the last reference to the object, it will be freed.
+ * Release a reference that was previously acquired with job_txn_add_job_locked
+ * or job_txn_new. If it's the last reference to the object, it will be freed.
+ *
+ * Called between job_lock and job_unlock.
  */
-void job_txn_unref(JobTxn *txn);
+void job_txn_unref_locked(JobTxn *txn);
 
 /**
  * Create a new long-running job and return it.
@@ -375,16 +377,20 @@ void *job_create(const char *job_id, const JobDriver *driver, JobTxn *txn,
                  void *opaque, Error **errp);
 
 /**
- * Add a reference to Job refcnt, it will be decreased with job_unref, and then
- * be freed if it comes to be the last reference.
+ * Add a reference to Job refcnt, it will be decreased with job_unref_locked,
+ * and then be freed if it comes to be the last reference.
+ *
+ * Called between job_lock and job_unlock.
  */
-void job_ref(Job *job);
+void job_ref_locked(Job *job);
 
 /**
- * Release a reference that was previously acquired with job_ref() or
+ * Release a reference that was previously acquired with job_ref_locked() or
  * job_create(). If it's the last reference to the object, it will be freed.
+ *
+ * Called between job_lock and job_unlock, but might release it temporarily.
  */
-void job_unref(Job *job);
+void job_unref_locked(Job *job);
 
 /**
  * @job: The job that has made progress
@@ -423,8 +429,10 @@ void job_progress_increase_remaining(Job *job, uint64_t delta);
  * Conditionally enter the job coroutine if the job is ready to run, not
  * already busy and fn() returns true. fn() is called while under the job_lock
  * critical section.
+ *
+ * Called between job_lock and job_unlock, but might release it temporarily.
  */
-void job_enter_cond(Job *job, bool(*fn)(Job *job));
+void job_enter_cond_locked(Job *job, bool(*fn)(Job *job));
 
 /**
  * @job: A job that has not yet been started.
@@ -444,8 +452,8 @@ void job_enter(Job *job);
 /**
  * @job: The job that is ready to pause.
  *
- * Pause now if job_pause() has been called. Jobs that perform lots of I/O
- * must call this between requests so that the job can be paused.
+ * Pause now if job_pause_locked() has been called. Jobs that perform lots of
+ * I/O must call this between requests so that the job can be paused.
  */
 void coroutine_fn job_pause_point(Job *job);
 
@@ -511,50 +519,68 @@ bool job_is_ready_locked(Job *job);
 
 /**
  * Request @job to pause at the next pause point. Must be paired with
- * job_resume(). If the job is supposed to be resumed by user action, call
- * job_user_pause() instead.
+ * job_resume_locked(). If the job is supposed to be resumed by user action,
+ * call job_user_pause_locked() instead.
+ *
+ * Called between job_lock and job_unlock.
  */
-void job_pause(Job *job);
+void job_pause_locked(Job *job);
 
-/** Resumes a @job paused with job_pause. */
-void job_resume(Job *job);
+/**
+ * Resumes a @job paused with job_pause_locked.
+ * Called between job_lock and job_unlock.
+ */
+void job_resume_locked(Job *job);
 
 /**
  * Asynchronously pause the specified @job.
- * Do not allow a resume until a matching call to job_user_resume.
+ * Do not allow a resume until a matching call to job_user_resume_locked.
+ *
+ * Called between job_lock and job_unlock.
  */
-void job_user_pause(Job *job, Error **errp);
+void job_user_pause_locked(Job *job, Error **errp);
 
-/** Returns true if the job is user-paused. */
-bool job_user_paused(Job *job);
+/**
+ * Returns true if the job is user-paused.
+ * Called between job_lock and job_unlock.
+ */
+bool job_user_paused_locked(Job *job);
 
 /**
  * Resume the specified @job.
- * Must be paired with a preceding job_user_pause.
+ * Must be paired with a preceding job_user_pause_locked.
+ *
+ * Called between job_lock and job_unlock, but might release it temporarily.
  */
-void job_user_resume(Job *job, Error **errp);
+void job_user_resume_locked(Job *job, Error **errp);
 
 /**
  * Get the next element from the list of block jobs after @job, or the
  * first one if @job is %NULL.
  *
  * Returns the requested job, or %NULL if there are no more jobs left.
+ *
+ * Called between job_lock and job_unlock.
  */
-Job *job_next(Job *job);
+Job *job_next_locked(Job *job);
 
 /**
  * Get the job identified by @id (which must not be %NULL).
  *
  * Returns the requested job, or %NULL if it doesn't exist.
+ *
+ * Called between job_lock and job_unlock.
  */
-Job *job_get(const char *id);
+Job *job_get_locked(const char *id);
 
 /**
  * Check whether the verb @verb can be applied to @job in its current state.
  * Returns 0 if the verb can be applied; otherwise errp is set and -EPERM
  * returned.
+ *
+ * Called between job_lock and job_unlock.
  */
-int job_apply_verb(Job *job, JobVerb verb, Error **errp);
+int job_apply_verb_locked(Job *job, JobVerb verb, Error **errp);
 
 /** The @job could not be started, free it. */
 void job_early_fail(Job *job);
@@ -562,20 +588,27 @@ void job_early_fail(Job *job);
 /** Moves the @job from RUNNING to READY */
 void job_transition_to_ready(Job *job);
 
-/** Asynchronously complete the specified @job. */
-void job_complete(Job *job, Error **errp);
+/**
+ * Asynchronously complete the specified @job.
+ * Called between job_lock and job_unlock, but it releases the lock temporarily.
+ */
+void job_complete_locked(Job *job, Error **errp);
 
 /**
  * Asynchronously cancel the specified @job. If @force is true, the job should
  * be cancelled immediately without waiting for a consistent state.
+ *
+ * Called between job_lock and job_unlock.
  */
-void job_cancel(Job *job, bool force);
+void job_cancel_locked(Job *job, bool force);
 
 /**
- * Cancels the specified job like job_cancel(), but may refuse to do so if the
- * operation isn't meaningful in the current state of the job.
+ * Cancels the specified job like job_cancel_locked(), but may refuse to do so
+ * if the operation isn't meaningful in the current state of the job.
+ *
+ * Called between job_lock and job_unlock.
  */
-void job_user_cancel(Job *job, bool force, Error **errp);
+void job_user_cancel_locked(Job *job, bool force, Error **errp);
 
 /**
  * Synchronously cancel the @job.  The completion callback is called
@@ -587,15 +620,16 @@ void job_user_cancel(Job *job, bool force, Error **errp);
  * during the call, or -ECANCELED if it was canceled.
  *
  * Callers must hold the AioContext lock of job->aio_context.
+ * Called between job_lock and job_unlock.
  */
-int job_cancel_sync(Job *job, bool force);
+int job_cancel_sync_locked(Job *job, bool force);
 
-/** Synchronously force-cancels all jobs using job_cancel_sync(). */
+/** Synchronously force-cancels all jobs using job_cancel_sync_locked(). */
 void job_cancel_sync_all(void);
 
 /**
  * @job: The job to be completed.
- * @errp: Error object which may be set by job_complete(); this is not
+ * @errp: Error object which may be set by job_complete_locked(); this is not
  *        necessarily set on every error, the job return value has to be
  *        checked as well.
  *
@@ -606,8 +640,9 @@ void job_cancel_sync_all(void);
  * Returns the return value from the job.
  *
  * Callers must hold the AioContext lock of job->aio_context.
+ * Called between job_lock and job_unlock.
  */
-int job_complete_sync(Job *job, Error **errp);
+int job_complete_sync_locked(Job *job, Error **errp);
 
 /**
  * For a @job that has finished its work and is pending awaiting explicit
@@ -616,14 +651,18 @@ int job_complete_sync(Job *job, Error **errp);
  * FIXME: Make the below statement universally true:
  * For jobs that support the manual workflow mode, all graph changes that occur
  * as a result will occur after this command and before a successful reply.
+ *
+ * Called between job_lock and job_unlock.
  */
-void job_finalize(Job *job, Error **errp);
+void job_finalize_locked(Job *job, Error **errp);
 
 /**
  * Remove the concluded @job from the query list and resets the passed pointer
  * to %NULL. Returns an error if the job is not actually concluded.
+ *
+ * Called between job_lock and job_unlock.
  */
-void job_dismiss(Job **job, Error **errp);
+void job_dismiss_locked(Job **job, Error **errp);
 
 /**
  * Synchronously finishes the given @job. If @finish is given, it is called to
@@ -633,7 +672,9 @@ void job_dismiss(Job **job, Error **errp);
  * cancelled before completing, and -errno in other error cases.
  *
  * Callers must hold the AioContext lock of job->aio_context.
+ * Called between job_lock and job_unlock.
  */
-int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp);
+int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),
+                           Error **errp);
 
 #endif
diff --git a/job-qmp.c b/job-qmp.c
index 58ca9b6632..c2eabae09c 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -39,7 +39,7 @@ static Job *find_job_locked(const char *id, AioContext **aio_context, Error **er
 
     *aio_context = NULL;
 
-    job = job_get(id);
+    job = job_get_locked(id);
     if (!job) {
         error_setg(errp, "Job not found");
         return NULL;
@@ -64,7 +64,7 @@ void qmp_job_cancel(const char *id, Error **errp)
     }
 
     trace_qmp_job_cancel(job);
-    job_user_cancel(job, true, errp);
+    job_user_cancel_locked(job, true, errp);
     aio_context_release(aio_context);
 }
 
@@ -81,7 +81,7 @@ void qmp_job_pause(const char *id, Error **errp)
     }
 
     trace_qmp_job_pause(job);
-    job_user_pause(job, errp);
+    job_user_pause_locked(job, errp);
     aio_context_release(aio_context);
 }
 
@@ -98,7 +98,7 @@ void qmp_job_resume(const char *id, Error **errp)
     }
 
     trace_qmp_job_resume(job);
-    job_user_resume(job, errp);
+    job_user_resume_locked(job, errp);
     aio_context_release(aio_context);
 }
 
@@ -115,7 +115,7 @@ void qmp_job_complete(const char *id, Error **errp)
     }
 
     trace_qmp_job_complete(job);
-    job_complete(job, errp);
+    job_complete_locked(job, errp);
     aio_context_release(aio_context);
 }
 
@@ -132,16 +132,16 @@ void qmp_job_finalize(const char *id, Error **errp)
     }
 
     trace_qmp_job_finalize(job);
-    job_ref(job);
-    job_finalize(job, errp);
+    job_ref_locked(job);
+    job_finalize_locked(job, errp);
 
     /*
-     * Job's context might have changed via job_finalize (and job_txn_apply
-     * automatically acquires the new one), so make sure we release the correct
-     * one.
+     * Job's context might have changed via job_finalize_locked
+     * (and job_txn_apply automatically acquires the new one),
+     * so make sure we release the correct one.
      */
     aio_context = job->aio_context;
-    job_unref(job);
+    job_unref_locked(job);
     aio_context_release(aio_context);
 }
 
@@ -158,7 +158,7 @@ void qmp_job_dismiss(const char *id, Error **errp)
     }
 
     trace_qmp_job_dismiss(job);
-    job_dismiss(&job, errp);
+    job_dismiss_locked(&job, errp);
     aio_context_release(aio_context);
 }
 
@@ -194,7 +194,7 @@ JobInfoList *qmp_query_jobs(Error **errp)
 
     JOB_LOCK_GUARD();
 
-    for (job = job_next(NULL); job; job = job_next(job)) {
+    for (job = job_next_locked(NULL); job; job = job_next_locked(job)) {
         JobInfo *value;
         AioContext *aio_context;
 
diff --git a/job.c b/job.c
index 4f4b387625..5c0cb37175 100644
--- a/job.c
+++ b/job.c
@@ -119,7 +119,7 @@ static void job_txn_ref_locked(JobTxn *txn)
     txn->refcnt++;
 }
 
-void job_txn_unref(JobTxn *txn)
+void job_txn_unref_locked(JobTxn *txn)
 {
     if (txn && --txn->refcnt == 0) {
         g_free(txn);
@@ -136,7 +136,7 @@ void job_txn_unref(JobTxn *txn)
  *
  * If @txn is NULL, the function does nothing.
  */
-static void job_txn_add_job(JobTxn *txn, Job *job)
+static void job_txn_add_job_locked(JobTxn *txn, Job *job)
 {
     if (!txn) {
         return;
@@ -154,7 +154,7 @@ static void job_txn_del_job_locked(Job *job)
 {
     if (job->txn) {
         QLIST_REMOVE(job, txn_list);
-        job_txn_unref(job->txn);
+        job_txn_unref_locked(job->txn);
         job->txn = NULL;
     }
 }
@@ -173,7 +173,7 @@ static int job_txn_apply_locked(Job *job, int fn(Job *))
      * the caller, we need to release it here to avoid holding the lock
      * twice - which would break AIO_WAIT_WHILE from within fn.
      */
-    job_ref(job);
+    job_ref_locked(job);
     aio_context_release(job->aio_context);
 
     QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
@@ -191,7 +191,7 @@ static int job_txn_apply_locked(Job *job, int fn(Job *))
      * can't use a local variable to cache it.
      */
     aio_context_acquire(job->aio_context);
-    job_unref(job);
+    job_unref_locked(job);
     return rc;
 }
 
@@ -216,7 +216,7 @@ static void job_state_transition_locked(Job *job, JobStatus s1)
     }
 }
 
-int job_apply_verb(Job *job, JobVerb verb, Error **errp)
+int job_apply_verb_locked(Job *job, JobVerb verb, Error **errp)
 {
     JobStatus s0 = job->status;
     assert(verb >= 0 && verb < JOB_VERB__MAX);
@@ -332,7 +332,7 @@ static bool job_should_pause_locked(Job *job)
     return job->pause_count > 0;
 }
 
-Job *job_next(Job *job)
+Job *job_next_locked(Job *job)
 {
     if (!job) {
         return QLIST_FIRST(&jobs);
@@ -340,7 +340,7 @@ Job *job_next(Job *job)
     return QLIST_NEXT(job, job_list);
 }
 
-Job *job_get(const char *id)
+Job *job_get_locked(const char *id)
 {
     Job *job;
 
@@ -377,7 +377,7 @@ void *job_create(const char *job_id, const JobDriver *driver, JobTxn *txn,
             error_setg(errp, "Invalid job ID '%s'", job_id);
             return NULL;
         }
-        if (job_get(job_id)) {
+        if (job_get_locked(job_id)) {
             error_setg(errp, "Job ID '%s' already in use", job_id);
             return NULL;
         }
@@ -418,21 +418,21 @@ void *job_create(const char *job_id, const JobDriver *driver, JobTxn *txn,
      * consolidating the job management logic */
     if (!txn) {
         txn = job_txn_new();
-        job_txn_add_job(txn, job);
-        job_txn_unref(txn);
+        job_txn_add_job_locked(txn, job);
+        job_txn_unref_locked(txn);
     } else {
-        job_txn_add_job(txn, job);
+        job_txn_add_job_locked(txn, job);
     }
 
     return job;
 }
 
-void job_ref(Job *job)
+void job_ref_locked(Job *job)
 {
     ++job->refcnt;
 }
 
-void job_unref(Job *job)
+void job_unref_locked(Job *job)
 {
     GLOBAL_STATE_CODE();
 
@@ -507,7 +507,7 @@ static void job_event_idle_locked(Job *job)
     notifier_list_notify(&job->on_idle, job);
 }
 
-void job_enter_cond(Job *job, bool(*fn)(Job *job))
+void job_enter_cond_locked(Job *job, bool(*fn)(Job *job))
 {
     if (!job_started(job)) {
         return;
@@ -537,7 +537,7 @@ void job_enter_cond(Job *job, bool(*fn)(Job *job))
 void job_enter(Job *job)
 {
     JOB_LOCK_GUARD();
-    job_enter_cond(job, NULL);
+    job_enter_cond_locked(job, NULL);
 }
 
 /*
@@ -563,7 +563,7 @@ static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns)
     qemu_coroutine_yield();
     job_lock();
 
-    /* Set by job_enter_cond() before re-entering the coroutine.  */
+    /* Set by job_enter_cond_locked() before re-entering the coroutine.  */
     assert(job->busy);
 }
 
@@ -649,15 +649,15 @@ static bool job_timer_not_pending_locked(Job *job)
     return !timer_pending(&job->sleep_timer);
 }
 
-void job_pause(Job *job)
+void job_pause_locked(Job *job)
 {
     job->pause_count++;
     if (!job->paused) {
-        job_enter_cond(job, NULL);
+        job_enter_cond_locked(job, NULL);
     }
 }
 
-void job_resume(Job *job)
+void job_resume_locked(Job *job)
 {
     assert(job->pause_count > 0);
     job->pause_count--;
@@ -666,12 +666,12 @@ void job_resume(Job *job)
     }
 
     /* kick only if no timer is pending */
-    job_enter_cond(job, job_timer_not_pending_locked);
+    job_enter_cond_locked(job, job_timer_not_pending_locked);
 }
 
-void job_user_pause(Job *job, Error **errp)
+void job_user_pause_locked(Job *job, Error **errp)
 {
-    if (job_apply_verb(job, JOB_VERB_PAUSE, errp)) {
+    if (job_apply_verb_locked(job, JOB_VERB_PAUSE, errp)) {
         return;
     }
     if (job->user_paused) {
@@ -679,15 +679,15 @@ void job_user_pause(Job *job, Error **errp)
         return;
     }
     job->user_paused = true;
-    job_pause(job);
+    job_pause_locked(job);
 }
 
-bool job_user_paused(Job *job)
+bool job_user_paused_locked(Job *job)
 {
     return job->user_paused;
 }
 
-void job_user_resume(Job *job, Error **errp)
+void job_user_resume_locked(Job *job, Error **errp)
 {
     assert(job);
     GLOBAL_STATE_CODE();
@@ -695,7 +695,7 @@ void job_user_resume(Job *job, Error **errp)
         error_setg(errp, "Can't resume a job that was not paused");
         return;
     }
-    if (job_apply_verb(job, JOB_VERB_RESUME, errp)) {
+    if (job_apply_verb_locked(job, JOB_VERB_RESUME, errp)) {
         return;
     }
     if (job->driver->user_resume) {
@@ -704,7 +704,7 @@ void job_user_resume(Job *job, Error **errp)
         job_lock();
     }
     job->user_paused = false;
-    job_resume(job);
+    job_resume_locked(job);
 }
 
 /* Called with job_mutex held. */
@@ -718,15 +718,15 @@ static void job_do_dismiss_locked(Job *job)
     job_txn_del_job_locked(job);
 
     job_state_transition_locked(job, JOB_STATUS_NULL);
-    job_unref(job);
+    job_unref_locked(job);
 }
 
-void job_dismiss(Job **jobptr, Error **errp)
+void job_dismiss_locked(Job **jobptr, Error **errp)
 {
     Job *job = *jobptr;
     /* similarly to _complete, this is QMP-interface only. */
     assert(job->id);
-    if (job_apply_verb(job, JOB_VERB_DISMISS, errp)) {
+    if (job_apply_verb_locked(job, JOB_VERB_DISMISS, errp)) {
         return;
     }
 
@@ -898,7 +898,7 @@ static void job_completed_txn_abort_locked(Job *job)
      * involve calls of AIO_WAIT_WHILE(), which could deadlock otherwise.
      * Note that the job's AioContext may change when it is finalized.
      */
-    job_ref(job);
+    job_ref_locked(job);
     aio_context_release(job->aio_context);
 
     /* Other jobs are effectively cancelled by us, set the status for
@@ -927,20 +927,20 @@ static void job_completed_txn_abort_locked(Job *job)
         aio_context_acquire(ctx);
         if (!job_is_completed_locked(other_job)) {
             assert(job_cancel_requested_locked(other_job));
-            job_finish_sync(other_job, NULL, NULL);
+            job_finish_sync_locked(other_job, NULL, NULL);
         }
         job_finalize_single_locked(other_job);
         aio_context_release(ctx);
     }
 
     /*
-     * Use job_ref()/job_unref() so we can read the AioContext here
-     * even if the job went away during job_finalize_single_locked().
+     * Use job_ref_locked()/job_unref_locked() so we can read the AioContext
+     * here even if the job went away during job_finalize_single_locked().
      */
     aio_context_acquire(job->aio_context);
-    job_unref(job);
+    job_unref_locked(job);
 
-    job_txn_unref(txn);
+    job_txn_unref_locked(txn);
 }
 
 /* Called with job_mutex held, but releases it temporarily. */
@@ -980,10 +980,10 @@ static void job_do_finalize_locked(Job *job)
     }
 }
 
-void job_finalize(Job *job, Error **errp)
+void job_finalize_locked(Job *job, Error **errp)
 {
     assert(job && job->id);
-    if (job_apply_verb(job, JOB_VERB_FINALIZE, errp)) {
+    if (job_apply_verb_locked(job, JOB_VERB_FINALIZE, errp)) {
         return;
     }
     job_do_finalize_locked(job);
@@ -1057,7 +1057,7 @@ static void job_exit(void *opaque)
     AioContext *ctx;
     JOB_LOCK_GUARD();
 
-    job_ref(job);
+    job_ref_locked(job);
     aio_context_acquire(job->aio_context);
 
     /* This is a lie, we're not quiescent, but still doing the completion
@@ -1077,7 +1077,7 @@ static void job_exit(void *opaque)
      * freeing the job underneath us.
      */
     ctx = job->aio_context;
-    job_unref(job);
+    job_unref_locked(job);
     aio_context_release(ctx);
 }
 
@@ -1118,7 +1118,7 @@ void job_start(Job *job)
     aio_co_enter(job->aio_context, job->co);
 }
 
-void job_cancel(Job *job, bool force)
+void job_cancel_locked(Job *job, bool force)
 {
     if (job->status == JOB_STATUS_CONCLUDED) {
         job_do_dismiss_locked(job);
@@ -1142,28 +1142,28 @@ void job_cancel(Job *job, bool force)
             job_completed_txn_abort_locked(job);
         }
     } else {
-        job_enter_cond(job, NULL);
+        job_enter_cond_locked(job, NULL);
     }
 }
 
-void job_user_cancel(Job *job, bool force, Error **errp)
+void job_user_cancel_locked(Job *job, bool force, Error **errp)
 {
-    if (job_apply_verb(job, JOB_VERB_CANCEL, errp)) {
+    if (job_apply_verb_locked(job, JOB_VERB_CANCEL, errp)) {
         return;
     }
-    job_cancel(job, force);
+    job_cancel_locked(job, force);
 }
 
 /*
- * A wrapper around job_cancel() taking an Error ** parameter so it may be
- * used with job_finish_sync() without the need for (rather nasty) function
- * pointer casts there.
+ * A wrapper around job_cancel_locked() taking an Error ** parameter
+ * so it may be used with job_finish_sync_locked() without the need
+ * for (rather nasty) function pointer casts there.
  *
  * Called with job_mutex held.
  */
 static void job_cancel_err_locked(Job *job, Error **errp)
 {
-    job_cancel(job, false);
+    job_cancel_locked(job, false);
 }
 
 /**
@@ -1172,15 +1172,15 @@ static void job_cancel_err_locked(Job *job, Error **errp)
  */
 static void job_force_cancel_err_locked(Job *job, Error **errp)
 {
-    job_cancel(job, true);
+    job_cancel_locked(job, true);
 }
 
-int job_cancel_sync(Job *job, bool force)
+int job_cancel_sync_locked(Job *job, bool force)
 {
     if (force) {
-        return job_finish_sync(job, &job_force_cancel_err_locked, NULL);
+        return job_finish_sync_locked(job, &job_force_cancel_err_locked, NULL);
     } else {
-        return job_finish_sync(job, &job_cancel_err_locked, NULL);
+        return job_finish_sync_locked(job, &job_cancel_err_locked, NULL);
     }
 }
 
@@ -1190,25 +1190,25 @@ void job_cancel_sync_all(void)
     AioContext *aio_context;
 
     JOB_LOCK_GUARD();
-    while ((job = job_next(NULL))) {
+    while ((job = job_next_locked(NULL))) {
         aio_context = job->aio_context;
         aio_context_acquire(aio_context);
-        job_cancel_sync(job, true);
+        job_cancel_sync_locked(job, true);
         aio_context_release(aio_context);
     }
 }
 
-int job_complete_sync(Job *job, Error **errp)
+int job_complete_sync_locked(Job *job, Error **errp)
 {
-    return job_finish_sync(job, job_complete, errp);
+    return job_finish_sync_locked(job, job_complete_locked, errp);
 }
 
-void job_complete(Job *job, Error **errp)
+void job_complete_locked(Job *job, Error **errp)
 {
     /* Should not be reachable via external interface for internal jobs */
     assert(job->id);
     GLOBAL_STATE_CODE();
-    if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) {
+    if (job_apply_verb_locked(job, JOB_VERB_COMPLETE, errp)) {
         return;
     }
     if (job_cancel_requested_locked(job) || !job->driver->complete) {
@@ -1222,19 +1222,20 @@ void job_complete(Job *job, Error **errp)
     job_lock();
 }
 
-int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
+int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),
+                           Error **errp)
 {
     Error *local_err = NULL;
     int ret;
 
-    job_ref(job);
+    job_ref_locked(job);
 
     if (finish) {
         finish(job, &local_err);
     }
     if (local_err) {
         error_propagate(errp, local_err);
-        job_unref(job);
+        job_unref_locked(job);
         return -EBUSY;
     }
 
@@ -1245,6 +1246,6 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
 
     ret = (job_is_cancelled_locked(job) && job->ret == 0)
           ? -ECANCELED : job->ret;
-    job_unref(job);
+    job_unref_locked(job);
     return ret;
 }
diff --git a/qemu-img.c b/qemu-img.c
index d1f5eda687..f0b7f71e78 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -913,7 +913,7 @@ static void run_block_job(BlockJob *job, Error **errp)
 
     aio_context_acquire(aio_context);
     WITH_JOB_LOCK_GUARD() {
-        job_ref(&job->job);
+        job_ref_locked(&job->job);
         do {
             float progress = 0.0f;
             job_unlock();
@@ -930,11 +930,11 @@ static void run_block_job(BlockJob *job, Error **errp)
                  !job_is_completed_locked(&job->job));
 
         if (!job_is_completed_locked(&job->job)) {
-            ret = job_complete_sync(&job->job, errp);
+            ret = job_complete_sync_locked(&job->job, errp);
         } else {
             ret = job->job.ret;
         }
-        job_unref(&job->job);
+        job_unref_locked(&job->job);
     }
     aio_context_release(aio_context);
 
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 181458eecb..0db056ea63 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -1018,7 +1018,7 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
 
     aio_context_acquire(ctx);
     WITH_JOB_LOCK_GUARD() {
-        ret = job_complete_sync(&job->job, &error_abort);
+        ret = job_complete_sync_locked(&job->job, &error_abort);
     }
     g_assert_cmpint(ret, ==, (result == TEST_JOB_SUCCESS ? 0 : -EIO));
 
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index 9866262f79..89e7f0fffb 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -457,7 +457,7 @@ static void test_attach_blockjob(void)
 
     aio_context_acquire(ctx);
     WITH_JOB_LOCK_GUARD() {
-        job_complete_sync(&tjob->common.job, &error_abort);
+        job_complete_sync_locked(&tjob->common.job, &error_abort);
     }
     blk_set_aio_context(blk, qemu_get_aio_context(), &error_abort);
     aio_context_release(ctx);
@@ -633,7 +633,7 @@ static void test_propagate_mirror(void)
                  false, "filter_node", MIRROR_COPY_MODE_BACKGROUND,
                  &error_abort);
     WITH_JOB_LOCK_GUARD() {
-        job = job_get("job0");
+        job = job_get_locked("job0");
     }
     filter = bdrv_find_node("filter_node");
 
diff --git a/tests/unit/test-blockjob-txn.c b/tests/unit/test-blockjob-txn.c
index 0355e54001..8dc1eaefc8 100644
--- a/tests/unit/test-blockjob-txn.c
+++ b/tests/unit/test-blockjob-txn.c
@@ -118,7 +118,7 @@ static void test_single_job(int expected)
 
     WITH_JOB_LOCK_GUARD() {
         if (expected == -ECANCELED) {
-            job_cancel(&job->job, false);
+            job_cancel_locked(&job->job, false);
         }
     }
 
@@ -128,7 +128,7 @@ static void test_single_job(int expected)
     g_assert_cmpint(result, ==, expected);
 
     WITH_JOB_LOCK_GUARD() {
-        job_txn_unref(txn);
+        job_txn_unref_locked(txn);
     }
 }
 
@@ -165,13 +165,13 @@ static void test_pair_jobs(int expected1, int expected2)
      * use-after-free bugs as possible.
      */
     WITH_JOB_LOCK_GUARD() {
-        job_txn_unref(txn);
+        job_txn_unref_locked(txn);
 
         if (expected1 == -ECANCELED) {
-            job_cancel(&job1->job, false);
+            job_cancel_locked(&job1->job, false);
         }
         if (expected2 == -ECANCELED) {
-            job_cancel(&job2->job, false);
+            job_cancel_locked(&job2->job, false);
         }
     }
 
@@ -226,7 +226,7 @@ static void test_pair_jobs_fail_cancel_race(void)
     job_start(&job2->job);
 
     WITH_JOB_LOCK_GUARD() {
-        job_cancel(&job1->job, false);
+        job_cancel_locked(&job1->job, false);
     }
 
     /* Now make job2 finish before the main loop kicks jobs.  This simulates
@@ -243,7 +243,7 @@ static void test_pair_jobs_fail_cancel_race(void)
     g_assert_cmpint(result2, ==, -ECANCELED);
 
     WITH_JOB_LOCK_GUARD() {
-        job_txn_unref(txn);
+        job_txn_unref_locked(txn);
     }
 }
 
diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c
index ab7958dad5..8280b1e0c9 100644
--- a/tests/unit/test-blockjob.c
+++ b/tests/unit/test-blockjob.c
@@ -212,7 +212,7 @@ static CancelJob *create_common(Job **pjob)
                   JOB_MANUAL_FINALIZE | JOB_MANUAL_DISMISS);
     job = &bjob->job;
     WITH_JOB_LOCK_GUARD() {
-        job_ref(job);
+        job_ref_locked(job);
         assert(job->status == JOB_STATUS_CREATED);
     }
 
@@ -234,13 +234,13 @@ static void cancel_common(CancelJob *s)
     aio_context_acquire(ctx);
 
     WITH_JOB_LOCK_GUARD() {
-        job_cancel_sync(&job->job, true);
+        job_cancel_sync_locked(&job->job, true);
         if (sts != JOB_STATUS_CREATED && sts != JOB_STATUS_CONCLUDED) {
             Job *dummy = &job->job;
-            job_dismiss(&dummy, &error_abort);
+            job_dismiss_locked(&dummy, &error_abort);
         }
         assert(job->job.status == JOB_STATUS_NULL);
-        job_unref(&job->job);
+        job_unref_locked(&job->job);
     }
     destroy_blk(blk);
 
@@ -288,7 +288,7 @@ static void test_cancel_paused(void)
     assert(job->status == JOB_STATUS_RUNNING);
 
     WITH_JOB_LOCK_GUARD() {
-        job_user_pause(job, &error_abort);
+        job_user_pause_locked(job, &error_abort);
     }
     job_enter(job);
     assert(job->status == JOB_STATUS_PAUSED);
@@ -336,7 +336,7 @@ static void test_cancel_standby(void)
     assert(job->status == JOB_STATUS_READY);
 
     WITH_JOB_LOCK_GUARD() {
-        job_user_pause(job, &error_abort);
+        job_user_pause_locked(job, &error_abort);
     }
     job_enter(job);
     assert(job->status == JOB_STATUS_STANDBY);
@@ -363,7 +363,7 @@ static void test_cancel_pending(void)
     assert(job->status == JOB_STATUS_READY);
 
     WITH_JOB_LOCK_GUARD() {
-        job_complete(job, &error_abort);
+        job_complete_locked(job, &error_abort);
     }
     job_enter(job);
     while (!job->deferred_to_main_loop) {
@@ -395,7 +395,7 @@ static void test_cancel_concluded(void)
     assert(job->status == JOB_STATUS_READY);
 
     WITH_JOB_LOCK_GUARD() {
-        job_complete(job, &error_abort);
+        job_complete_locked(job, &error_abort);
     }
     job_enter(job);
     while (!job->deferred_to_main_loop) {
@@ -407,7 +407,7 @@ static void test_cancel_concluded(void)
 
     aio_context_acquire(job->aio_context);
     WITH_JOB_LOCK_GUARD() {
-        job_finalize(job, &error_abort);
+        job_finalize_locked(job, &error_abort);
     }
     aio_context_release(job->aio_context);
     assert(job->status == JOB_STATUS_CONCLUDED);
@@ -456,7 +456,7 @@ static const BlockJobDriver test_yielding_driver = {
 };
 
 /*
- * Test that job_complete() works even on jobs that are in a paused
+ * Test that job_complete_locked() works even on jobs that are in a paused
  * state (i.e., STANDBY).
  *
  * To do this, run YieldingJob in an IO thread, get it into the READY
@@ -464,7 +464,7 @@ static const BlockJobDriver test_yielding_driver = {
  * acquire the context so the job will not be entered and will thus
  * remain on STANDBY.
  *
- * job_complete() should still work without error.
+ * job_complete_locked() should still work without error.
  *
  * Note that on the QMP interface, it is impossible to lock an IO
  * thread before a drained section ends.  In practice, the
@@ -526,16 +526,16 @@ static void test_complete_in_standby(void)
         assert(job->status == JOB_STATUS_STANDBY);
 
         /* Even though the job is on standby, this should work */
-        job_complete(job, &error_abort);
+        job_complete_locked(job, &error_abort);
 
         /* The test is done now, clean up. */
-        job_finish_sync(job, NULL, &error_abort);
+        job_finish_sync_locked(job, NULL, &error_abort);
         assert(job->status == JOB_STATUS_PENDING);
 
-        job_finalize(job, &error_abort);
+        job_finalize_locked(job, &error_abort);
         assert(job->status == JOB_STATUS_CONCLUDED);
 
-        job_dismiss(&job, &error_abort);
+        job_dismiss_locked(&job, &error_abort);
     }
 
     destroy_blk(blk);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 12/18] block_job: rename block_job functions called with job_mutex held
  2022-06-16 13:18 [PATCH v7 00/18] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (10 preceding siblings ...)
  2022-06-16 13:18 ` [PATCH v7 11/18] job.h: rename job API " Emanuele Giuseppe Esposito
@ 2022-06-16 13:18 ` Emanuele Giuseppe Esposito
  2022-06-16 13:18 ` [PATCH v7 13/18] job.h: define unlocked functions Emanuele Giuseppe Esposito
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-16 13:18 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel,
	Emanuele Giuseppe Esposito

Just as for the job API, rename block_job functions that are
always called under job lock.

No functional change intended.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block.c                  |  3 ++-
 block/backup.c           |  4 ++--
 blockdev.c               | 12 +++++++-----
 blockjob.c               | 39 ++++++++++++++++++++++-----------------
 include/block/blockjob.h | 29 ++++++++++++++++++-----------
 monitor/qmp-cmds.c       |  5 +++--
 qemu-img.c               |  2 +-
 7 files changed, 55 insertions(+), 39 deletions(-)

diff --git a/block.c b/block.c
index 36ee0090c6..d1ea17551d 100644
--- a/block.c
+++ b/block.c
@@ -6168,7 +6168,8 @@ XDbgBlockGraph *bdrv_get_xdbg_block_graph(Error **errp)
     }
 
     WITH_JOB_LOCK_GUARD() {
-        for (job = block_job_next(NULL); job; job = block_job_next(job)) {
+        for (job = block_job_next_locked(NULL); job;
+             job = block_job_next_locked(job)) {
             GSList *el;
 
             xdbg_graph_add_node(gr, job, X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_JOB,
diff --git a/block/backup.c b/block/backup.c
index b2b649e305..4db9376657 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -314,8 +314,8 @@ static void coroutine_fn backup_set_speed(BlockJob *job, int64_t speed)
     BackupBlockJob *s = container_of(job, BackupBlockJob, common);
 
     /*
-     * block_job_set_speed() is called first from block_job_create(), when we
-     * don't yet have s->bcs.
+     * block_job_set_speed_locked() is called first from block_job_create(),
+     * when we don't yet have s->bcs.
      */
     if (s->bcs) {
         block_copy_set_speed(s->bcs, speed);
diff --git a/blockdev.c b/blockdev.c
index deb33b8f1e..6a80822f4d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -152,7 +152,8 @@ void blockdev_mark_auto_del(BlockBackend *blk)
 
     JOB_LOCK_GUARD();
 
-    for (job = block_job_next(NULL); job; job = block_job_next(job)) {
+    for (job = block_job_next_locked(NULL); job;
+         job = block_job_next_locked(job)) {
         if (block_job_has_bdrv(job, blk_bs(blk))) {
             AioContext *aio_context = job->job.aio_context;
             aio_context_acquire(aio_context);
@@ -3325,7 +3326,7 @@ static BlockJob *find_block_job_locked(const char *id,
 
     *aio_context = NULL;
 
-    job = block_job_get(id);
+    job = block_job_get_locked(id);
     if (!job) {
         error_set(errp, ERROR_CLASS_DEVICE_NOT_ACTIVE,
                   "Block job '%s' not found", id);
@@ -3350,7 +3351,7 @@ void qmp_block_job_set_speed(const char *device, int64_t speed, Error **errp)
         return;
     }
 
-    block_job_set_speed(job, speed, errp);
+    block_job_set_speed_locked(job, speed, errp);
     aio_context_release(aio_context);
 }
 
@@ -3755,7 +3756,8 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 
     JOB_LOCK_GUARD();
 
-    for (job = block_job_next(NULL); job; job = block_job_next(job)) {
+    for (job = block_job_next_locked(NULL); job;
+         job = block_job_next_locked(job)) {
         BlockJobInfo *value;
         AioContext *aio_context;
 
@@ -3764,7 +3766,7 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
         }
         aio_context = block_job_get_aio_context(job);
         aio_context_acquire(aio_context);
-        value = block_job_query(job, errp);
+        value = block_job_query_locked(job, errp);
         aio_context_release(aio_context);
         if (!value) {
             qapi_free_BlockJobInfoList(head);
diff --git a/blockjob.c b/blockjob.c
index 02a98630c9..0745f4e745 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -59,7 +59,7 @@ static bool is_block_job(Job *job)
            job_type(job) == JOB_TYPE_STREAM;
 }
 
-BlockJob *block_job_next(BlockJob *bjob)
+BlockJob *block_job_next_locked(BlockJob *bjob)
 {
     Job *job = bjob ? &bjob->job : NULL;
     GLOBAL_STATE_CODE();
@@ -71,7 +71,7 @@ BlockJob *block_job_next(BlockJob *bjob)
     return job ? container_of(job, BlockJob, job) : NULL;
 }
 
-BlockJob *block_job_get(const char *id)
+BlockJob *block_job_get_locked(const char *id)
 {
     Job *job = job_get_locked(id);
     GLOBAL_STATE_CODE();
@@ -256,7 +256,8 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
     return 0;
 }
 
-static void block_job_on_idle(Notifier *n, void *opaque)
+/* Called with job_mutex lock held. */
+static void block_job_on_idle_locked(Notifier *n, void *opaque)
 {
     aio_wait_kick();
 }
@@ -277,7 +278,7 @@ static bool job_timer_pending(Job *job)
     return timer_pending(&job->sleep_timer);
 }
 
-bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
+bool block_job_set_speed_locked(BlockJob *job, int64_t speed, Error **errp)
 {
     const BlockJobDriver *drv = block_job_driver(job);
     int64_t old_speed = job->speed;
@@ -319,7 +320,7 @@ int64_t block_job_ratelimit_get_delay(BlockJob *job, uint64_t n)
     return ratelimit_calculate_delay(&job->limit, n);
 }
 
-BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
+BlockJobInfo *block_job_query_locked(BlockJob *job, Error **errp)
 {
     BlockJobInfo *info;
     uint64_t progress_current, progress_total;
@@ -364,7 +365,8 @@ static void block_job_iostatus_set_err(BlockJob *job, int error)
     }
 }
 
-static void block_job_event_cancelled(Notifier *n, void *opaque)
+/* Called with job_mutex lock held. */
+static void block_job_event_cancelled_locked(Notifier *n, void *opaque)
 {
     BlockJob *job = opaque;
     uint64_t progress_current, progress_total;
@@ -383,7 +385,8 @@ static void block_job_event_cancelled(Notifier *n, void *opaque)
                                         job->speed);
 }
 
-static void block_job_event_completed(Notifier *n, void *opaque)
+/* Called with job_mutex lock held. */
+static void block_job_event_completed_locked(Notifier *n, void *opaque)
 {
     BlockJob *job = opaque;
     const char *msg = NULL;
@@ -409,7 +412,8 @@ static void block_job_event_completed(Notifier *n, void *opaque)
                                         msg);
 }
 
-static void block_job_event_pending(Notifier *n, void *opaque)
+/* Called with job_mutex lock held. */
+static void block_job_event_pending_locked(Notifier *n, void *opaque)
 {
     BlockJob *job = opaque;
 
@@ -421,7 +425,8 @@ static void block_job_event_pending(Notifier *n, void *opaque)
                                       job->job.id);
 }
 
-static void block_job_event_ready(Notifier *n, void *opaque)
+/* Called with job_mutex lock held. */
+static void block_job_event_ready_locked(Notifier *n, void *opaque)
 {
     BlockJob *job = opaque;
     uint64_t progress_current, progress_total;
@@ -471,11 +476,11 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
 
     ratelimit_init(&job->limit);
 
-    job->finalize_cancelled_notifier.notify = block_job_event_cancelled;
-    job->finalize_completed_notifier.notify = block_job_event_completed;
-    job->pending_notifier.notify = block_job_event_pending;
-    job->ready_notifier.notify = block_job_event_ready;
-    job->idle_notifier.notify = block_job_on_idle;
+    job->finalize_cancelled_notifier.notify = block_job_event_cancelled_locked;
+    job->finalize_completed_notifier.notify = block_job_event_completed_locked;
+    job->pending_notifier.notify = block_job_event_pending_locked;
+    job->ready_notifier.notify = block_job_event_ready_locked;
+    job->idle_notifier.notify = block_job_on_idle_locked;
 
     WITH_JOB_LOCK_GUARD() {
         notifier_list_add(&job->job.on_finalize_cancelled,
@@ -498,7 +503,7 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
 
     WITH_JOB_LOCK_GUARD() {
-        ret = block_job_set_speed(job, speed, errp);
+        ret = block_job_set_speed_locked(job, speed, errp);
     }
     if (!ret) {
         goto fail;
@@ -511,7 +516,7 @@ fail:
     return NULL;
 }
 
-void block_job_iostatus_reset(BlockJob *job)
+void block_job_iostatus_reset_locked(BlockJob *job)
 {
     GLOBAL_STATE_CODE();
     if (job->iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
@@ -526,7 +531,7 @@ void block_job_user_resume(Job *job)
     BlockJob *bjob = container_of(job, BlockJob, job);
     GLOBAL_STATE_CODE();
     WITH_JOB_LOCK_GUARD() {
-        block_job_iostatus_reset(bjob);
+        block_job_iostatus_reset_locked(bjob);
     }
 }
 
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 6525e16fd5..76c9a0d822 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -46,7 +46,7 @@ typedef struct BlockJob {
     /** Status that is published by the query-block-jobs QMP API */
     BlockDeviceIoStatus iostatus;
 
-    /** Speed that was set with @block_job_set_speed.  */
+    /** Speed that was set with @block_job_set_speed_locked.  */
     int64_t speed;
 
     /** Rate limiting data structure for implementing @speed. */
@@ -82,25 +82,27 @@ typedef struct BlockJob {
  */
 
 /**
- * block_job_next:
+ * block_job_next_locked:
  * @job: A block job, or %NULL.
  *
  * Get the next element from the list of block jobs after @job, or the
  * first one if @job is %NULL.
  *
  * Returns the requested job, or %NULL if there are no more jobs left.
+ * Called with job_mutex lock held.
  */
-BlockJob *block_job_next(BlockJob *job);
+BlockJob *block_job_next_locked(BlockJob *job);
 
 /**
- * block_job_get:
+ * block_job_get_locked:
  * @id: The id of the block job.
  *
  * Get the block job identified by @id (which must not be %NULL).
  *
  * Returns the requested job, or %NULL if it doesn't exist.
+ * Called with job_mutex lock held.
  */
-BlockJob *block_job_get(const char *id);
+BlockJob *block_job_get_locked(const char *id);
 
 /**
  * block_job_add_bdrv:
@@ -135,32 +137,37 @@ void block_job_remove_all_bdrv(BlockJob *job);
 bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs);
 
 /**
- * block_job_set_speed:
+ * block_job_set_speed_locked:
  * @job: The job to set the speed for.
  * @speed: The new value
  * @errp: Error object.
  *
  * Set a rate-limiting parameter for the job; the actual meaning may
  * vary depending on the job type.
+ *
+ * Called with job_mutex lock held. May temporarily release the lock.
  */
-bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp);
+bool block_job_set_speed_locked(BlockJob *job, int64_t speed, Error **errp);
 
 /**
- * block_job_query:
+ * block_job_query_locked:
  * @job: The job to get information about.
  *
  * Return information about a job.
+ * Called with job_mutex lock held.
  */
-BlockJobInfo *block_job_query(BlockJob *job, Error **errp);
+BlockJobInfo *block_job_query_locked(BlockJob *job, Error **errp);
 
 /**
- * block_job_iostatus_reset:
+ * block_job_iostatus_reset_locked:
  * @job: The job whose I/O status should be reset.
  *
  * Reset I/O status on @job and on BlockDriverState objects it uses,
  * other than job->blk.
+ *
+ * Called with job_mutex lock held.
  */
-void block_job_iostatus_reset(BlockJob *job);
+void block_job_iostatus_reset_locked(BlockJob *job);
 
 /*
  * block_job_get_aio_context:
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index 39d9d06a81..1897ed7a13 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -134,8 +134,9 @@ void qmp_cont(Error **errp)
     }
 
     WITH_JOB_LOCK_GUARD() {
-        for (job = block_job_next(NULL); job; job = block_job_next(job)) {
-            block_job_iostatus_reset(job);
+        for (job = block_job_next_locked(NULL); job;
+             job = block_job_next_locked(job)) {
+            block_job_iostatus_reset_locked(job);
         }
     }
 
diff --git a/qemu-img.c b/qemu-img.c
index f0b7f71e78..289d88a156 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1089,7 +1089,7 @@ static int img_commit(int argc, char **argv)
     }
 
     WITH_JOB_LOCK_GUARD() {
-        job = block_job_get("commit");
+        job = block_job_get_locked("commit");
     }
     assert(job);
     run_block_job(job, &local_err);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 13/18] job.h: define unlocked functions
  2022-06-16 13:18 [PATCH v7 00/18] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (11 preceding siblings ...)
  2022-06-16 13:18 ` [PATCH v7 12/18] block_job: rename block_job " Emanuele Giuseppe Esposito
@ 2022-06-16 13:18 ` Emanuele Giuseppe Esposito
  2022-06-16 13:18 ` [PATCH v7 14/18] commit and mirror: create new nodes using bdrv_get_aio_context, and not the job aiocontext Emanuele Giuseppe Esposito
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-16 13:18 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel,
	Emanuele Giuseppe Esposito

All these functions assume that the lock is not held, and acquire
it internally.

These functions will be useful when job_lock is globally applied,
as they will allow callers to access the job struct fields
without worrying about the job lock.

Also update the comments in blockjob.c (and move them to job.c).

Note: at this stage, job_{lock/unlock} and the job lock guard macros
are *nops*.

No functional change intended.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 blockjob.c         | 20 --------------------
 include/qemu/job.h | 36 +++++++++++++++++++++++++++++++++---
 job.c              | 16 ++++++++++++++++
 3 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 0745f4e745..2c075db45b 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -36,21 +36,6 @@
 #include "qemu/main-loop.h"
 #include "qemu/timer.h"
 
-/*
- * The block job API is composed of two categories of functions.
- *
- * The first includes functions used by the monitor.  The monitor is
- * peculiar in that it accesses the block job list with block_job_get, and
- * therefore needs consistency across block_job_get and the actual operation
- * (e.g. block_job_set_speed).  The consistency is achieved with
- * aio_context_acquire/release.  These functions are declared in blockjob.h.
- *
- * The second includes functions used by the block job drivers and sometimes
- * by the core block layer.  These do not care about locking, because the
- * whole coroutine runs under the AioContext lock, and are declared in
- * blockjob_int.h.
- */
-
 static bool is_block_job(Job *job)
 {
     return job_type(job) == JOB_TYPE_BACKUP ||
@@ -446,11 +431,6 @@ static void block_job_event_ready_locked(Notifier *n, void *opaque)
 }
 
 
-/*
- * API for block job drivers and the block layer.  These functions are
- * declared in blockjob_int.h.
- */
-
 void *block_job_create(const char *job_id, const BlockJobDriver *driver,
                        JobTxn *txn, BlockDriverState *bs, uint64_t perm,
                        uint64_t shared_perm, int64_t speed, int flags,
diff --git a/include/qemu/job.h b/include/qemu/job.h
index 246af068a1..2c9011329a 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -362,6 +362,7 @@ void job_txn_unref_locked(JobTxn *txn);
 
 /**
  * Create a new long-running job and return it.
+ * Called with job_mutex *not* held.
  *
  * @job_id: The id of the newly-created job, or %NULL for internal jobs
  * @driver: The class object for the newly-created job.
@@ -397,6 +398,8 @@ void job_unref_locked(Job *job);
  * @done: How much progress the job made since the last call
  *
  * Updates the progress counter of the job.
+ *
+ * Progress API is thread safe.
  */
 void job_progress_update(Job *job, uint64_t done);
 
@@ -407,6 +410,8 @@ void job_progress_update(Job *job, uint64_t done);
  *
  * Sets the expected end value of the progress counter of a job so that a
  * completion percentage can be calculated when the progress is updated.
+ *
+ * Progress API is thread safe.
  */
 void job_progress_set_remaining(Job *job, uint64_t remaining);
 
@@ -422,6 +427,8 @@ void job_progress_set_remaining(Job *job, uint64_t remaining);
  * length before, and job_progress_update() afterwards.
  * (So the operation acts as a parenthesis in regards to the main job
  * operation running in background.)
+ *
+ * Progress API is thread safe.
  */
 void job_progress_increase_remaining(Job *job, uint64_t delta);
 
@@ -439,6 +446,8 @@ void job_enter_cond_locked(Job *job, bool(*fn)(Job *job));
  *
  * Begins execution of a job.
  * Takes ownership of one reference to the job object.
+ *
+ * Called with job_mutex *not* held.
  */
 void job_start(Job *job);
 
@@ -446,6 +455,7 @@ void job_start(Job *job);
  * @job: The job to enter.
  *
  * Continue the specified job by entering the coroutine.
+ * Called with job_mutex lock *not* held.
  */
 void job_enter(Job *job);
 
@@ -454,6 +464,9 @@ void job_enter(Job *job);
  *
  * Pause now if job_pause_locked() has been called. Jobs that perform lots of
  * I/O must call this between requests so that the job can be paused.
+ *
+ * Called with job_mutex *not* held (we don't want the coroutine
+ * to yield with the lock held!).
  */
 void coroutine_fn job_pause_point(Job *job);
 
@@ -461,6 +474,8 @@ void coroutine_fn job_pause_point(Job *job);
  * @job: The job that calls the function.
  *
  * Yield the job coroutine.
+ * Called with job_mutex *not* held (we don't want the coroutine
+ * to yield with the lock held!).
  */
 void job_yield(Job *job);
 
@@ -471,6 +486,9 @@ void job_yield(Job *job);
  * Put the job to sleep (assuming that it wasn't canceled) for @ns
  * %QEMU_CLOCK_REALTIME nanoseconds.  Canceling the job will immediately
  * interrupt the wait.
+ *
+ * Called with job_mutex *not* held (we don't want the coroutine
+ * to yield with the lock held!).
  */
 void coroutine_fn job_sleep_ns(Job *job, int64_t ns);
 
@@ -582,10 +600,16 @@ Job *job_get_locked(const char *id);
  */
 int job_apply_verb_locked(Job *job, JobVerb verb, Error **errp);
 
-/** The @job could not be started, free it. */
+/**
+ * The @job could not be started, free it.
+ * Called with job_mutex *not* held.
+ */
 void job_early_fail(Job *job);
 
-/** Moves the @job from RUNNING to READY */
+/**
+ * Moves the @job from RUNNING to READY.
+ * Called with job_mutex *not* held.
+ */
 void job_transition_to_ready(Job *job);
 
 /**
@@ -624,7 +648,13 @@ void job_user_cancel_locked(Job *job, bool force, Error **errp);
  */
 int job_cancel_sync_locked(Job *job, bool force);
 
-/** Synchronously force-cancels all jobs using job_cancel_sync_locked(). */
+/**
+ * Synchronously force-cancels all jobs using job_cancel_sync_locked().
+ *
+ * Called with job_lock *not* held, unlike most other APIs consumed
+ * by the monitor! This is primarily to avoid adding unnecessary lock-unlock
+ * patterns in the caller.
+ */
 void job_cancel_sync_all(void);
 
 /**
diff --git a/job.c b/job.c
index 5c0cb37175..b6b9431b2d 100644
--- a/job.c
+++ b/job.c
@@ -32,12 +32,27 @@
 #include "trace/trace-root.h"
 #include "qapi/qapi-events-job.h"
 
+/*
+ * The job API is composed of two categories of functions.
+ *
+ * The first includes functions used by the monitor.  The monitor is
+ * peculiar in that it accesses the block job list with job_get, and
+ * therefore needs consistency across job_get and the actual operation
+ * (e.g. job_user_cancel). To achieve this consistency, the caller
+ * calls job_lock/job_unlock itself around the whole operation.
+ *
+ *
+ * The second includes functions used by the block job drivers and sometimes
+ * by the core block layer. These delegate the locking to the callee instead.
+ */
+
 /*
  * job_mutex protects the jobs list, but also makes the
  * struct job fields thread-safe.
  */
 QemuMutex job_mutex;
 
+/* Protected by job_mutex */
 static QLIST_HEAD(, Job) jobs = QLIST_HEAD_INITIALIZER(jobs);
 
 /* Job State Transition Table */
@@ -353,6 +368,7 @@ Job *job_get_locked(const char *id)
     return NULL;
 }
 
+/* Called with job_mutex *not* held. */
 static void job_sleep_timer_cb(void *opaque)
 {
     Job *job = opaque;
-- 
2.31.1




* [PATCH v7 14/18] commit and mirror: create new nodes using bdrv_get_aio_context, and not the job aiocontext
  2022-06-16 13:18 [PATCH v7 00/18] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (12 preceding siblings ...)
  2022-06-16 13:18 ` [PATCH v7 13/18] job.h: define unlocked functions Emanuele Giuseppe Esposito
@ 2022-06-16 13:18 ` Emanuele Giuseppe Esposito
  2022-06-16 13:18 ` [PATCH v7 15/18] job: detect change of aiocontext within job coroutine Emanuele Giuseppe Esposito
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-16 13:18 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel,
	Emanuele Giuseppe Esposito

We always use the given bs AioContext, so there is no need
to take the job's (which is identical anyway).
This also reduces the number of places we need to check when
protecting the job.aio_context field.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block/commit.c | 4 ++--
 block/mirror.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 851d1c557a..336f799172 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -370,7 +370,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
         goto fail;
     }
 
-    s->base = blk_new(s->common.job.aio_context,
+    s->base = blk_new(bdrv_get_aio_context(bs),
                       base_perms,
                       BLK_PERM_CONSISTENT_READ
                       | BLK_PERM_WRITE_UNCHANGED);
@@ -382,7 +382,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
     s->base_bs = base;
 
     /* Required permissions are already taken with block_job_add_bdrv() */
-    s->top = blk_new(s->common.job.aio_context, 0, BLK_PERM_ALL);
+    s->top = blk_new(bdrv_get_aio_context(bs), 0, BLK_PERM_ALL);
     ret = blk_insert_bs(s->top, top, errp);
     if (ret < 0) {
         goto fail;
diff --git a/block/mirror.c b/block/mirror.c
index b38676e19d..1977e25171 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1728,7 +1728,7 @@ static BlockJob *mirror_start_job(
         goto fail;
     }
 
-    s->target = blk_new(s->common.job.aio_context,
+    s->target = blk_new(bdrv_get_aio_context(bs),
                         target_perms, target_shared_perms);
     ret = blk_insert_bs(s->target, target, errp);
     if (ret < 0) {
-- 
2.31.1




* [PATCH v7 15/18] job: detect change of aiocontext within job coroutine
  2022-06-16 13:18 [PATCH v7 00/18] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (13 preceding siblings ...)
  2022-06-16 13:18 ` [PATCH v7 14/18] commit and mirror: create new nodes using bdrv_get_aio_context, and not the job aiocontext Emanuele Giuseppe Esposito
@ 2022-06-16 13:18 ` Emanuele Giuseppe Esposito
  2022-06-16 13:18 ` [PATCH v7 16/18] jobs: protect job.aio_context with BQL and job_mutex Emanuele Giuseppe Esposito
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-16 13:18 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel

From: Paolo Bonzini <pbonzini@redhat.com>

We want to make sure that accesses to job->aio_context are always
done under either the BQL or job_mutex. The problem is that using
aio_co_enter(job->aiocontext, job->co) in job_start and job_enter_cond
makes the coroutine resume immediately, so we can't hold the job lock.
And caching the context is not safe either, as it might change.

job_start is under BQL, so it can freely read job->aiocontext, but
job_enter_cond is not. In order to fix this, use aio_co_wake():
the advantage is that it won't use job->aiocontext, but the
main disadvantage is that it won't be able to detect a change of
job AioContext.

Calling bdrv_try_set_aio_context() will issue the following calls
(simplified):
* in terms of  bdrv callbacks:
  .drained_begin -> .set_aio_context -> .drained_end
* in terms of child_job functions:
  child_job_drained_begin -> child_job_set_aio_context -> child_job_drained_end
* in terms of job functions:
  job_pause_locked -> job_set_aio_context -> job_resume_locked

We can see that after setting the new aio_context, job_resume_locked
calls job_enter_cond again, which then invokes aio_co_wake(). But
while job->aiocontext has been set in job_set_aio_context,
job->co->ctx has not changed, so the coroutine would be entering
the wrong aiocontext.

Using aio_co_schedule in job_resume_locked() might seem like a valid
alternative, but the problem is that the bh resuming the coroutine
is not scheduled immediately, and if in the meanwhile another
bdrv_try_set_aio_context() is run (see test_propagate_mirror() in
test-block-iothread.c), we would have the first schedule in the
wrong aiocontext, and the second set of drains won't even manage
to schedule the coroutine, as job->busy would still be true from
the previous job_resume_locked().

The solution is to stick with aio_co_wake(), but then detect, every
time the coroutine resumes from yielding, whether job->aio_context
has changed. If so, we can reschedule it to the new context.

Check for the aiocontext change in job_do_yield_locked because:
1) aio_co_reschedule_self requires being called from the running coroutine
2) since child_job_set_aio_context allows changing the aiocontext only
   while the job is paused, this is the exact place where the coroutine
   resumes, before running JobDriver's code.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 job.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/job.c b/job.c
index b6b9431b2d..389c134a90 100644
--- a/job.c
+++ b/job.c
@@ -543,11 +543,12 @@ void job_enter_cond_locked(Job *job, bool(*fn)(Job *job))
         return;
     }
 
-    assert(!job->deferred_to_main_loop);
     timer_del(&job->sleep_timer);
     job->busy = true;
     real_job_unlock();
-    aio_co_enter(job->aio_context, job->co);
+    job_unlock();
+    aio_co_wake(job->co);
+    job_lock();
 }
 
 void job_enter(Job *job)
@@ -568,6 +569,8 @@ void job_enter(Job *job)
  */
 static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns)
 {
+    AioContext *next_aio_context;
+
     real_job_lock();
     if (ns != -1) {
         timer_mod(&job->sleep_timer, ns);
@@ -579,6 +582,19 @@ static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns)
     qemu_coroutine_yield();
     job_lock();
 
+    next_aio_context = job->aio_context;
+    /*
+     * Coroutine has resumed, but in the meanwhile the job AioContext
+     * might have changed via bdrv_try_set_aio_context(), so we need to move
+     * the coroutine too in the new aiocontext.
+     */
+    while (qemu_get_current_aio_context() != next_aio_context) {
+        job_unlock();
+        aio_co_reschedule_self(next_aio_context);
+        job_lock();
+        next_aio_context = job->aio_context;
+    }
+
     /* Set by job_enter_cond_locked() before re-entering the coroutine.  */
     assert(job->busy);
 }
@@ -1122,6 +1138,8 @@ static void coroutine_fn job_co_entry(void *opaque)
 
 void job_start(Job *job)
 {
+    assert(qemu_in_main_thread());
+
     WITH_JOB_LOCK_GUARD() {
         assert(job && !job_started(job) && job->paused &&
             job->driver && job->driver->run);
-- 
2.31.1




* [PATCH v7 16/18] jobs: protect job.aio_context with BQL and job_mutex
  2022-06-16 13:18 [PATCH v7 00/18] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (14 preceding siblings ...)
  2022-06-16 13:18 ` [PATCH v7 15/18] job: detect change of aiocontext within job coroutine Emanuele Giuseppe Esposito
@ 2022-06-16 13:18 ` Emanuele Giuseppe Esposito
  2022-06-16 13:18 ` [PATCH v7 17/18] job.c: enable job lock/unlock and remove Aiocontext locks Emanuele Giuseppe Esposito
  2022-06-16 13:18 ` [PATCH v7 18/18] block_job_query: remove atomic read Emanuele Giuseppe Esposito
  17 siblings, 0 replies; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-16 13:18 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel,
	Emanuele Giuseppe Esposito

In order to make it thread safe, implement a "fake rwlock",
where we allow reads under BQL *or* job_mutex held, but
writes only under BQL *and* job_mutex.

The only write we have is in child_job_set_aio_ctx, which always
happens under drain (so the job is paused).
For this reason, introduce job_set_aio_context and make sure that
the context is set under BQL, job_mutex and drain.
Also make sure all other places where the aiocontext is read
are protected.

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block/replication.c |  2 +-
 blockjob.c          |  3 ++-
 include/qemu/job.h  | 19 ++++++++++++++++++-
 job.c               | 12 ++++++++++++
 4 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index 50ea778937..68018948b9 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -148,8 +148,8 @@ static void replication_close(BlockDriverState *bs)
     }
     if (s->stage == BLOCK_REPLICATION_FAILOVER) {
         commit_job = &s->commit_job->job;
-        assert(commit_job->aio_context == qemu_get_current_aio_context());
         WITH_JOB_LOCK_GUARD() {
+            assert(commit_job->aio_context == qemu_get_current_aio_context());
             job_cancel_sync_locked(commit_job, false);
         }
     }
diff --git a/blockjob.c b/blockjob.c
index 2c075db45b..8b9e10813d 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -154,12 +154,13 @@ static void child_job_set_aio_ctx(BdrvChild *c, AioContext *ctx,
         bdrv_set_aio_context_ignore(sibling->bs, ctx, ignore);
     }
 
-    job->job.aio_context = ctx;
+    job_set_aio_context(&job->job, ctx);
 }
 
 static AioContext *child_job_get_parent_aio_context(BdrvChild *c)
 {
     BlockJob *job = c->opaque;
+    assert(qemu_in_main_thread());
 
     return job->job.aio_context;
 }
diff --git a/include/qemu/job.h b/include/qemu/job.h
index 2c9011329a..d0834906e9 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -77,7 +77,12 @@ typedef struct Job {
 
     /** Protected by AioContext lock */
 
-    /** AioContext to run the job coroutine in */
+    /**
+     * AioContext to run the job coroutine in.
+     * This field can be read when holding either the BQL (so we are in
+     * the main loop) or the job_mutex.
+     * It can be only written when we hold *both* BQL and job_mutex.
+     */
     AioContext *aio_context;
 
     /** Reference count of the block job */
@@ -707,4 +712,16 @@ void job_dismiss_locked(Job **job, Error **errp);
 int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),
                            Error **errp);
 
+/**
+ * Sets the @job->aio_context.
+ * Called with job_mutex *not* held.
+ *
+ * This function must run in the main thread to protect against
+ * concurrent read in job_finish_sync_locked(),
+ * takes the job_mutex lock to protect against the read in
+ * job_do_yield_locked(), and must be called when the coroutine
+ * is quiescent.
+ */
+void job_set_aio_context(Job *job, AioContext *ctx);
+
 #endif
diff --git a/job.c b/job.c
index 389c134a90..8af53b11c8 100644
--- a/job.c
+++ b/job.c
@@ -368,6 +368,17 @@ Job *job_get_locked(const char *id)
     return NULL;
 }
 
+void job_set_aio_context(Job *job, AioContext *ctx)
+{
+    /* protect against read in job_finish_sync_locked and job_start */
+    assert(qemu_in_main_thread());
+    /* protect against read in job_do_yield_locked */
+    JOB_LOCK_GUARD();
+    /* ensure the coroutine is quiescent while the AioContext is changed */
+    assert(job->pause_count > 0);
+    job->aio_context = ctx;
+}
+
 /* Called with job_mutex *not* held. */
 static void job_sleep_timer_cb(void *opaque)
 {
@@ -1261,6 +1272,7 @@ int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),
 {
     Error *local_err = NULL;
     int ret;
+    assert(qemu_in_main_thread());
 
     job_ref_locked(job);
 
-- 
2.31.1




* [PATCH v7 17/18] job.c: enable job lock/unlock and remove Aiocontext locks
  2022-06-16 13:18 [PATCH v7 00/18] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (15 preceding siblings ...)
  2022-06-16 13:18 ` [PATCH v7 16/18] jobs: protect job.aio_context with BQL and job_mutex Emanuele Giuseppe Esposito
@ 2022-06-16 13:18 ` Emanuele Giuseppe Esposito
  2022-06-16 13:18 ` [PATCH v7 18/18] block_job_query: remove atomic read Emanuele Giuseppe Esposito
  17 siblings, 0 replies; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-16 13:18 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel,
	Emanuele Giuseppe Esposito

Change job_{lock/unlock} and the job lock guard macros to use
job_mutex.

Now that they are no longer nops, remove the aiocontext
locks to avoid deadlocks.

Therefore:
- when possible, completely remove the aiocontext lock/unlock pair
- if it is used by some other function too, reduce the locking
  section as much as possible, leaving the job API outside.

There is only one JobDriver callback, ->free(), that assumes the
aiocontext lock is held (because it calls bdrv_unref), so for
now keep that one under the aiocontext lock.

Also remove real_job_{lock/unlock}, as they are replaced by the
public functions.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 blockdev.c                       | 72 ++++---------------------
 include/qemu/job.h               | 19 +++----
 job-qmp.c                        | 44 +++------------
 job.c                            | 92 ++++++--------------------------
 tests/unit/test-bdrv-drain.c     |  4 +-
 tests/unit/test-block-iothread.c |  2 +-
 tests/unit/test-blockjob.c       | 13 ++---
 7 files changed, 49 insertions(+), 197 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 6a80822f4d..727b778329 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -155,12 +155,7 @@ void blockdev_mark_auto_del(BlockBackend *blk)
     for (job = block_job_next_locked(NULL); job;
          job = block_job_next_locked(job)) {
         if (block_job_has_bdrv(job, blk_bs(blk))) {
-            AioContext *aio_context = job->job.aio_context;
-            aio_context_acquire(aio_context);
-
             job_cancel_locked(&job->job, false);
-
-            aio_context_release(aio_context);
         }
     }
 
@@ -1836,16 +1831,9 @@ static void drive_backup_abort(BlkActionState *common)
     DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
 
     if (state->job) {
-        AioContext *aio_context;
-
-        aio_context = bdrv_get_aio_context(state->bs);
-        aio_context_acquire(aio_context);
-
         WITH_JOB_LOCK_GUARD() {
             job_cancel_sync_locked(&state->job->job, true);
         }
-
-        aio_context_release(aio_context);
     }
 }
 
@@ -1939,16 +1927,9 @@ static void blockdev_backup_abort(BlkActionState *common)
     BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, common);
 
     if (state->job) {
-        AioContext *aio_context;
-
-        aio_context = bdrv_get_aio_context(state->bs);
-        aio_context_acquire(aio_context);
-
         WITH_JOB_LOCK_GUARD() {
             job_cancel_sync_locked(&state->job->job, true);
         }
-
-        aio_context_release(aio_context);
     }
 }
 
@@ -3313,19 +3294,14 @@ out:
 }
 
 /*
- * Get a block job using its ID and acquire its AioContext.
- * Called with job_mutex held.
+ * Get a block job using its ID. Called with job_mutex held.
  */
-static BlockJob *find_block_job_locked(const char *id,
-                                       AioContext **aio_context,
-                                       Error **errp)
+static BlockJob *find_block_job_locked(const char *id, Error **errp)
 {
     BlockJob *job;
 
     assert(id != NULL);
 
-    *aio_context = NULL;
-
     job = block_job_get_locked(id);
     if (!job) {
         error_set(errp, ERROR_CLASS_DEVICE_NOT_ACTIVE,
@@ -3333,36 +3309,30 @@ static BlockJob *find_block_job_locked(const char *id,
         return NULL;
     }
 
-    *aio_context = block_job_get_aio_context(job);
-    aio_context_acquire(*aio_context);
-
     return job;
 }
 
 void qmp_block_job_set_speed(const char *device, int64_t speed, Error **errp)
 {
-    AioContext *aio_context;
     BlockJob *job;
 
     JOB_LOCK_GUARD();
-    job = find_block_job_locked(device, &aio_context, errp);
+    job = find_block_job_locked(device, errp);
 
     if (!job) {
         return;
     }
 
     block_job_set_speed_locked(job, speed, errp);
-    aio_context_release(aio_context);
 }
 
 void qmp_block_job_cancel(const char *device,
                           bool has_force, bool force, Error **errp)
 {
-    AioContext *aio_context;
     BlockJob *job;
 
     JOB_LOCK_GUARD();
-    job = find_block_job_locked(device, &aio_context, errp);
+    job = find_block_job_locked(device, errp);
 
     if (!job) {
         return;
@@ -3375,22 +3345,19 @@ void qmp_block_job_cancel(const char *device,
     if (job_user_paused_locked(&job->job) && !force) {
         error_setg(errp, "The block job for device '%s' is currently paused",
                    device);
-        goto out;
+        return;
     }
 
     trace_qmp_block_job_cancel(job);
     job_user_cancel_locked(&job->job, force, errp);
-out:
-    aio_context_release(aio_context);
 }
 
 void qmp_block_job_pause(const char *device, Error **errp)
 {
-    AioContext *aio_context;
     BlockJob *job;
 
     JOB_LOCK_GUARD();
-    job = find_block_job_locked(device, &aio_context, errp);
+    job = find_block_job_locked(device, errp);
 
     if (!job) {
         return;
@@ -3398,16 +3365,14 @@ void qmp_block_job_pause(const char *device, Error **errp)
 
     trace_qmp_block_job_pause(job);
     job_user_pause_locked(&job->job, errp);
-    aio_context_release(aio_context);
 }
 
 void qmp_block_job_resume(const char *device, Error **errp)
 {
-    AioContext *aio_context;
     BlockJob *job;
 
     JOB_LOCK_GUARD();
-    job = find_block_job_locked(device, &aio_context, errp);
+    job = find_block_job_locked(device, errp);
 
     if (!job) {
         return;
@@ -3415,16 +3380,14 @@ void qmp_block_job_resume(const char *device, Error **errp)
 
     trace_qmp_block_job_resume(job);
     job_user_resume_locked(&job->job, errp);
-    aio_context_release(aio_context);
 }
 
 void qmp_block_job_complete(const char *device, Error **errp)
 {
-    AioContext *aio_context;
     BlockJob *job;
 
     JOB_LOCK_GUARD();
-    job = find_block_job_locked(device, &aio_context, errp);
+    job = find_block_job_locked(device, errp);
 
     if (!job) {
         return;
@@ -3432,16 +3395,14 @@ void qmp_block_job_complete(const char *device, Error **errp)
 
     trace_qmp_block_job_complete(job);
     job_complete_locked(&job->job, errp);
-    aio_context_release(aio_context);
 }
 
 void qmp_block_job_finalize(const char *id, Error **errp)
 {
-    AioContext *aio_context;
     BlockJob *job;
 
     JOB_LOCK_GUARD();
-    job = find_block_job_locked(id, &aio_context, errp);
+    job = find_block_job_locked(id, errp);
 
     if (!job) {
         return;
@@ -3451,24 +3412,16 @@ void qmp_block_job_finalize(const char *id, Error **errp)
     job_ref_locked(&job->job);
     job_finalize_locked(&job->job, errp);
 
-    /*
-     * Job's context might have changed via job_finalize_locked
-     * (and job_txn_apply automatically acquires the new one),
-     * so make sure we release the correct one.
-     */
-    aio_context = block_job_get_aio_context(job);
     job_unref_locked(&job->job);
-    aio_context_release(aio_context);
 }
 
 void qmp_block_job_dismiss(const char *id, Error **errp)
 {
-    AioContext *aio_context;
     BlockJob *bjob;
     Job *job;
 
     JOB_LOCK_GUARD();
-    bjob = find_block_job_locked(id, &aio_context, errp);
+    bjob = find_block_job_locked(id, errp);
 
     if (!bjob) {
         return;
@@ -3477,7 +3430,6 @@ void qmp_block_job_dismiss(const char *id, Error **errp)
     trace_qmp_block_job_dismiss(bjob);
     job = &bjob->job;
     job_dismiss_locked(&job, errp);
-    aio_context_release(aio_context);
 }
 
 void qmp_change_backing_file(const char *device,
@@ -3759,15 +3711,11 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
     for (job = block_job_next_locked(NULL); job;
          job = block_job_next_locked(job)) {
         BlockJobInfo *value;
-        AioContext *aio_context;
 
         if (block_job_is_internal(job)) {
             continue;
         }
-        aio_context = block_job_get_aio_context(job);
-        aio_context_acquire(aio_context);
         value = block_job_query_locked(job, errp);
-        aio_context_release(aio_context);
         if (!value) {
             qapi_free_BlockJobInfoList(head);
             return NULL;
diff --git a/include/qemu/job.h b/include/qemu/job.h
index d0834906e9..75c206a93b 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -75,13 +75,14 @@ typedef struct Job {
     ProgressMeter progress;
 
 
-    /** Protected by AioContext lock */
+    /** Protected by job_mutex */
 
     /**
      * AioContext to run the job coroutine in.
-     * This field can be read when holding either the BQL (so we are in
-     * the main loop) or the job_mutex.
-     * It can be only written when we hold *both* BQL and job_mutex.
+     * The job Aiocontext can be read when holding *either*
+     * the BQL (so we are in the main loop) or the job_mutex.
+     * It can only be written when we hold *both* BQL
+     * and the job_mutex.
      */
     AioContext *aio_context;
 
@@ -106,7 +107,7 @@ typedef struct Job {
     /**
      * Set to false by the job while the coroutine has yielded and may be
      * re-entered by job_enter(). There may still be I/O or event loop activity
-     * pending. Accessed under block_job_mutex (in blockjob.c).
+     * pending. Accessed under job_mutex.
      *
      * When the job is deferred to the main loop, busy is true as long as the
      * bottom half is still pending.
@@ -322,9 +323,9 @@ typedef enum JobCreateFlags {
 
 extern QemuMutex job_mutex;
 
-#define JOB_LOCK_GUARD() /* QEMU_LOCK_GUARD(&job_mutex) */
+#define JOB_LOCK_GUARD() QEMU_LOCK_GUARD(&job_mutex)
 
-#define WITH_JOB_LOCK_GUARD() /* WITH_QEMU_LOCK_GUARD(&job_mutex) */
+#define WITH_JOB_LOCK_GUARD() WITH_QEMU_LOCK_GUARD(&job_mutex)
 
 /**
  * job_lock:
@@ -648,7 +649,6 @@ void job_user_cancel_locked(Job *job, bool force, Error **errp);
  * Returns the return value from the job if the job actually completed
  * during the call, or -ECANCELED if it was canceled.
  *
- * Callers must hold the AioContext lock of job->aio_context.
  * Called between job_lock and job_unlock.
  */
 int job_cancel_sync_locked(Job *job, bool force);
@@ -673,8 +673,6 @@ void job_cancel_sync_all(void);
  * function).
  *
  * Returns the return value from the job.
- *
- * Callers must hold the AioContext lock of job->aio_context.
  * Called between job_lock and job_unlock.
  */
 int job_complete_sync_locked(Job *job, Error **errp);
@@ -706,7 +704,6 @@ void job_dismiss_locked(Job **job, Error **errp);
  * Returns 0 if the job is successfully completed, -ECANCELED if the job was
  * cancelled before completing, and -errno in other error cases.
  *
- * Callers must hold the AioContext lock of job->aio_context.
  * Called between job_lock and job_unlock.
  */
 int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),
diff --git a/job-qmp.c b/job-qmp.c
index c2eabae09c..96d67246d2 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -30,34 +30,27 @@
 #include "trace/trace-root.h"
 
 /*
- * Get a block job using its ID and acquire its AioContext.
- * Called with job_mutex held.
+ * Get a block job using its ID. Called with job_mutex held.
  */
-static Job *find_job_locked(const char *id, AioContext **aio_context, Error **errp)
+static Job *find_job_locked(const char *id, Error **errp)
 {
     Job *job;
 
-    *aio_context = NULL;
-
     job = job_get_locked(id);
     if (!job) {
         error_setg(errp, "Job not found");
         return NULL;
     }
 
-    *aio_context = job->aio_context;
-    aio_context_acquire(*aio_context);
-
     return job;
 }
 
 void qmp_job_cancel(const char *id, Error **errp)
 {
-    AioContext *aio_context;
     Job *job;
 
     JOB_LOCK_GUARD();
-    job = find_job_locked(id, &aio_context, errp);
+    job = find_job_locked(id, errp);
 
     if (!job) {
         return;
@@ -65,16 +58,14 @@ void qmp_job_cancel(const char *id, Error **errp)
 
     trace_qmp_job_cancel(job);
     job_user_cancel_locked(job, true, errp);
-    aio_context_release(aio_context);
 }
 
 void qmp_job_pause(const char *id, Error **errp)
 {
-    AioContext *aio_context;
     Job *job;
 
     JOB_LOCK_GUARD();
-    job = find_job_locked(id, &aio_context, errp);
+    job = find_job_locked(id, errp);
 
     if (!job) {
         return;
@@ -82,16 +73,14 @@ void qmp_job_pause(const char *id, Error **errp)
 
     trace_qmp_job_pause(job);
     job_user_pause_locked(job, errp);
-    aio_context_release(aio_context);
 }
 
 void qmp_job_resume(const char *id, Error **errp)
 {
-    AioContext *aio_context;
     Job *job;
 
     JOB_LOCK_GUARD();
-    job = find_job_locked(id, &aio_context, errp);
+    job = find_job_locked(id, errp);
 
     if (!job) {
         return;
@@ -99,16 +88,14 @@ void qmp_job_resume(const char *id, Error **errp)
 
     trace_qmp_job_resume(job);
     job_user_resume_locked(job, errp);
-    aio_context_release(aio_context);
 }
 
 void qmp_job_complete(const char *id, Error **errp)
 {
-    AioContext *aio_context;
     Job *job;
 
     JOB_LOCK_GUARD();
-    job = find_job_locked(id, &aio_context, errp);
+    job = find_job_locked(id, errp);
 
     if (!job) {
         return;
@@ -116,16 +103,14 @@ void qmp_job_complete(const char *id, Error **errp)
 
     trace_qmp_job_complete(job);
     job_complete_locked(job, errp);
-    aio_context_release(aio_context);
 }
 
 void qmp_job_finalize(const char *id, Error **errp)
 {
-    AioContext *aio_context;
     Job *job;
 
     JOB_LOCK_GUARD();
-    job = find_job_locked(id, &aio_context, errp);
+    job = find_job_locked(id, errp);
 
     if (!job) {
         return;
@@ -135,23 +120,15 @@ void qmp_job_finalize(const char *id, Error **errp)
     job_ref_locked(job);
     job_finalize_locked(job, errp);
 
-    /*
-     * Job's context might have changed via job_finalize_locked
-     * (and job_txn_apply automatically acquires the new one),
-     * so make sure we release the correct one.
-     */
-    aio_context = job->aio_context;
     job_unref_locked(job);
-    aio_context_release(aio_context);
 }
 
 void qmp_job_dismiss(const char *id, Error **errp)
 {
-    AioContext *aio_context;
     Job *job;
 
     JOB_LOCK_GUARD();
-    job = find_job_locked(id, &aio_context, errp);
+    job = find_job_locked(id, errp);
 
     if (!job) {
         return;
@@ -159,7 +136,6 @@ void qmp_job_dismiss(const char *id, Error **errp)
 
     trace_qmp_job_dismiss(job);
     job_dismiss_locked(&job, errp);
-    aio_context_release(aio_context);
 }
 
 static JobInfo *job_query_single(Job *job, Error **errp)
@@ -196,15 +172,11 @@ JobInfoList *qmp_query_jobs(Error **errp)
 
     for (job = job_next_locked(NULL); job; job = job_next_locked(job)) {
         JobInfo *value;
-        AioContext *aio_context;
 
         if (job_is_internal(job)) {
             continue;
         }
-        aio_context = job->aio_context;
-        aio_context_acquire(aio_context);
         value = job_query_single(job, errp);
-        aio_context_release(aio_context);
         if (!value) {
             qapi_free_JobInfoList(head);
             return NULL;
diff --git a/job.c b/job.c
index 8af53b11c8..1032d46a07 100644
--- a/job.c
+++ b/job.c
@@ -96,21 +96,11 @@ struct JobTxn {
 };
 
 void job_lock(void)
-{
-    /* nop */
-}
-
-void job_unlock(void)
-{
-    /* nop */
-}
-
-static void real_job_lock(void)
 {
     qemu_mutex_lock(&job_mutex);
 }
 
-static void real_job_unlock(void)
+void job_unlock(void)
 {
     qemu_mutex_unlock(&job_mutex);
 }
@@ -177,7 +167,6 @@ static void job_txn_del_job_locked(Job *job)
 /* Called with job_mutex held. */
 static int job_txn_apply_locked(Job *job, int fn(Job *))
 {
-    AioContext *inner_ctx;
     Job *other_job, *next;
     JobTxn *txn = job->txn;
     int rc = 0;
@@ -189,23 +178,14 @@ static int job_txn_apply_locked(Job *job, int fn(Job *))
      * twice - which would break AIO_WAIT_WHILE from within fn.
      */
     job_ref_locked(job);
-    aio_context_release(job->aio_context);
 
     QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
-        inner_ctx = other_job->aio_context;
-        aio_context_acquire(inner_ctx);
         rc = fn(other_job);
-        aio_context_release(inner_ctx);
         if (rc) {
             break;
         }
     }
 
-    /*
-     * Note that job->aio_context might have been changed by calling fn, so we
-     * can't use a local variable to cache it.
-     */
-    aio_context_acquire(job->aio_context);
     job_unref_locked(job);
     return rc;
 }
@@ -469,8 +449,12 @@ void job_unref_locked(Job *job)
         assert(!job->txn);
 
         if (job->driver->free) {
+            AioContext *aio_context = job->aio_context;
             job_unlock();
+            /* FIXME: aiocontext lock is required because cb calls blk_unref */
+            aio_context_acquire(aio_context);
             job->driver->free(job);
+            aio_context_release(aio_context);
             job_lock();
         }
 
@@ -543,20 +527,16 @@ void job_enter_cond_locked(Job *job, bool(*fn)(Job *job))
         return;
     }
 
-    real_job_lock();
     if (job->busy) {
-        real_job_unlock();
         return;
     }
 
     if (fn && !fn(job)) {
-        real_job_unlock();
         return;
     }
 
     timer_del(&job->sleep_timer);
     job->busy = true;
-    real_job_unlock();
     job_unlock();
     aio_co_wake(job->co);
     job_lock();
@@ -582,13 +562,11 @@ static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns)
 {
     AioContext *next_aio_context;
 
-    real_job_lock();
     if (ns != -1) {
         timer_mod(&job->sleep_timer, ns);
     }
     job->busy = false;
     job_event_idle_locked(job);
-    real_job_unlock();
     job_unlock();
     qemu_coroutine_yield();
     job_lock();
@@ -846,12 +824,15 @@ static void job_clean_locked(Job *job)
 static int job_finalize_single_locked(Job *job)
 {
     int job_ret;
+    AioContext *ctx = job->aio_context;
 
     assert(job_is_completed_locked(job));
 
     /* Ensure abort is called for late-transactional failures */
     job_update_rc_locked(job);
 
+    aio_context_acquire(ctx);
+
     if (!job->ret) {
         job_commit_locked(job);
     } else {
@@ -859,6 +840,8 @@ static int job_finalize_single_locked(Job *job)
     }
     job_clean_locked(job);
 
+    aio_context_release(ctx);
+
     if (job->cb) {
         job_ret = job->ret;
         job_unlock();
@@ -922,7 +905,6 @@ static void job_cancel_async_locked(Job *job, bool force)
 /* Called with job_mutex held. */
 static void job_completed_txn_abort_locked(Job *job)
 {
-    AioContext *ctx;
     JobTxn *txn = job->txn;
     Job *other_job;
 
@@ -935,54 +917,28 @@ static void job_completed_txn_abort_locked(Job *job)
     txn->aborting = true;
     job_txn_ref_locked(txn);
 
-    /*
-     * We can only hold the single job's AioContext lock while calling
-     * job_finalize_single_locked() because the finalization callbacks can
-     * involve calls of AIO_WAIT_WHILE(), which could deadlock otherwise.
-     * Note that the job's AioContext may change when it is finalized.
-     */
-    job_ref_locked(job);
-    aio_context_release(job->aio_context);
-
     /* Other jobs are effectively cancelled by us, set the status for
      * them; this job, however, may or may not be cancelled, depending
      * on the caller, so leave it. */
     QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
         if (other_job != job) {
-            ctx = other_job->aio_context;
-            aio_context_acquire(ctx);
             /*
              * This is a transaction: If one job failed, no result will matter.
              * Therefore, pass force=true to terminate all other jobs as quickly
              * as possible.
              */
             job_cancel_async_locked(other_job, true);
-            aio_context_release(ctx);
         }
     }
     while (!QLIST_EMPTY(&txn->jobs)) {
         other_job = QLIST_FIRST(&txn->jobs);
-        /*
-         * The job's AioContext may change, so store it in @ctx so we
-         * release the same context that we have acquired before.
-         */
-        ctx = other_job->aio_context;
-        aio_context_acquire(ctx);
         if (!job_is_completed_locked(other_job)) {
             assert(job_cancel_requested_locked(other_job));
             job_finish_sync_locked(other_job, NULL, NULL);
         }
         job_finalize_single_locked(other_job);
-        aio_context_release(ctx);
     }
 
-    /*
-     * Use job_ref_locked()/job_unref_locked() so we can read the AioContext
-     * here even if the job went away during job_finalize_single_locked().
-     */
-    aio_context_acquire(job->aio_context);
-    job_unref_locked(job);
-
     job_txn_unref_locked(txn);
 }
 
@@ -990,15 +946,20 @@ static void job_completed_txn_abort_locked(Job *job)
 static int job_prepare_locked(Job *job)
 {
     int ret;
+    AioContext *ctx = job->aio_context;
 
     GLOBAL_STATE_CODE();
+
     if (job->ret == 0 && job->driver->prepare) {
         job_unlock();
+        aio_context_acquire(ctx);
         ret = job->driver->prepare(job);
+        aio_context_release(ctx);
         job_lock();
         job->ret = ret;
         job_update_rc_locked(job);
     }
+
     return job->ret;
 }
 
@@ -1097,12 +1058,8 @@ static void job_completed_locked(Job *job)
 static void job_exit(void *opaque)
 {
     Job *job = (Job *)opaque;
-    AioContext *ctx;
     JOB_LOCK_GUARD();
 
-    job_ref_locked(job);
-    aio_context_acquire(job->aio_context);
-
     /* This is a lie, we're not quiescent, but still doing the completion
      * callbacks. However, completion callbacks tend to involve operations that
      * drain block nodes, and if .drained_poll still returned true, we would
@@ -1111,17 +1068,6 @@ static void job_exit(void *opaque)
     job_event_idle_locked(job);
 
     job_completed_locked(job);
-
-    /*
-     * Note that calling job_completed_locked can move the job to a different
-     * aio_context, so we cannot cache from above.
-     * job_txn_apply_locked takes care of
-     * acquiring the new lock, and we ref/unref to avoid job_completed_locked
-     * freeing the job underneath us.
-     */
-    ctx = job->aio_context;
-    job_unref_locked(job);
-    aio_context_release(ctx);
 }
 
 /**
@@ -1232,14 +1178,10 @@ int job_cancel_sync_locked(Job *job, bool force)
 void job_cancel_sync_all(void)
 {
     Job *job;
-    AioContext *aio_context;
 
     JOB_LOCK_GUARD();
     while ((job = job_next_locked(NULL))) {
-        aio_context = job->aio_context;
-        aio_context_acquire(aio_context);
         job_cancel_sync_locked(job, true);
-        aio_context_release(aio_context);
     }
 }
 
@@ -1286,8 +1228,8 @@ int job_finish_sync_locked(Job *job, void (*finish)(Job *, Error **errp),
     }
 
     job_unlock();
-    AIO_WAIT_WHILE(job->aio_context,
-                   (job_enter(job), !job_is_completed(job)));
+    AIO_WAIT_WHILE_UNLOCKED(job->aio_context,
+                            (job_enter(job), !job_is_completed(job)));
     job_lock();
 
     ret = (job_is_cancelled_locked(job) && job->ret == 0)
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 0db056ea63..4924ceb562 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -930,9 +930,9 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
         tjob->prepare_ret = -EIO;
         break;
     }
+    aio_context_release(ctx);
 
     job_start(&job->job);
-    aio_context_release(ctx);
 
     if (use_iothread) {
         /* job_co_entry() is run in the I/O thread, wait for the actual job
@@ -1016,12 +1016,12 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
         g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
     }
 
-    aio_context_acquire(ctx);
     WITH_JOB_LOCK_GUARD() {
         ret = job_complete_sync_locked(&job->job, &error_abort);
     }
     g_assert_cmpint(ret, ==, (result == TEST_JOB_SUCCESS ? 0 : -EIO));
 
+    aio_context_acquire(ctx);
     if (use_iothread) {
         blk_set_aio_context(blk_src, qemu_get_aio_context(), &error_abort);
         assert(blk_get_aio_context(blk_target) == qemu_get_aio_context());
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index 89e7f0fffb..9d7c8be00f 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -455,10 +455,10 @@ static void test_attach_blockjob(void)
         aio_poll(qemu_get_aio_context(), false);
     }
 
-    aio_context_acquire(ctx);
     WITH_JOB_LOCK_GUARD() {
         job_complete_sync_locked(&tjob->common.job, &error_abort);
     }
+    aio_context_acquire(ctx);
     blk_set_aio_context(blk, qemu_get_aio_context(), &error_abort);
     aio_context_release(ctx);
 
diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c
index 8280b1e0c9..d6fc52f80a 100644
--- a/tests/unit/test-blockjob.c
+++ b/tests/unit/test-blockjob.c
@@ -228,10 +228,6 @@ static void cancel_common(CancelJob *s)
     BlockJob *job = &s->common;
     BlockBackend *blk = s->blk;
     JobStatus sts = job->job.status;
-    AioContext *ctx;
-
-    ctx = job->job.aio_context;
-    aio_context_acquire(ctx);
 
     WITH_JOB_LOCK_GUARD() {
         job_cancel_sync_locked(&job->job, true);
@@ -244,7 +240,6 @@ static void cancel_common(CancelJob *s)
     }
     destroy_blk(blk);
 
-    aio_context_release(ctx);
 }
 
 static void test_cancel_created(void)
@@ -405,11 +400,9 @@ static void test_cancel_concluded(void)
     aio_poll(qemu_get_aio_context(), true);
     assert(job->status == JOB_STATUS_PENDING);
 
-    aio_context_acquire(job->aio_context);
     WITH_JOB_LOCK_GUARD() {
         job_finalize_locked(job, &error_abort);
     }
-    aio_context_release(job->aio_context);
     assert(job->status == JOB_STATUS_CONCLUDED);
 
     cancel_common(s);
@@ -503,13 +496,11 @@ static void test_complete_in_standby(void)
 
     /* Wait for the job to become READY */
     job_start(job);
-    aio_context_acquire(ctx);
     /*
      * Here we are waiting for the status to change, so don't bother
      * protecting the read every time.
      */
-    AIO_WAIT_WHILE(ctx, job->status != JOB_STATUS_READY);
-    aio_context_release(ctx);
+    AIO_WAIT_WHILE_UNLOCKED(ctx, job->status != JOB_STATUS_READY);
 
     /* Begin the drained section, pausing the job */
     bdrv_drain_all_begin();
@@ -529,6 +520,7 @@ static void test_complete_in_standby(void)
         job_complete_locked(job, &error_abort);
 
         /* The test is done now, clean up. */
+        aio_context_release(ctx);
         job_finish_sync_locked(job, NULL, &error_abort);
         assert(job->status == JOB_STATUS_PENDING);
 
@@ -538,6 +530,7 @@ static void test_complete_in_standby(void)
         job_dismiss_locked(&job, &error_abort);
     }
 
+    aio_context_acquire(ctx);
     destroy_blk(blk);
     aio_context_release(ctx);
     iothread_join(iothread);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v7 18/18] block_job_query: remove atomic read
  2022-06-16 13:18 [PATCH v7 00/18] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (16 preceding siblings ...)
  2022-06-16 13:18 ` [PATCH v7 17/18] job.c: enable job lock/unlock and remove Aiocontext locks Emanuele Giuseppe Esposito
@ 2022-06-16 13:18 ` Emanuele Giuseppe Esposito
  17 siblings, 0 replies; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-16 13:18 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel,
	Emanuele Giuseppe Esposito

Not sure what the atomic here was supposed to do, since job.busy
is protected by the job lock. The whole function is called under
job_mutex, so just remove the atomic.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 blockjob.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/blockjob.c b/blockjob.c
index 8b9e10813d..d84ddca363 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -324,7 +324,7 @@ BlockJobInfo *block_job_query_locked(BlockJob *job, Error **errp)
     info = g_new0(BlockJobInfo, 1);
     info->type      = g_strdup(job_type_str(&job->job));
     info->device    = g_strdup(job->job.id);
-    info->busy      = qatomic_read(&job->job.busy);
+    info->busy      = job->job.busy;
     info->paused    = job->job.pause_count > 0;
     info->offset    = progress_current;
     info->len       = progress_total;
-- 
2.31.1




* Re: [PATCH v7 01/18] job.c: make job_mutex and job_lock/unlock() public
  2022-06-16 13:18 ` [PATCH v7 01/18] job.c: make job_mutex and job_lock/unlock() public Emanuele Giuseppe Esposito
@ 2022-06-21 13:47   ` Vladimir Sementsov-Ogievskiy
  2022-06-24 18:22   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-06-21 13:47 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel

On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
> job mutex will be used to protect the job struct elements and list,
> replacing AioContext locks.
> 
> Right now use a shared lock for all jobs, in order to keep things
> simple. Once the AioContext lock is gone, we can introduce per-job
> locks.
> 
> To simplify the switch from aiocontext to job lock, introduce
> *nop*  lock/unlock functions and macros.
> We want to always call job_lock/unlock outside the AioContext locks,
> and not vice-versa, otherwise we might get a deadlock. This is not
> straightforward to do, and that's why we start with nop functions.
> Once everything is protected by job_lock/unlock, we can change the nop into
> an actual mutex and remove the aiocontext lock.
> 
> Since job_mutex is already being used, add static
> real_job_{lock/unlock} for the existing usage.
> 
> Signed-off-by: Emanuele Giuseppe Esposito<eesposit@redhat.com>
> Reviewed-by: Stefan Hajnoczi<stefanha@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir



* Re: [PATCH v7 02/18] job.h: categorize fields in struct Job
  2022-06-16 13:18 ` [PATCH v7 02/18] job.h: categorize fields in struct Job Emanuele Giuseppe Esposito
@ 2022-06-21 14:29   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-06-21 14:29 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel

On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
> Categorize the fields in struct Job to understand which ones
> need to be protected by the job mutex and which don't.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>   include/qemu/job.h | 61 +++++++++++++++++++++++++++-------------------
>   1 file changed, 36 insertions(+), 25 deletions(-)
> 
> diff --git a/include/qemu/job.h b/include/qemu/job.h
> index d1192ffd61..876e13d549 100644
> --- a/include/qemu/job.h
> +++ b/include/qemu/job.h
> @@ -40,27 +40,52 @@ typedef struct JobTxn JobTxn;
>    * Long-running operation.
>    */
>   typedef struct Job {
> +
> +    /* Fields set at initialization (job_create), and never modified */
> +
>       /** The ID of the job. May be NULL for internal jobs. */
>       char *id;
>   
> -    /** The type of this job. */
> +    /**
> +     * The type of this job.
> +     * All callbacks are called with job_mutex *not* held.
> +     */
>       const JobDriver *driver;
>   
> -    /** Reference count of the block job */
> -    int refcnt;
> -
> -    /** Current state; See @JobStatus for details. */
> -    JobStatus status;
> -
> -    /** AioContext to run the job coroutine in */
> -    AioContext *aio_context;
> -
>       /**
>        * The coroutine that executes the job.  If not NULL, it is reentered when
>        * busy is false and the job is cancelled.
> +     * Initialized in job_start()
>        */
>       Coroutine *co;
>   
> +    /** True if this job should automatically finalize itself */
> +    bool auto_finalize;
> +
> +    /** True if this job should automatically dismiss itself */
> +    bool auto_dismiss;
> +
> +    /** The completion function that will be called when the job completes.  */
> +    BlockCompletionFunc *cb;
> +
> +    /** The opaque value that is passed to the completion function.  */
> +    void *opaque;
> +
> +    /* ProgressMeter API is thread-safe */
> +    ProgressMeter progress;
> +
> +
> +    /** Protected by AioContext lock */

The previous groups start with '/*'. Should /** be substituted by /* ?

> +
> +    /** AioContext to run the job coroutine in */
> +    AioContext *aio_context;

Not sure how much it is protected. We probably read it without locking, but that should go away anyway.

> +
> +    /** Reference count of the block job */
> +    int refcnt;
> +
> +    /** Current state; See @JobStatus for details. */
> +    JobStatus status;
> +
>       /**
>        * Timer that is used by @job_sleep_ns. Accessed under job_mutex (in
>        * job.c).
> @@ -112,14 +137,6 @@ typedef struct Job {
>       /** Set to true when the job has deferred work to the main loop. */
>       bool deferred_to_main_loop;
>   
> -    /** True if this job should automatically finalize itself */
> -    bool auto_finalize;
> -
> -    /** True if this job should automatically dismiss itself */
> -    bool auto_dismiss;
> -
> -    ProgressMeter progress;
> -
>       /**
>        * Return code from @run and/or @prepare callback(s).
>        * Not final until the job has reached the CONCLUDED status.
> @@ -134,12 +151,6 @@ typedef struct Job {
>        */
>       Error *err;
>   
> -    /** The completion function that will be called when the job completes.  */
> -    BlockCompletionFunc *cb;
> -
> -    /** The opaque value that is passed to the completion function.  */
> -    void *opaque;
> -
>       /** Notifiers called when a cancelled job is finalised */
>       NotifierList on_finalize_cancelled;
>   
> @@ -167,6 +178,7 @@ typedef struct Job {
>   
>   /**
>    * Callbacks and other information about a Job driver.
> + * All callbacks are invoked with job_mutex *not* held.

Should this be in this patch? Seems related. But will we have a lot more comments like this in later patches?

>    */
>   struct JobDriver {
>   
> @@ -472,7 +484,6 @@ void job_yield(Job *job);
>    */
>   void coroutine_fn job_sleep_ns(Job *job, int64_t ns);
>   
> -

I'd drop this, looks like accidental unrelated style fixing.

>   /** Returns the JobType of a given Job. */
>   JobType job_type(const Job *job);
>   

as is, or with dropped 1-2 last hunks:
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir



* Re: [PATCH v7 03/18] job.c: API functions not used outside should be static
  2022-06-16 13:18 ` [PATCH v7 03/18] job.c: API functions not used outside should be static Emanuele Giuseppe Esposito
@ 2022-06-21 14:34   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-06-21 14:34 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel

On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
> job_event_* functions can all be static, as they are not used
> outside job.c.
> 
> Same applies for job_txn_add_job().
> 
> Reviewed-by: Stefan Hajnoczi<stefanha@redhat.com>
> Signed-off-by: Emanuele Giuseppe Esposito<eesposit@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir



* Re: [PATCH v7 04/18] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED
  2022-06-16 13:18 ` [PATCH v7 04/18] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED Emanuele Giuseppe Esposito
@ 2022-06-21 14:40   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-06-21 14:40 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel

On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
> Same as AIO_WAIT_WHILE macro, but if we are in the Main loop
> do not release and then acquire ctx_'s AioContext.
> 
> Once all Aiocontext locks go away, this macro will replace
> AIO_WAIT_WHILE.
> 
> Reviewed-by: Stefan Hajnoczi<stefanha@redhat.com>
> Signed-off-by: Emanuele Giuseppe Esposito<eesposit@redhat.com>

A bit strange that you put r-b marks above your s-o-b.

Usually, the marks go in historical order:
1. your s-o-b
2. reviewers r-b marks
3. maintainer's s-o-b mark


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir



* Re: [PATCH v7 05/18] job.h: add _locked duplicates for job API functions called with and without job_mutex
  2022-06-16 13:18 ` [PATCH v7 05/18] job.h: add _locked duplicates for job API functions called with and without job_mutex Emanuele Giuseppe Esposito
@ 2022-06-21 15:03   ` Vladimir Sementsov-Ogievskiy
  2022-06-22 14:26     ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-06-21 15:03 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel

On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
> In preparation to the job_lock/unlock usage, create _locked
> duplicates of some functions, since they will be sometimes called with
> job_mutex held (mostly within job.c),
> and sometimes without (mostly from JobDrivers using the job API).
> 
> Therefore create a _locked version of such function, so that it
> can be used in both cases.
> 
> List of functions duplicated as _locked:
> job_is_ready (both versions are public)
> job_is_completed (both versions are public)
> job_is_cancelled (_locked version is public, needed by mirror.c)
> job_pause_point (_locked version is static, purely done to simplify the code)
> job_cancel_requested (_locked version is static)
> 
> Note: at this stage, job_{lock/unlock} and job lock guard macros
> are *nop*.

Great description, thanks!

> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>

Hmm, after this patch, part of the public API has "called with/without lock" comments, but there are still public job_* functions that don't have this mark. That looks inconsistent. I think all public API functions without the _locked suffix should be called without the lock? If so, we don't need to write it for each function, and can mark only the _locked() functions with "called with lock held".

> ---
>   include/qemu/job.h | 25 +++++++++++++++++++++---
>   job.c              | 48 ++++++++++++++++++++++++++++++++++++++++------
>   2 files changed, 64 insertions(+), 9 deletions(-)
> 

[..]

>   
> -/** Returns whether the job is ready to be completed. */
> +/** Just like job_is_completed, but called between job_lock and job_unlock */

I'd prefer the phrasing "called with job_lock held". Your wording makes me think of:

job_lock()
...
job_unlock()

foo()

job_lock()
...
job_unlock()

- foo() actually called between job_lock and job_unlock :)

(it's a nitpicking, you may ignore it :)

> +bool job_is_completed_locked(Job *job);
> +
> +/**
> + * Returns whether the job is ready to be completed.
> + * Called with job_mutex *not* held.
> + */
>   bool job_is_ready(Job *job);
>   
> +/** Just like job_is_ready, but called between job_lock and job_unlock */
> +bool job_is_ready_locked(Job *job);
> +
>   /**
>    * Request @job to pause at the next pause point. Must be paired with
>    * job_resume(). If the job is supposed to be resumed by user action, call
> diff --git a/job.c b/job.c
> index cafd597ba4..c4776985c4 100644
> --- a/job.c
> +++ b/job.c
> @@ -236,19 +236,32 @@ const char *job_type_str(const Job *job)
>       return JobType_str(job_type(job));
>   }
>   
> -bool job_is_cancelled(Job *job)
> +bool job_is_cancelled_locked(Job *job)
>   {
>       /* force_cancel may be true only if cancelled is true, too */
>       assert(job->cancelled || !job->force_cancel);
>       return job->force_cancel;
>   }
>   
> -bool job_cancel_requested(Job *job)
> +bool job_is_cancelled(Job *job)
> +{
> +    JOB_LOCK_GUARD();
> +    return job_is_cancelled_locked(job);
> +}
> +
> +/* Called with job_mutex held. */
> +static bool job_cancel_requested_locked(Job *job)
>   {
>       return job->cancelled;
>   }
>   
> -bool job_is_ready(Job *job)
> +bool job_cancel_requested(Job *job)
> +{
> +    JOB_LOCK_GUARD();
> +    return job_cancel_requested_locked(job);
> +}
> +
> +bool job_is_ready_locked(Job *job)
>   {
>       switch (job->status) {
>       case JOB_STATUS_UNDEFINED:
> @@ -270,7 +283,13 @@ bool job_is_ready(Job *job)
>       return false;
>   }
>   
> -bool job_is_completed(Job *job)
> +bool job_is_ready(Job *job)
> +{
> +    JOB_LOCK_GUARD();
> +    return job_is_ready_locked(job);
> +}
> +
> +bool job_is_completed_locked(Job *job)
>   {
>       switch (job->status) {
>       case JOB_STATUS_UNDEFINED:
> @@ -292,6 +311,12 @@ bool job_is_completed(Job *job)
>       return false;
>   }
>   
> +bool job_is_completed(Job *job)
> +{
> +    JOB_LOCK_GUARD();
> +    return job_is_completed_locked(job);
> +}
> +
>   static bool job_started(Job *job)
>   {
>       return job->co;
> @@ -521,7 +546,8 @@ static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
>       assert(job->busy);
>   }
>   
> -void coroutine_fn job_pause_point(Job *job)
> +/* Called with job_mutex held, but releases it temporarily. */
> +static void coroutine_fn job_pause_point_locked(Job *job)
>   {
>       assert(job && job_started(job));

In this function, we should now use job_pause_point_locked(), otherwise it looks incorrect. (I remember that the lock is a nop for now, but still, let's keep things as correct as possible.)


And job_do_yield() takes lock by itself. How to resolve it?

>   
> @@ -552,6 +578,12 @@ void coroutine_fn job_pause_point(Job *job)
>       }
>   }
>   
> +void coroutine_fn job_pause_point(Job *job)
> +{
> +    JOB_LOCK_GUARD();
> +    job_pause_point_locked(job);
> +}
> +
>   void job_yield(Job *job)
>   {
>       assert(job->busy);
> @@ -949,11 +981,15 @@ static void job_completed(Job *job)
>       }
>   }
>   
> -/** Useful only as a type shim for aio_bh_schedule_oneshot. */
> +/**
> + * Useful only as a type shim for aio_bh_schedule_oneshot.
> + * Called with job_mutex *not* held.
> + */
>   static void job_exit(void *opaque)
>   {
>       Job *job = (Job *)opaque;
>       AioContext *ctx;
> +    JOB_LOCK_GUARD();

That's not part of this patch; it doesn't relate to "add _locked duplicates".

>   
>       job_ref(job);
>       aio_context_acquire(job->aio_context);


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 06/18] jobs: protect jobs with job_lock/unlock
  2022-06-16 13:18 ` [PATCH v7 06/18] jobs: protect jobs with job_lock/unlock Emanuele Giuseppe Esposito
@ 2022-06-21 16:47   ` Vladimir Sementsov-Ogievskiy
  2022-06-21 17:09   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-06-21 16:47 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel

On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
> Introduce the job locking mechanism through the whole job API,

Not the whole API, I think? The next patches introduce locking in more and more places.

-- 
Best regards,
Vladimir



* Re: [PATCH v7 06/18] jobs: protect jobs with job_lock/unlock
  2022-06-16 13:18 ` [PATCH v7 06/18] jobs: protect jobs with job_lock/unlock Emanuele Giuseppe Esposito
  2022-06-21 16:47   ` Vladimir Sementsov-Ogievskiy
@ 2022-06-21 17:09   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-06-21 17:09 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel

On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
>       }
> @@ -1939,7 +1943,9 @@ static void blockdev_backup_abort(BlkActionState *common)
>           aio_context = bdrv_get_aio_context(state->bs);
>           aio_context_acquire(aio_context);
>   
> -        job_cancel_sync(&state->job->job, true);
> +        WITH_JOB_LOCK_GUARD() {
> +            job_cancel_sync(&state->job->job, true);
> +        }
>   

This patch would definitely be simplified if we added job_cancel_sync_locked() and made job_cancel_sync() a wrapper around it, like the other functions in patch 05. The same may apply to some other functions, too.

-- 
Best regards,
Vladimir



* Re: [PATCH v7 10/18] jobs: rename static functions called with job_mutex held
  2022-06-16 13:18 ` [PATCH v7 10/18] jobs: rename static functions called with job_mutex held Emanuele Giuseppe Esposito
@ 2022-06-21 17:26   ` Vladimir Sementsov-Ogievskiy
  2022-06-22 14:26     ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-06-21 17:26 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel

On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
> With the *nop* job_lock/unlock placed, rename the static
> functions that are always under job_mutex, adding "_locked" suffix.
> 
> List of functions that get this suffix:
> job_txn_ref		   job_txn_del_job
> job_txn_apply		   job_state_transition
> job_should_pause	   job_event_cancelled
> job_event_completed	   job_event_pending
> job_event_ready		   job_event_idle
> job_do_yield		   job_timer_not_pending
> job_do_dismiss		   job_conclude
> job_update_rc		   job_commit
> job_abort		   job_clean
> job_finalize_single	   job_cancel_async
> job_completed_txn_abort	   job_prepare
> job_needs_finalize	   job_do_finalize
> job_transition_to_pending  job_completed_txn_success
> job_completed		   job_cancel_err
> job_force_cancel_err
> 
> Note that "locked" refers to the *nop* job_lock/unlock, and not
> real_job_lock/unlock.
> 
> No functional change intended.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>


Hmm, maybe this was already discussed, but it seems to me that it would be simpler to review the previous patches, which fix job_* API users to use locking properly, if this renaming came earlier.

Anyway, in this series we can't update everything at once, so patch by patch we make the code more and more correct. (Yes, I remember that lock() is a nop, but I have to review as if it were real; otherwise, how can I review at all?)

So I'm talking about the formal correctness of using the lock()/unlock() functions, in connection with the introduced _locked suffixes and with how this should finally work.

You do:

05. Introduces some _locked functions that are just duplicates, and job_pause_point_locked() is formally inconsistent, as I said.

06. Updates a lot of places to give them their final form (though not quite final, since some functions will later be renamed to _locked and some not, which is hard to anticipate).

07, 08, 09. Update some more, and even more, places; it becomes very hard to track the formal correctness of the locking.

10-...: rename APIs.


What do you think about the following:

1. Introduce the nop lock and some internal _locked() versions, and keep formal consistency inside job.c, considering all public interfaces as unlocked.

  At this point:
   - everything is correct inside job.c
   - no public interface has the _locked suffix
   - all public interfaces take the mutex internally
   - no external user takes the mutex by hand

We can rename all the internal static functions at this step too.

2. Introduce some public _locked APIs that we'll use in the next patches.

3. Now start fixing external users in several patches:

   - protect direct use of job fields with the mutex
   - use wider lock sections and switch to the _locked APIs inside them where needed


In this scenario, every updated unit becomes formally correct right after its update; after all the steps everything is formally correct, and we can move on to turning the mutex on.

-- 
Best regards,
Vladimir



* Re: [PATCH v7 05/18] job.h: add _locked duplicates for job API functions called with and without job_mutex
  2022-06-21 15:03   ` Vladimir Sementsov-Ogievskiy
@ 2022-06-22 14:26     ` Emanuele Giuseppe Esposito
  2022-06-22 18:12       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-22 14:26 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel



Am 21/06/2022 um 17:03 schrieb Vladimir Sementsov-Ogievskiy:
> On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
>> In preparation to the job_lock/unlock usage, create _locked
>> duplicates of some functions, since they will be sometimes called with
>> job_mutex held (mostly within job.c),
>> and sometimes without (mostly from JobDrivers using the job API).
>>
>> Therefore create a _locked version of such function, so that it
>> can be used in both cases.
>>
>> List of functions duplicated as _locked:
>> job_is_ready (both versions are public)
>> job_is_completed (both versions are public)
>> job_is_cancelled (_locked version is public, needed by mirror.c)
>> job_pause_point (_locked version is static, purely done to simplify
>> the code)
>> job_cancel_requested (_locked version is static)
>>
>> Note: at this stage, job_{lock/unlock} and job lock guard macros
>> are *nop*.
> 
> Great description, thanks!
> 
>>
>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> 
> Hmm, after this patch, part of public API has "called with/without lock"
> comments. But there are still public job_* functions that doesn't have
> this mark. That look inconsistent. I think, all public API without
> _locked suffix, should be called without a lock? If so, we don't need to
> write it for each function. And only mark _locked() functions with
> "called with lock held" marks.
> 
>> ---
>>   include/qemu/job.h | 25 +++++++++++++++++++++---
>>   job.c              | 48 ++++++++++++++++++++++++++++++++++++++++------
>>   2 files changed, 64 insertions(+), 9 deletions(-)
>>
> 
> [..]
> 
>>   -/** Returns whether the job is ready to be completed. */
>> +/** Just like job_is_completed, but called between job_lock and
>> job_unlock */
> 
> I'd prefer phrasing "called with job_lock held". You wording make me
> think about
> 
> job_lock()
> ...
> job_unlock()
> 
> foo()
> 
> job_lock()
> ...
> job_unlock()
> 
> - foo() actually called between job_lock and job_unlock :)
> 
> (it's a nitpicking, you may ignore it :)
> 
>> +bool job_is_completed_locked(Job *job);
>> +
>> +/**
>> + * Returns whether the job is ready to be completed.
>> + * Called with job_mutex *not* held.
>> + */
>>   bool job_is_ready(Job *job);
>>   +/** Just like job_is_ready, but called between job_lock and
>> job_unlock */
>> +bool job_is_ready_locked(Job *job);
>> +
>>   /**
>>    * Request @job to pause at the next pause point. Must be paired with
>>    * job_resume(). If the job is supposed to be resumed by user
>> action, call
>> diff --git a/job.c b/job.c
>> index cafd597ba4..c4776985c4 100644
>> --- a/job.c
>> +++ b/job.c
>> @@ -236,19 +236,32 @@ const char *job_type_str(const Job *job)
>>       return JobType_str(job_type(job));
>>   }
>>   -bool job_is_cancelled(Job *job)
>> +bool job_is_cancelled_locked(Job *job)
>>   {
>>       /* force_cancel may be true only if cancelled is true, too */
>>       assert(job->cancelled || !job->force_cancel);
>>       return job->force_cancel;
>>   }
>>   -bool job_cancel_requested(Job *job)
>> +bool job_is_cancelled(Job *job)
>> +{
>> +    JOB_LOCK_GUARD();
>> +    return job_is_cancelled_locked(job);
>> +}
>> +
>> +/* Called with job_mutex held. */
>> +static bool job_cancel_requested_locked(Job *job)
>>   {
>>       return job->cancelled;
>>   }
>>   -bool job_is_ready(Job *job)
>> +bool job_cancel_requested(Job *job)
>> +{
>> +    JOB_LOCK_GUARD();
>> +    return job_cancel_requested_locked(job);
>> +}
>> +
>> +bool job_is_ready_locked(Job *job)
>>   {
>>       switch (job->status) {
>>       case JOB_STATUS_UNDEFINED:
>> @@ -270,7 +283,13 @@ bool job_is_ready(Job *job)
>>       return false;
>>   }
>>   -bool job_is_completed(Job *job)
>> +bool job_is_ready(Job *job)
>> +{
>> +    JOB_LOCK_GUARD();
>> +    return job_is_ready_locked(job);
>> +}
>> +
>> +bool job_is_completed_locked(Job *job)
>>   {
>>       switch (job->status) {
>>       case JOB_STATUS_UNDEFINED:
>> @@ -292,6 +311,12 @@ bool job_is_completed(Job *job)
>>       return false;
>>   }
>>   +bool job_is_completed(Job *job)
>> +{
>> +    JOB_LOCK_GUARD();
>> +    return job_is_completed_locked(job);
>> +}
>> +
>>   static bool job_started(Job *job)
>>   {
>>       return job->co;
>> @@ -521,7 +546,8 @@ static void coroutine_fn job_do_yield(Job *job,
>> uint64_t ns)
>>       assert(job->busy);
>>   }
>>   -void coroutine_fn job_pause_point(Job *job)
>> +/* Called with job_mutex held, but releases it temporarily. */
>> +static void coroutine_fn job_pause_point_locked(Job *job)
>>   {
>>       assert(job && job_started(job));
> 
> In this function, we should now use job_pause_point_locked(), otherwise
> it looks incorrect. (I remember that lock is noop for now, but still,
> let's keep think as correct as possible)
> 

I'm missing your point here. What is incorrect?
> 
> And job_do_yield() takes lock by itself. How to resolve it?

You mean the real_job_lock/unlock taken in job_do_yield?

> 
>>   @@ -552,6 +578,12 @@ void coroutine_fn job_pause_point(Job *job)
>>       }
>>   }
>>   +void coroutine_fn job_pause_point(Job *job)
>> +{
>> +    JOB_LOCK_GUARD();
>> +    job_pause_point_locked(job);
>> +}
>> +
>>   void job_yield(Job *job)
>>   {
>>       assert(job->busy);
>> @@ -949,11 +981,15 @@ static void job_completed(Job *job)
>>       }
>>   }
>>   -/** Useful only as a type shim for aio_bh_schedule_oneshot. */
>> +/**
>> + * Useful only as a type shim for aio_bh_schedule_oneshot.
>> + * Called with job_mutex *not* held.
>> + */
>>   static void job_exit(void *opaque)
>>   {
>>       Job *job = (Job *)opaque;
>>       AioContext *ctx;
>> +    JOB_LOCK_GUARD();
> 
> That's not part of this patch.. Doesn't relate to "add _locked duplicates"
> 
>>         job_ref(job);
>>       aio_context_acquire(job->aio_context);
> 
> 




* Re: [PATCH v7 10/18] jobs: rename static functions called with job_mutex held
  2022-06-21 17:26   ` Vladimir Sementsov-Ogievskiy
@ 2022-06-22 14:26     ` Emanuele Giuseppe Esposito
  2022-06-22 18:38       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-22 14:26 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, Kevin Wolf
  Cc: Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel



Am 21/06/2022 um 19:26 schrieb Vladimir Sementsov-Ogievskiy:
> On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
>> With the*nop*  job_lock/unlock placed, rename the static
>> functions that are always under job_mutex, adding "_locked" suffix.
>>
>> List of functions that get this suffix:
>> job_txn_ref           job_txn_del_job
>> job_txn_apply           job_state_transition
>> job_should_pause       job_event_cancelled
>> job_event_completed       job_event_pending
>> job_event_ready           job_event_idle
>> job_do_yield           job_timer_not_pending
>> job_do_dismiss           job_conclude
>> job_update_rc           job_commit
>> job_abort           job_clean
>> job_finalize_single       job_cancel_async
>> job_completed_txn_abort       job_prepare
>> job_needs_finalize       job_do_finalize
>> job_transition_to_pending  job_completed_txn_success
>> job_completed           job_cancel_err
>> job_force_cancel_err
>>
>> Note that "locked" refers to the*nop*  job_lock/unlock, and not
>> real_job_lock/unlock.
>>
>> No functional change intended.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito<eesposit@redhat.com>
> 
> 
> Hmm. Maybe it was already discussed.. But for me it seems, that it would
> be simpler to review previous patches, that fix job_ API users to use
> locking properly, if this renaming go earlier.
> 
> Anyway, in this series, we can't update everything at once. So patch to
> patch, we make the code more and more correct. (yes I remember that
> lock() is a noop, but I should review thinking that it real, otherwise,
> how to review?)
> 
> So, I'm saying about formal correctness of using lock() unlock()
> function in connection with introduced _locked prifixes and in
> connection with how it should finally work.
> 
> You do:
> 
> 05. introduce some _locked functions, that just duplicates, and
> job_pause_point_locked() is formally inconsistent, as I said.
> 
> 06. Update a lot of places, to give them their final form (but not
> final, as some functions will be renamed to _locked, some not, hard to
> imagine)
> 
> 07,08,09. Update some more, and even more places. very hard to track
> formal correctness of using locks
> 
> 10-...: rename APIs.
> 
> 
> What do you think about the following:
> 
> 1. Introduce noop lock, and some internal _locked() versions, and keep
> formal consistency inside job.c, considering all public interfaces as
> unlocked:
> 
>  at this point:
>   - everything correct inside job.c
>   - no public interfaces with _locked prefix
>   - all public interfaces take mutex internally
>   - no external user take mutex by hand
> 
> We can rename all internal static functions at this step too.
> 
> 2. Introduce some public _locked APIs, that we'll use in next patches
> 
> 3. Now start fixing external users in several patches:
>     - protect by mutex direct use of job fields
>   - make wider locks and move to _locked APIs inside them where needed
> 
> 
> In this scenario, every updated unit becomes formally correct after
> update, and after all steps everything is formally correct, and we can
> move to turning-on the mutex.
> 

I don't understand your logic here either, sorry :(

I assume you want to keep patches 1-4; then the problem is adding job_lock
and renaming functions to _locked.
So I would say the problem is in patches 5, 6, 10, 11, 12 and 13. All the
others should be self-contained.

I understand patch 5 is a little hard to follow.

Now, I am not sure what you are proposing here, but it seems the end goal
is the same result, just with additional intermediate steps that amount
to "do this because it will be useful in the next patch".
I think the problem is that we would lose the "why we need the lock"
logic in the patches if we did so.

The logic I tried to convey in this order is the following:
- job.h: add _locked duplicates for job API functions called with and
without job_mutex
	Just create duplicates of functions

- jobs: protect jobs with job_lock/unlock
	QMP and monitor functions call APIs that assume lock is taken,
	drivers must take explicitly the lock

- jobs: rename static functions called with job_mutex held
- job.h: rename job API functions called with job_mutex held
- block_job: rename block_job functions called with job_mutex held
	*given* that some functions are always under lock, transform
	them in _locked. Requires the job_lock/unlock patch

- job.h: define unlocked functions
	Comments on the public functions that are not _locked


@Kevin, since you also had some feedback on the patch ordering, do you
agree with this ordering, or do you have other ideas?

Following your suggestion, we could move patches 10-11-12-13 before
patch 6, "jobs: protect jobs with job_lock/unlock".

(Apologies for changing my mind, but this being the second complaint, I
am starting to reconsider the patch ordering.)

Thank you,
Emanuele




* Re: [PATCH v7 05/18] job.h: add _locked duplicates for job API functions called with and without job_mutex
  2022-06-22 14:26     ` Emanuele Giuseppe Esposito
@ 2022-06-22 18:12       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-06-22 18:12 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel

On 6/22/22 17:26, Emanuele Giuseppe Esposito wrote:
> 
> 
> Am 21/06/2022 um 17:03 schrieb Vladimir Sementsov-Ogievskiy:
>> On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
>>> In preparation to the job_lock/unlock usage, create _locked
>>> duplicates of some functions, since they will be sometimes called with
>>> job_mutex held (mostly within job.c),
>>> and sometimes without (mostly from JobDrivers using the job API).
>>>
>>> Therefore create a _locked version of such function, so that it
>>> can be used in both cases.
>>>
>>> List of functions duplicated as _locked:
>>> job_is_ready (both versions are public)
>>> job_is_completed (both versions are public)
>>> job_is_cancelled (_locked version is public, needed by mirror.c)
>>> job_pause_point (_locked version is static, purely done to simplify
>>> the code)
>>> job_cancel_requested (_locked version is static)
>>>
>>> Note: at this stage, job_{lock/unlock} and job lock guard macros
>>> are *nop*.
>>
>> Great description, thanks!
>>
>>>
>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>
>> Hmm, after this patch, part of public API has "called with/without lock"
>> comments. But there are still public job_* functions that doesn't have
>> this mark. That look inconsistent. I think, all public API without
>> _locked suffix, should be called without a lock? If so, we don't need to
>> write it for each function. And only mark _locked() functions with
>> "called with lock held" marks.
>>
>>> ---
>>>    include/qemu/job.h | 25 +++++++++++++++++++++---
>>>    job.c              | 48 ++++++++++++++++++++++++++++++++++++++++------
>>>    2 files changed, 64 insertions(+), 9 deletions(-)
>>>
>>
>> [..]
>>
>>>    -/** Returns whether the job is ready to be completed. */
>>> +/** Just like job_is_completed, but called between job_lock and
>>> job_unlock */
>>
>> I'd prefer phrasing "called with job_lock held". You wording make me
>> think about
>>
>> job_lock()
>> ...
>> job_unlock()
>>
>> foo()
>>
>> job_lock()
>> ...
>> job_unlock()
>>
>> - foo() actually called between job_lock and job_unlock :)
>>
>> (it's a nitpicking, you may ignore it :)
>>
>>> +bool job_is_completed_locked(Job *job);
>>> +
>>> +/**
>>> + * Returns whether the job is ready to be completed.
>>> + * Called with job_mutex *not* held.
>>> + */
>>>    bool job_is_ready(Job *job);
>>>    +/** Just like job_is_ready, but called between job_lock and
>>> job_unlock */
>>> +bool job_is_ready_locked(Job *job);
>>> +
>>>    /**
>>>     * Request @job to pause at the next pause point. Must be paired with
>>>     * job_resume(). If the job is supposed to be resumed by user
>>> action, call
>>> diff --git a/job.c b/job.c
>>> index cafd597ba4..c4776985c4 100644
>>> --- a/job.c
>>> +++ b/job.c
>>> @@ -236,19 +236,32 @@ const char *job_type_str(const Job *job)
>>>        return JobType_str(job_type(job));
>>>    }
>>>    -bool job_is_cancelled(Job *job)
>>> +bool job_is_cancelled_locked(Job *job)
>>>    {
>>>        /* force_cancel may be true only if cancelled is true, too */
>>>        assert(job->cancelled || !job->force_cancel);
>>>        return job->force_cancel;
>>>    }
>>>    -bool job_cancel_requested(Job *job)
>>> +bool job_is_cancelled(Job *job)
>>> +{
>>> +    JOB_LOCK_GUARD();
>>> +    return job_is_cancelled_locked(job);
>>> +}
>>> +
>>> +/* Called with job_mutex held. */
>>> +static bool job_cancel_requested_locked(Job *job)
>>>    {
>>>        return job->cancelled;
>>>    }
>>>    -bool job_is_ready(Job *job)
>>> +bool job_cancel_requested(Job *job)
>>> +{
>>> +    JOB_LOCK_GUARD();
>>> +    return job_cancel_requested_locked(job);
>>> +}
>>> +
>>> +bool job_is_ready_locked(Job *job)
>>>    {
>>>        switch (job->status) {
>>>        case JOB_STATUS_UNDEFINED:
>>> @@ -270,7 +283,13 @@ bool job_is_ready(Job *job)
>>>        return false;
>>>    }
>>>    -bool job_is_completed(Job *job)
>>> +bool job_is_ready(Job *job)
>>> +{
>>> +    JOB_LOCK_GUARD();
>>> +    return job_is_ready_locked(job);
>>> +}
>>> +
>>> +bool job_is_completed_locked(Job *job)
>>>    {
>>>        switch (job->status) {
>>>        case JOB_STATUS_UNDEFINED:
>>> @@ -292,6 +311,12 @@ bool job_is_completed(Job *job)
>>>        return false;
>>>    }
>>>    +bool job_is_completed(Job *job)
>>> +{
>>> +    JOB_LOCK_GUARD();
>>> +    return job_is_completed_locked(job);
>>> +}
>>> +
>>>    static bool job_started(Job *job)
>>>    {
>>>        return job->co;
>>> @@ -521,7 +546,8 @@ static void coroutine_fn job_do_yield(Job *job,
>>> uint64_t ns)
>>>        assert(job->busy);
>>>    }
>>>    -void coroutine_fn job_pause_point(Job *job)
>>> +/* Called with job_mutex held, but releases it temporarily. */
>>> +static void coroutine_fn job_pause_point_locked(Job *job)
>>>    {
>>>        assert(job && job_started(job));
>>
>> In this function, we should now use job_pause_point_locked(), otherwise
>> it looks incorrect. (I remember that lock is noop for now, but still,
>> let's keep think as correct as possible)
>>
> 
> I miss your point here. What is incorrect?

The function is called with the lock held, but it calls job_pause_point(), which takes the mutex; that would deadlock. It doesn't deadlock only because our mutex is a nop for now. That's why I say it "looks incorrect".

>>
>> And job_do_yield() takes lock by itself. How to resolve it?
> 
> You mean the real_job_lock/unlock taken in job_do_yield?

Yes. Hmm, but we can treat real_job_lock as a separate lock, one that may be taken while job_lock is held.

> 
>>
>>>    @@ -552,6 +578,12 @@ void coroutine_fn job_pause_point(Job *job)
>>>        }
>>>    }
>>>    +void coroutine_fn job_pause_point(Job *job)
>>> +{
>>> +    JOB_LOCK_GUARD();
>>> +    job_pause_point_locked(job);
>>> +}
>>> +
>>>    void job_yield(Job *job)
>>>    {
>>>        assert(job->busy);
>>> @@ -949,11 +981,15 @@ static void job_completed(Job *job)
>>>        }
>>>    }
>>>    -/** Useful only as a type shim for aio_bh_schedule_oneshot. */
>>> +/**
>>> + * Useful only as a type shim for aio_bh_schedule_oneshot.
>>> + * Called with job_mutex *not* held.
>>> + */
>>>    static void job_exit(void *opaque)
>>>    {
>>>        Job *job = (Job *)opaque;
>>>        AioContext *ctx;
>>> +    JOB_LOCK_GUARD();
>>
>> That's not part of this patch.. Doesn't relate to "add _locked duplicates"
>>
>>>          job_ref(job);
>>>        aio_context_acquire(job->aio_context);
>>
>>
> 


-- 
Best regards,
Vladimir



* Re: [PATCH v7 10/18] jobs: rename static functions called with job_mutex held
  2022-06-22 14:26     ` Emanuele Giuseppe Esposito
@ 2022-06-22 18:38       ` Vladimir Sementsov-Ogievskiy
  2022-06-23  9:08         ` Emanuele Giuseppe Esposito
  2022-06-28  7:40         ` Emanuele Giuseppe Esposito
  0 siblings, 2 replies; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-06-22 18:38 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block, Kevin Wolf
  Cc: Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel

On 6/22/22 17:26, Emanuele Giuseppe Esposito wrote:
> 
> 
> Am 21/06/2022 um 19:26 schrieb Vladimir Sementsov-Ogievskiy:
>> On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
>>> With the*nop*  job_lock/unlock placed, rename the static
>>> functions that are always under job_mutex, adding "_locked" suffix.
>>>
>>> List of functions that get this suffix:
>>> job_txn_ref           job_txn_del_job
>>> job_txn_apply           job_state_transition
>>> job_should_pause       job_event_cancelled
>>> job_event_completed       job_event_pending
>>> job_event_ready           job_event_idle
>>> job_do_yield           job_timer_not_pending
>>> job_do_dismiss           job_conclude
>>> job_update_rc           job_commit
>>> job_abort           job_clean
>>> job_finalize_single       job_cancel_async
>>> job_completed_txn_abort       job_prepare
>>> job_needs_finalize       job_do_finalize
>>> job_transition_to_pending  job_completed_txn_success
>>> job_completed           job_cancel_err
>>> job_force_cancel_err
>>>
>>> Note that "locked" refers to the*nop*  job_lock/unlock, and not
>>> real_job_lock/unlock.
>>>
>>> No functional change intended.
>>>
>>> Signed-off-by: Emanuele Giuseppe Esposito<eesposit@redhat.com>
>>
>>
>> Hmm. Maybe it was already discussed.. But for me it seems, that it would
>> be simpler to review previous patches, that fix job_ API users to use
>> locking properly, if this renaming go earlier.
>>
>> Anyway, in this series, we can't update everything at once. So patch to
>> patch, we make the code more and more correct. (yes I remember that
>> lock() is a noop, but I should review thinking that it real, otherwise,
>> how to review?)
>>
>> So, I'm saying about formal correctness of using lock() unlock()
>> function in connection with introduced _locked prifixes and in
>> connection with how it should finally work.
>>
>> You do:
>>
>> 05. introduce some _locked functions, that just duplicates, and
>> job_pause_point_locked() is formally inconsistent, as I said.
>>
>> 06. Update a lot of places, to give them their final form (but not
>> final, as some functions will be renamed to _locked, some not, hard to
>> imagine)
>>
>> 07,08,09. Update some more, and even more places. very hard to track
>> formal correctness of using locks
>>
>> 10-...: rename APIs.
>>
>>
>> What do you think about the following:
>>
>> 1. Introduce noop lock, and some internal _locked() versions, and keep
>> formal consistency inside job.c, considering all public interfaces as
>> unlocked:
>>
>>   at this point:
>>    - everything correct inside job.c
>>    - no public interfaces with _locked prefix
>>    - all public interfaces take mutex internally
>>    - no external user take mutex by hand
>>
>> We can rename all internal static functions at this step too.
>>
>> 2. Introduce some public _locked APIs, that we'll use in next patches
>>
>> 3. Now start fixing external users in several patches:
>>      - protect by mutex direct use of job fields
>>    - make wider locks and move to _locked APIs inside them where needed
>>
>>
>> In this scenario, every updated unit becomes formally correct after
>> update, and after all steps everything is formally correct, and we can
>> move to turning-on the mutex.
>>
> 
> I don't understand your logic also here, sorry :(
> 
> I assume you want to keep patch 1-4, then the problem is assing job_lock
> and renaming functions in _locked.
> So I would say the problem is in patch 5-6-10-11-12-13. All the others
> should be self contained.
> 
> I understand patch 5 is a little hard to follow.
> 
> Now, I am not sure what you propose here but it seems that the end goal
> is to just have the same result, but with additional intermediate steps
> that are just "do this just because in the next patch will be useful".
> I think the problem is that we are going to miss the "why we need the
> lock" logic in the patches if we do so.
> 
> The logic I tried to convey in this order is the following:
> - job.h: add _locked duplicates for job API functions called with and
> without job_mutex
> 	Just create duplicates of functions
> 
> - jobs: protect jobs with job_lock/unlock
> 	QMP and monitor functions call APIs that assume lock is taken,
> 	drivers must take explicitly the lock
> 
> - jobs: rename static functions called with job_mutex held
> - job.h: rename job API functions called with job_mutex held
> - block_job: rename block_job functions called with job_mutex held
> 	*given* that some functions are always under lock, transform
> 	them in _locked. Requires the job_lock/unlock patch
> 
> - job.h: define unlocked functions
> 	Comments on the public functions that are not _locked
> 
> 
> @Kevin, since you also had some feedbacks on the patch ordering, do you
> agree with this ordering or you have some other ideas?
> 
> Following your suggestion, we could move patches 10-11-12-13 before
> patch 6 "jobs: protect jobs with job_lock/unlock".
> 
> (Apologies for changing my mind, but being the second complain I am
> starting to reconsider reordering the patches).
> 

In short, what I mean is: let's keep the following invariants from patch to patch:

1. A function with the _locked suffix is always called with the lock held.
2. A function with the _locked suffix never calls functions that take the lock themselves, as that would deadlock.
3. A function documented as "called with lock not held" is never called with the lock held.

That is what I mean by "formal correctness": yes, we know the lock is a nop, but let's still keep the code logic consistent with the function naming and the comments that we add.


-- 
Best regards,
Vladimir



* Re: [PATCH v7 10/18] jobs: rename static functions called with job_mutex held
  2022-06-22 18:38       ` Vladimir Sementsov-Ogievskiy
@ 2022-06-23  9:08         ` Emanuele Giuseppe Esposito
  2022-06-23 11:10           ` Vladimir Sementsov-Ogievskiy
  2022-06-28  7:40         ` Emanuele Giuseppe Esposito
  1 sibling, 1 reply; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-23  9:08 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, Kevin Wolf
  Cc: Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel



Am 22/06/2022 um 20:38 schrieb Vladimir Sementsov-Ogievskiy:
> On 6/22/22 17:26, Emanuele Giuseppe Esposito wrote:
>>
>>
>> Am 21/06/2022 um 19:26 schrieb Vladimir Sementsov-Ogievskiy:
>>> On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
>>>> With the*nop*  job_lock/unlock placed, rename the static
>>>> functions that are always under job_mutex, adding "_locked" suffix.
>>>>
>>>> List of functions that get this suffix:
>>>> job_txn_ref           job_txn_del_job
>>>> job_txn_apply           job_state_transition
>>>> job_should_pause       job_event_cancelled
>>>> job_event_completed       job_event_pending
>>>> job_event_ready           job_event_idle
>>>> job_do_yield           job_timer_not_pending
>>>> job_do_dismiss           job_conclude
>>>> job_update_rc           job_commit
>>>> job_abort           job_clean
>>>> job_finalize_single       job_cancel_async
>>>> job_completed_txn_abort       job_prepare
>>>> job_needs_finalize       job_do_finalize
>>>> job_transition_to_pending  job_completed_txn_success
>>>> job_completed           job_cancel_err
>>>> job_force_cancel_err
>>>>
>>>> Note that "locked" refers to the*nop*  job_lock/unlock, and not
>>>> real_job_lock/unlock.
>>>>
>>>> No functional change intended.
>>>>
>>>> Signed-off-by: Emanuele Giuseppe Esposito<eesposit@redhat.com>
>>>
>>>
>>> Hmm. Maybe it was already discussed.. But for me it seems, that it would
>>> be simpler to review previous patches, that fix job_ API users to use
>>> locking properly, if this renaming go earlier.
>>>
>>> Anyway, in this series, we can't update everything at once. So patch to
>>> patch, we make the code more and more correct. (yes I remember that
>>> lock() is a noop, but I should review thinking that it real, otherwise,
>>> how to review?)
>>>
>>> So, I'm saying about formal correctness of using lock() unlock()
>>> function in connection with introduced _locked prefixes and in
>>> connection with how it should finally work.
>>>
>>> You do:
>>>
>>> 05. introduce some _locked functions, that just duplicates, and
>>> job_pause_point_locked() is formally inconsistent, as I said.
>>>
>>> 06. Update a lot of places, to give them their final form (but not
>>> final, as some functions will be renamed to _locked, some not, hard to
>>> imagine)
>>>
>>> 07,08,09. Update some more, and even more places. very hard to track
>>> formal correctness of using locks
>>>
>>> 10-...: rename APIs.
>>>
>>>
>>> What do you think about the following:
>>>
>>> 1. Introduce noop lock, and some internal _locked() versions, and keep
>>> formal consistency inside job.c, considering all public interfaces as
>>> unlocked:
>>>
>>>   at this point:
>>>    - everything correct inside job.c
>>>    - no public interfaces with _locked prefix
>>>    - all public interfaces take mutex internally
>>>    - no external user take mutex by hand
>>>
>>> We can rename all internal static functions at this step too.
>>>
>>> 2. Introduce some public _locked APIs, that we'll use in next patches
>>>
>>> 3. Now start fixing external users in several patches:
>>>      - protect by mutex direct use of job fields
>>>    - make wider locks and move to _locked APIs inside them where needed
>>>
>>>
>>> In this scenario, every updated unit becomes formally correct after
>>> update, and after all steps everything is formally correct, and we can
>>> move to turning-on the mutex.
>>>
>>
>> I don't understand your logic also here, sorry :(
>>
>> I assume you want to keep patch 1-4, then the problem is adding job_lock
>> and renaming functions in _locked.
>> So I would say the problem is in patch 5-6-10-11-12-13. All the others
>> should be self contained.
>>
>> I understand patch 5 is a little hard to follow.
>>
>> Now, I am not sure what you propose here but it seems that the end goal
>> is to just have the same result, but with additional intermediate steps
>> that are just "do this just because in the next patch will be useful".
>> I think the problem is that we are going to miss the "why we need the
>> lock" logic in the patches if we do so.
>>
>> The logic I tried to convey in this order is the following:
>> - job.h: add _locked duplicates for job API functions called with and
>> without job_mutex
>>     Just create duplicates of functions
>>
>> - jobs: protect jobs with job_lock/unlock
>>     QMP and monitor functions call APIs that assume lock is taken,
>>     drivers must take explicitly the lock
>>
>> - jobs: rename static functions called with job_mutex held
>> - job.h: rename job API functions called with job_mutex held
>> - block_job: rename block_job functions called with job_mutex held
>>     *given* that some functions are always under lock, transform
>>     them in _locked. Requires the job_lock/unlock patch
>>
>> - job.h: define unlocked functions
>>     Comments on the public functions that are not _locked
>>
>>
>> @Kevin, since you also had some feedbacks on the patch ordering, do you
>> agree with this ordering or you have some other ideas?
>>
>> Following your suggestion, we could move patches 10-11-12-13 before
>> patch 6 "jobs: protect jobs with job_lock/unlock".
>>
>> (Apologies for changing my mind, but this being the second complaint, I am
>> starting to reconsider reordering the patches).
>>
> 
> In two words, what I mean: let's keep the following invariant from patch
> to patch:
> 
> 1. A function with the _locked() suffix is always called with the lock held
> 2. A function with the _locked() suffix never calls functions that take
> the lock themselves, as that would deadlock
> 3. A function documented as "called with lock not held" is never
> called with the lock held
> 
> That's what I mean by "formal correctness": yes, we know the lock is a
> no-op, but let's still keep the code logic consistent with the function
> naming and the comments that we add.
> 

Ok, I get what you mean, but then we have useless changes for public
functions that will eventually only exist as _locked(), like job_next_locked:

The function is always called in a loop, so it is pointless to take the
lock inside. Therefore the patch would be "incorrect" on its own anyway.

Then, we would have one patch where we add the lock guard inside, and
another one where we remove it, rename the function to _locked, and take
the lock outside. That seems unnecessary to me.

Again, I understand it is difficult to review as it is now, but this
won't make it better IMO.

Thank you,
Emanuele




* Re: [PATCH v7 10/18] jobs: rename static functions called with job_mutex held
  2022-06-23  9:08         ` Emanuele Giuseppe Esposito
@ 2022-06-23 11:10           ` Vladimir Sementsov-Ogievskiy
  2022-06-23 11:19             ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-06-23 11:10 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block, Kevin Wolf
  Cc: Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel

On 6/23/22 12:08, Emanuele Giuseppe Esposito wrote:
> 
> 
> Am 22/06/2022 um 20:38 schrieb Vladimir Sementsov-Ogievskiy:
>> On 6/22/22 17:26, Emanuele Giuseppe Esposito wrote:
>>>
>>>
>>> Am 21/06/2022 um 19:26 schrieb Vladimir Sementsov-Ogievskiy:
>>>> On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
>>>>> With the*nop*  job_lock/unlock placed, rename the static
>>>>> functions that are always under job_mutex, adding "_locked" suffix.
>>>>>
>>>>> List of functions that get this suffix:
>>>>> job_txn_ref           job_txn_del_job
>>>>> job_txn_apply           job_state_transition
>>>>> job_should_pause       job_event_cancelled
>>>>> job_event_completed       job_event_pending
>>>>> job_event_ready           job_event_idle
>>>>> job_do_yield           job_timer_not_pending
>>>>> job_do_dismiss           job_conclude
>>>>> job_update_rc           job_commit
>>>>> job_abort           job_clean
>>>>> job_finalize_single       job_cancel_async
>>>>> job_completed_txn_abort       job_prepare
>>>>> job_needs_finalize       job_do_finalize
>>>>> job_transition_to_pending  job_completed_txn_success
>>>>> job_completed           job_cancel_err
>>>>> job_force_cancel_err
>>>>>
>>>>> Note that "locked" refers to the*nop*  job_lock/unlock, and not
>>>>> real_job_lock/unlock.
>>>>>
>>>>> No functional change intended.
>>>>>
>>>>> Signed-off-by: Emanuele Giuseppe Esposito<eesposit@redhat.com>
>>>>
>>>>
>>>> Hmm. Maybe it was already discussed.. But for me it seems, that it would
>>>> be simpler to review previous patches, that fix job_ API users to use
>>>> locking properly, if this renaming go earlier.
>>>>
>>>> Anyway, in this series, we can't update everything at once. So patch to
>>>> patch, we make the code more and more correct. (yes I remember that
>>>> lock() is a noop, but I should review thinking that it real, otherwise,
>>>> how to review?)
>>>>
>>>> So, I'm saying about formal correctness of using lock() unlock()
>>>> function in connection with introduced _locked prefixes and in
>>>> connection with how it should finally work.
>>>>
>>>> You do:
>>>>
>>>> 05. introduce some _locked functions, that just duplicates, and
>>>> job_pause_point_locked() is formally inconsistent, as I said.
>>>>
>>>> 06. Update a lot of places, to give them their final form (but not
>>>> final, as some functions will be renamed to _locked, some not, hard to
>>>> imagine)
>>>>
>>>> 07,08,09. Update some more, and even more places. very hard to track
>>>> formal correctness of using locks
>>>>
>>>> 10-...: rename APIs.
>>>>
>>>>
>>>> What do you think about the following:
>>>>
>>>> 1. Introduce noop lock, and some internal _locked() versions, and keep
>>>> formal consistency inside job.c, considering all public interfaces as
>>>> unlocked:
>>>>
>>>>    at this point:
>>>>     - everything correct inside job.c
>>>>     - no public interfaces with _locked prefix
>>>>     - all public interfaces take mutex internally
>>>>     - no external user take mutex by hand
>>>>
>>>> We can rename all internal static functions at this step too.
>>>>
>>>> 2. Introduce some public _locked APIs, that we'll use in next patches
>>>>
>>>> 3. Now start fixing external users in several patches:
>>>>       - protect by mutex direct use of job fields
>>>>     - make wider locks and move to _locked APIs inside them where needed
>>>>
>>>>
>>>> In this scenario, every updated unit becomes formally correct after
>>>> update, and after all steps everything is formally correct, and we can
>>>> move to turning-on the mutex.
>>>>
>>>
>>> I don't understand your logic also here, sorry :(
>>>
>>> I assume you want to keep patch 1-4, then the problem is adding job_lock
>>> and renaming functions in _locked.
>>> So I would say the problem is in patch 5-6-10-11-12-13. All the others
>>> should be self contained.
>>>
>>> I understand patch 5 is a little hard to follow.
>>>
>>> Now, I am not sure what you propose here but it seems that the end goal
>>> is to just have the same result, but with additional intermediate steps
>>> that are just "do this just because in the next patch will be useful".
>>> I think the problem is that we are going to miss the "why we need the
>>> lock" logic in the patches if we do so.
>>>
>>> The logic I tried to convey in this order is the following:
>>> - job.h: add _locked duplicates for job API functions called with and
>>> without job_mutex
>>>      Just create duplicates of functions
>>>
>>> - jobs: protect jobs with job_lock/unlock
>>>      QMP and monitor functions call APIs that assume lock is taken,
>>>      drivers must take explicitly the lock
>>>
>>> - jobs: rename static functions called with job_mutex held
>>> - job.h: rename job API functions called with job_mutex held
>>> - block_job: rename block_job functions called with job_mutex held
>>>      *given* that some functions are always under lock, transform
>>>      them in _locked. Requires the job_lock/unlock patch
>>>
>>> - job.h: define unlocked functions
>>>      Comments on the public functions that are not _locked
>>>
>>>
>>> @Kevin, since you also had some feedbacks on the patch ordering, do you
>>> agree with this ordering or you have some other ideas?
>>>
>>> Following your suggestion, we could move patches 10-11-12-13 before
>>> patch 6 "jobs: protect jobs with job_lock/unlock".
>>>
>>> (Apologies for changing my mind, but this being the second complaint, I am
>>> starting to reconsider reordering the patches).
>>>
>>
>> In two words, what I mean: let's keep the following invariant from patch
>> to patch:
>>
>> 1. A function with the _locked() suffix is always called with the lock held
>> 2. A function with the _locked() suffix never calls functions that take
>> the lock themselves, as that would deadlock
>> 3. A function documented as "called with lock not held" is never
>> called with the lock held
>>
>> That's what I mean by "formal correctness": yes, we know the lock is a
>> no-op, but let's still keep the code logic consistent with the function
>> naming and the comments that we add.
>>
> 
> Ok, I get what you mean, but then we have useless changes for public
> functions that will eventually only exist as _locked(), like job_next_locked:
>
> The function is always called in a loop, so it is pointless to take the
> lock inside. Therefore the patch would be "incorrect" on its own anyway.
>
> Then, we would have one patch where we add the lock guard inside, and
> another one where we remove it, rename the function to _locked, and take
> the lock outside. That seems unnecessary to me.

For me it looks a bit simpler than you describe. And anyway, keeping the correctness from patch to patch is worth the complexity. I'll give an argument.

First, what is the best practice? Best practice is that every patch is good and absolutely correct, so that you can apply any number of patches from the beginning of the series (01-NN), commit them to master, and break neither compilation, nor tests, nor readability, nothing. This makes the review process iterable: if I'm OK with patches 01-03, I give them an r-b and don't think about them any more. I don't have to keep any tricky things in mind, and I can review 04 several days later without rereading 01-03 (or at least I can consider the applied 01-03 a good, correct base state). This way I'm sure that if I have reviewed all the patches one by one, and each one is correct, then the whole thing is correct.

It is a lot harder to review when we have only collective correctness: the whole series, once applied, does the correct thing, but we can't say that about the intermediate states. In your series we can't be absolutely correct in each patch, as we have to switch from the aio-context lock to the mutex in one patch; that's why the mutex is added as a no-op. That's a reasonable and (seemingly) unsolvable drawback, and a thing I have to keep in mind during the whole review. But I'd prefer not to add more such things, like comments and _locked suffixes that don't correspond to the code.

With the invariant that I propose, the following logic works:

If
    1. we keep the invariant from patch to patch
    AND
    2. at the end we have updated all users of the internal and external APIs, not missed some file or function
Then everything is correct at the end.

Without the invariant I can't prove that everything is correct at the end, as it is hard to follow the degree of correctness from patch to patch. In your approach the only invariant that we have from patch to patch is that the mutex is a no-op, so all changes do nothing and are therefore correct. This way I can give an r-b to all such patches without thinking about the details, as they are no-ops. But when I finally have to review the patch that turns on the mutex, I'll have to recheck all internal and external API users, which is equivalent to reviewing all the changes merged into one patch.



Consider the case of job_next. The most correct way to update it, IMHO:

1. Add the lock inside job_next() and add job_next_locked() - in one patch together with other similar changes to job.c and job.h.

At this point we have job_next() calls in a loop, which is not good (we want a larger critical section), but that doesn't break the invariant I proposed above.

2. Update the loop: add a larger critical section and switch to job_next_locked().

What is good here: we don't need to unite the updates of external API users into one patch; we can update file-by-file or subsystem-by-subsystem.

3. Delete the unused job_next() API.


-- 
Best regards,
Vladimir



* Re: [PATCH v7 10/18] jobs: rename static functions called with job_mutex held
  2022-06-23 11:10           ` Vladimir Sementsov-Ogievskiy
@ 2022-06-23 11:19             ` Emanuele Giuseppe Esposito
  2022-06-23 11:58               ` Vladimir Sementsov-Ogievskiy
  2022-06-24 14:29               ` Kevin Wolf
  0 siblings, 2 replies; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-23 11:19 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, Kevin Wolf
  Cc: Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel



Am 23/06/2022 um 13:10 schrieb Vladimir Sementsov-Ogievskiy:
> On 6/23/22 12:08, Emanuele Giuseppe Esposito wrote:
>>
>>
>> Am 22/06/2022 um 20:38 schrieb Vladimir Sementsov-Ogievskiy:
>>> On 6/22/22 17:26, Emanuele Giuseppe Esposito wrote:
>>>>
>>>>
>>>> Am 21/06/2022 um 19:26 schrieb Vladimir Sementsov-Ogievskiy:
>>>>> On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
>>>>>> With the*nop*  job_lock/unlock placed, rename the static
>>>>>> functions that are always under job_mutex, adding "_locked" suffix.
>>>>>>
>>>>>> List of functions that get this suffix:
>>>>>> job_txn_ref           job_txn_del_job
>>>>>> job_txn_apply           job_state_transition
>>>>>> job_should_pause       job_event_cancelled
>>>>>> job_event_completed       job_event_pending
>>>>>> job_event_ready           job_event_idle
>>>>>> job_do_yield           job_timer_not_pending
>>>>>> job_do_dismiss           job_conclude
>>>>>> job_update_rc           job_commit
>>>>>> job_abort           job_clean
>>>>>> job_finalize_single       job_cancel_async
>>>>>> job_completed_txn_abort       job_prepare
>>>>>> job_needs_finalize       job_do_finalize
>>>>>> job_transition_to_pending  job_completed_txn_success
>>>>>> job_completed           job_cancel_err
>>>>>> job_force_cancel_err
>>>>>>
>>>>>> Note that "locked" refers to the*nop*  job_lock/unlock, and not
>>>>>> real_job_lock/unlock.
>>>>>>
>>>>>> No functional change intended.
>>>>>>
>>>>>> Signed-off-by: Emanuele Giuseppe Esposito<eesposit@redhat.com>
>>>>>
>>>>>
>>>>> Hmm. Maybe it was already discussed.. But for me it seems, that it
>>>>> would
>>>>> be simpler to review previous patches, that fix job_ API users to use
>>>>> locking properly, if this renaming go earlier.
>>>>>
>>>>> Anyway, in this series, we can't update everything at once. So
>>>>> patch to
>>>>> patch, we make the code more and more correct. (yes I remember that
>>>>> lock() is a noop, but I should review thinking that it real,
>>>>> otherwise,
>>>>> how to review?)
>>>>>
>>>>> So, I'm saying about formal correctness of using lock() unlock()
>>>>> function in connection with introduced _locked prefixes and in
>>>>> connection with how it should finally work.
>>>>>
>>>>> You do:
>>>>>
>>>>> 05. introduce some _locked functions, that just duplicates, and
>>>>> job_pause_point_locked() is formally inconsistent, as I said.
>>>>>
>>>>> 06. Update a lot of places, to give them their final form (but not
>>>>> final, as some functions will be renamed to _locked, some not, hard to
>>>>> imagine)
>>>>>
>>>>> 07,08,09. Update some more, and even more places. very hard to track
>>>>> formal correctness of using locks
>>>>>
>>>>> 10-...: rename APIs.
>>>>>
>>>>>
>>>>> What do you think about the following:
>>>>>
>>>>> 1. Introduce noop lock, and some internal _locked() versions, and keep
>>>>> formal consistency inside job.c, considering all public interfaces as
>>>>> unlocked:
>>>>>
>>>>>    at this point:
>>>>>     - everything correct inside job.c
>>>>>     - no public interfaces with _locked prefix
>>>>>     - all public interfaces take mutex internally
>>>>>     - no external user take mutex by hand
>>>>>
>>>>> We can rename all internal static functions at this step too.
>>>>>
>>>>> 2. Introduce some public _locked APIs, that we'll use in next patches
>>>>>
>>>>> 3. Now start fixing external users in several patches:
>>>>>       - protect by mutex direct use of job fields
>>>>>     - make wider locks and move to _locked APIs inside them where
>>>>> needed
>>>>>
>>>>>
>>>>> In this scenario, every updated unit becomes formally correct after
>>>>> update, and after all steps everything is formally correct, and we can
>>>>> move to turning-on the mutex.
>>>>>
>>>>
>>>> I don't understand your logic also here, sorry :(
>>>>
>>>> I assume you want to keep patch 1-4, then the problem is adding
>>>> job_lock
>>>> and renaming functions in _locked.
>>>> So I would say the problem is in patch 5-6-10-11-12-13. All the others
>>>> should be self contained.
>>>>
>>>> I understand patch 5 is a little hard to follow.
>>>>
>>>> Now, I am not sure what you propose here but it seems that the end goal
>>>> is to just have the same result, but with additional intermediate steps
>>>> that are just "do this just because in the next patch will be useful".
>>>> I think the problem is that we are going to miss the "why we need the
>>>> lock" logic in the patches if we do so.
>>>>
>>>> The logic I tried to convey in this order is the following:
>>>> - job.h: add _locked duplicates for job API functions called with and
>>>> without job_mutex
>>>>      Just create duplicates of functions
>>>>
>>>> - jobs: protect jobs with job_lock/unlock
>>>>      QMP and monitor functions call APIs that assume lock is taken,
>>>>      drivers must take explicitly the lock
>>>>
>>>> - jobs: rename static functions called with job_mutex held
>>>> - job.h: rename job API functions called with job_mutex held
>>>> - block_job: rename block_job functions called with job_mutex held
>>>>      *given* that some functions are always under lock, transform
>>>>      them in _locked. Requires the job_lock/unlock patch
>>>>
>>>> - job.h: define unlocked functions
>>>>      Comments on the public functions that are not _locked
>>>>
>>>>
>>>> @Kevin, since you also had some feedbacks on the patch ordering, do you
>>>> agree with this ordering or you have some other ideas?
>>>>
>>>> Following your suggestion, we could move patches 10-11-12-13 before
>>>> patch 6 "jobs: protect jobs with job_lock/unlock".
>>>>
>>>> (Apologies for changing my mind, but this being the second complaint, I am
>>>> starting to reconsider reordering the patches).
>>>>
>>>
>>> In two words, what I mean: let's keep the following invariant from patch
>>> to patch:
>>>
>>> 1. A function with the _locked() suffix is always called with the lock held
>>> 2. A function with the _locked() suffix never calls functions that take
>>> the lock themselves, as that would deadlock
>>> 3. A function documented as "called with lock not held" is never
>>> called with the lock held
>>>
>>> That's what I mean by "formal correctness": yes, we know the lock is a
>>> no-op, but let's still keep the code logic consistent with the function
>>> naming and the comments that we add.
>>>
>>
>> Ok, I get what you mean, but then we have useless changes for public
>> functions that will eventually only exist as _locked(), like job_next_locked:
>>
>> The function is always called in a loop, so it is pointless to take the
>> lock inside. Therefore the patch would be "incorrect" on its own anyway.
>>
>> Then, we would have one patch where we add the lock guard inside, and
>> another one where we remove it, rename the function to _locked, and take
>> the lock outside. That seems unnecessary to me.
> 
> For me it looks a bit simpler than you describe. And anyway keeping the
> correctness from patch to patch worth the complexity. I'll give an
> argument.
> 
> First what is the best practices? Best practices is when every patch is
> good and absolutely correct. So that you can apply any number of patches
> from the beginning of the series (01-NN), commit them to master and this
> will break neither compilation, nor tests, nor readability, nothing.
> This makes the review process iterable: if I'm OK with patches 01-03, I
> give them r-b and don't think about them. I don't have to keep in mind
> any tricky things. And I can review 04 several days later not rereading
> 01-03 (or at least I can consider applied 01-03 as a good correct base
> state). This way I'm sure, that if I reviewed all patches one-by-one,
> each one is correct, then the whole thing is correct.
> 
> A lot harder to review when we have only collective correctness: the
> whole series being applied make a correct thing, but we can't say it
> about intermediate states. In your series we can't be absolutely correct
> with each patch, as we have to switch from aio-context lock to mutex in
> one patch, that's why mutex is added as noop. That's a reasonable and
> (seems) unsolvable drawback. That's a thing I have to keep in mind
> during the whole review. But I'd prefer not add more such things, like
> comments and _locked suffixes that don't correspond to the code.
> 
> With the invariant that I propose, the following logic works:
> 
> If
>    1. we keep the invariant from patch to patch
>    AND
>    2. at the end we have updated all users of the internal and external
> APIs, not missed some file or function
> Then everything is correct at the end.
> 
> Without the invariant I can't prove that everything is correct at the
> end, as it is hard to follow the degree of correctness from patch to
> patch. In your way the only invariant that we have from patch to patch,
> is that mutex is noop, so all changes do nothing, and therefore they are
> correct. This way I can give an r-b to all such patches not thinking
> about details, they are noop. But when I finally have to review the
> patch that turns on the mutex, I'll have to recheck all internal and
> external API users, which is equivalent to review all the changes merged
> into one patch.
> 
> 
> 
> Consider the case with job_next. The most correct way to update it IMHO:
> 
> 1. Add lock inside job_next() and add job_next_locked() - in one patch
> with other similar changes of job.c and job.h.
> 
> At this moment we have job_next() calls in a loop, which is not good (we
> want larger critical section), but that doesn't break the invariant I
> proposed above.

The only thing I am pointing out here is that this breaks "readability",
meaning that if someone bisects here they will find a very weird
situation (aside from the fact that there is a no-op lock).

But I guess this is fine, as long as I write it in the commit message.

And since these patches have been waiting here for more than 3 months now,
I would say that if the others (Kevin?) agree, I will change the order to
what you proposed here.

Emanuele

> 
> 2. Update the loop: add larger critical section and switch to
> job_next_locked().
> 
> What is good here: we don't need to unite the updates of external API users
> into one patch; we can update file-by-file or subsystem-by-subsystem.
> 
> 3. Delete unused job_next() API
> 
> 




* Re: [PATCH v7 10/18] jobs: rename static functions called with job_mutex held
  2022-06-23 11:19             ` Emanuele Giuseppe Esposito
@ 2022-06-23 11:58               ` Vladimir Sementsov-Ogievskiy
  2022-06-24 14:29               ` Kevin Wolf
  1 sibling, 0 replies; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-06-23 11:58 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block, Kevin Wolf
  Cc: Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel

On 6/23/22 14:19, Emanuele Giuseppe Esposito wrote:
> 
> Am 23/06/2022 um 13:10 schrieb Vladimir Sementsov-Ogievskiy:
>> On 6/23/22 12:08, Emanuele Giuseppe Esposito wrote:
>>>
>>> Am 22/06/2022 um 20:38 schrieb Vladimir Sementsov-Ogievskiy:
>>>> On 6/22/22 17:26, Emanuele Giuseppe Esposito wrote:
>>>>>
>>>>> Am 21/06/2022 um 19:26 schrieb Vladimir Sementsov-Ogievskiy:
>>>>>> On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
>>>>>>> With the*nop*  job_lock/unlock placed, rename the static
>>>>>>> functions that are always under job_mutex, adding "_locked" suffix.
>>>>>>>
>>>>>>> List of functions that get this suffix:
>>>>>>> job_txn_ref           job_txn_del_job
>>>>>>> job_txn_apply           job_state_transition
>>>>>>> job_should_pause       job_event_cancelled
>>>>>>> job_event_completed       job_event_pending
>>>>>>> job_event_ready           job_event_idle
>>>>>>> job_do_yield           job_timer_not_pending
>>>>>>> job_do_dismiss           job_conclude
>>>>>>> job_update_rc           job_commit
>>>>>>> job_abort           job_clean
>>>>>>> job_finalize_single       job_cancel_async
>>>>>>> job_completed_txn_abort       job_prepare
>>>>>>> job_needs_finalize       job_do_finalize
>>>>>>> job_transition_to_pending  job_completed_txn_success
>>>>>>> job_completed           job_cancel_err
>>>>>>> job_force_cancel_err
>>>>>>>
>>>>>>> Note that "locked" refers to the*nop*  job_lock/unlock, and not
>>>>>>> real_job_lock/unlock.
>>>>>>>
>>>>>>> No functional change intended.
>>>>>>>
>>>>>>> Signed-off-by: Emanuele Giuseppe Esposito<eesposit@redhat.com>
>>>>>>
>>>>>> Hmm. Maybe it was already discussed.. But for me it seems, that it
>>>>>> would
>>>>>> be simpler to review previous patches, that fix job_ API users to use
>>>>>> locking properly, if this renaming go earlier.
>>>>>>
>>>>>> Anyway, in this series, we can't update everything at once. So
>>>>>> patch to
>>>>>> patch, we make the code more and more correct. (yes I remember that
>>>>>> lock() is a noop, but I should review thinking that it real,
>>>>>> otherwise,
>>>>>> how to review?)
>>>>>>
>>>>>> So, I'm saying about formal correctness of using lock() unlock()
>>>>>> function in connection with introduced _locked prefixes and in
>>>>>> connection with how it should finally work.
>>>>>>
>>>>>> You do:
>>>>>>
>>>>>> 05. introduce some _locked functions, that just duplicates, and
>>>>>> job_pause_point_locked() is formally inconsistent, as I said.
>>>>>>
>>>>>> 06. Update a lot of places, to give them their final form (but not
>>>>>> final, as some functions will be renamed to _locked, some not, hard to
>>>>>> imagine)
>>>>>>
>>>>>> 07,08,09. Update some more, and even more places. very hard to track
>>>>>> formal correctness of using locks
>>>>>>
>>>>>> 10-...: rename APIs.
>>>>>>
>>>>>>
>>>>>> What do you think about the following:
>>>>>>
>>>>>> 1. Introduce noop lock, and some internal _locked() versions, and keep
>>>>>> formal consistency inside job.c, considering all public interfaces as
>>>>>> unlocked:
>>>>>>
>>>>>>     at this point:
>>>>>>      - everything correct inside job.c
>>>>>>      - no public interfaces with _locked prefix
>>>>>>      - all public interfaces take mutex internally
>>>>>>      - no external user take mutex by hand
>>>>>>
>>>>>> We can rename all internal static functions at this step too.
>>>>>>
>>>>>> 2. Introduce some public _locked APIs, that we'll use in next patches
>>>>>>
>>>>>> 3. Now start fixing external users in several patches:
>>>>>>        - protect by mutex direct use of job fields
>>>>>>      - make wider locks and move to _locked APIs inside them where
>>>>>> needed
>>>>>>
>>>>>>
>>>>>> In this scenario, every updated unit becomes formally correct after
>>>>>> update, and after all steps everything is formally correct, and we can
>>>>>> move to turning-on the mutex.
>>>>>>
>>>>> I don't understand your logic also here, sorry:(
>>>>>
>>>>> I assume you want to keep patch 1-4, then the problem is adding
>>>>> job_lock
>>>>> and renaming functions in _locked.
>>>>> So I would say the problem is in patch 5-6-10-11-12-13. All the others
>>>>> should be self contained.
>>>>>
>>>>> I understand patch 5 is a little hard to follow.
>>>>>
>>>>> Now, I am not sure what you propose here but it seems that the end goal
>>>>> is to just have the same result, but with additional intermediate steps
>>>>> that are just "do this just because it will be useful in the next patch".
>>>>> I think the problem is that we are going to miss the "why we need the
>>>>> lock" logic in the patches if we do so.
>>>>>
>>>>> The logic I tried to convey in this order is the following:
>>>>> - job.h: add _locked duplicates for job API functions called with and
>>>>> without job_mutex
>>>>>       Just create duplicates of functions
>>>>>
>>>>> - jobs: protect jobs with job_lock/unlock
>>>>>       QMP and monitor functions call APIs that assume lock is taken,
>>>>>       drivers must take explicitly the lock
>>>>>
>>>>> - jobs: rename static functions called with job_mutex held
>>>>> - job.h: rename job API functions called with job_mutex held
>>>>> - block_job: rename block_job functions called with job_mutex held
>>>>>       *given*  that some functions are always under lock, transform
>>>>>       them in _locked. Requires the job_lock/unlock patch
>>>>>
>>>>> - job.h: define unlocked functions
>>>>>       Comments on the public functions that are not _locked
>>>>>
>>>>>
>>>>> @Kevin, since you also had some feedbacks on the patch ordering, do you
>>>>> agree with this ordering or you have some other ideas?
>>>>>
>>>>> Following your suggestion, we could move patches 10-11-12-13 before
>>>>> patch 6 "jobs: protect jobs with job_lock/unlock".
>>>>>
>>>>> (Apologies for changing my mind, but this being the second complaint I am
>>>>> starting to reconsider reordering the patches).
>>>>>
>>>> In two words, what I mean: let's keep the following invariant from patch
>>>> to patch:
>>>>
>>>> 1. Function that has _locked() prefix is always called with lock held
>>>> 2. Function that has _locked() prefix never calls functions that take
>>>> the lock by themselves, which would deadlock
>>>> 3. Function that is documented as "called with lock not held" is never
>>>> called with lock held
>>>>
>>>> That's what I mean by "formal correctness": yes, we know that the lock is
>>>> a noop, but still let's keep the code logic corresponding to the function
>>>> naming and comments that we add.
>>>>
>>> Ok I get what you mean, but then we have useless changes for public
>>> functions that eventually will only be _locked() like job_next_locked:
>>>
>>> The function is always called in a loop, so it is pointless to take the
>>> lock inside. Therefore the patch would be "incorrect" on its own anyways.
>>>
>>> Then, we would have a patch where we add the lock guard inside, and
>>> another one where we remove it and rename to _locked and take the lock
>>> outside. Seems unnecessary to me.
>> For me it looks a bit simpler than you describe. And anyway, keeping the
>> correctness from patch to patch is worth the complexity. I'll give an
>> argument.
>>
>> First, what is best practice? Best practice is when every patch is
>> good and absolutely correct. So that you can apply any number of patches
>> from the beginning of the series (01-NN), commit them to master and this
>> will break neither compilation, nor tests, nor readability, nothing.
>> This makes the review process iterable: if I'm OK with patches 01-03, I
>> give them r-b and don't think about them. I don't have to keep in mind
>> any tricky things. And I can review 04 several days later not rereading
>> 01-03 (or at least I can consider applied 01-03 as a good correct base
>> state). This way I'm sure, that if I reviewed all patches one-by-one,
>> each one is correct, then the whole thing is correct.
>>
>> A lot harder to review when we have only collective correctness: the
>> whole series, once applied, makes a correct thing, but we can't say that
>> about intermediate states. In your series we can't be absolutely correct
>> with each patch, as we have to switch from aio-context lock to mutex in
>> one patch, that's why mutex is added as noop. That's a reasonable and
>> (seems) unsolvable drawback. That's a thing I have to keep in mind
>> during the whole review. But I'd prefer not to add more such things, like
>> comments and _locked suffixes that don't correspond to the code.
>>
>> With the invariant that I propose, the following logic works:
>>
>> If
>>     1. we keep the invariant from patch to patch
>>     AND
>>     2. at the end we have updated all users of the internal and external
>> APIs, not missed some file or function
>> Then everything is correct at the end.
>>
>> Without the invariant I can't prove that everything is correct at the
>> end, as it is hard to follow the degree of correctness from patch to
>> patch. In your way the only invariant that we have from patch to patch,
>> is that mutex is noop, so all changes do nothing, and therefore they are
>> correct. This way I can give an r-b to all such patches not thinking
>> about details, they are noop. But when I finally have to review the
>> patch that turns on the mutex, I'll have to recheck all internal and
>> external API users, which is equivalent to review all the changes merged
>> into one patch.
>>
>>
>>
>> Consider the case with job_next. The most correct way to update it IMHO:
>>
>> 1. Add lock inside job_next() and add job_next_locked() - in one patch
>> with other similar changes of job.c and job.h.
>>
>> At this moment we have job_next() calls in a loop, which is not good (we
>> want larger critical section), but that doesn't break the invariant I
>> proposed above.
> The only thing I am pointing here is that this breaks "readability",
> meaning if someone bisects here it will find a very weird situation
> (aside from the fact that there is a noop lock).

IMHO, calling job_next() in a loop, where job_next() takes the mutex internally, is still more readable and more correct than directly breaking the _locked() prefix semantics and the "held / not held" comments that we add.

But I understand that arguing about what is more correct and what is less correct is not correct itself. In math something that is a bit incorrect is considered absolutely incorrect)

> 
> But I guess this is fine, as long as I write it in the commit message.
> 
> And since these patches have been waiting here for more than 3 months now, I

I understand this :/ It feels unfair to require conceptual changes at the stage of v7. If needed, I can take part in preparing v8.

> would say if the others (Kevin?) agree I will change the order with what
> you proposed here.

Yes, it would be great to have more opinions.


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 10/18] jobs: rename static functions called with job_mutex held
  2022-06-23 11:19             ` Emanuele Giuseppe Esposito
  2022-06-23 11:58               ` Vladimir Sementsov-Ogievskiy
@ 2022-06-24 14:29               ` Kevin Wolf
  2022-06-24 15:28                 ` Paolo Bonzini
  1 sibling, 1 reply; 48+ messages in thread
From: Kevin Wolf @ 2022-06-24 14:29 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, Hanna Reitz,
	Paolo Bonzini, John Snow, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Markus Armbruster, Stefan Hajnoczi,
	Fam Zheng, qemu-devel

Am 23.06.2022 um 13:19 hat Emanuele Giuseppe Esposito geschrieben:
> 
> 
> Am 23/06/2022 um 13:10 schrieb Vladimir Sementsov-Ogievskiy:
> > On 6/23/22 12:08, Emanuele Giuseppe Esposito wrote:
> >>
> >>
> >> Am 22/06/2022 um 20:38 schrieb Vladimir Sementsov-Ogievskiy:
> >>> On 6/22/22 17:26, Emanuele Giuseppe Esposito wrote:
> >>>>
> >>>>
> >>>> Am 21/06/2022 um 19:26 schrieb Vladimir Sementsov-Ogievskiy:
> >>>>> On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
> >>>>>> With the *nop* job_lock/unlock placed, rename the static
> >>>>>> functions that are always under job_mutex, adding "_locked" suffix.
> >>>>>>
> >>>>>> List of functions that get this suffix:
> >>>>>> job_txn_ref           job_txn_del_job
> >>>>>> job_txn_apply           job_state_transition
> >>>>>> job_should_pause       job_event_cancelled
> >>>>>> job_event_completed       job_event_pending
> >>>>>> job_event_ready           job_event_idle
> >>>>>> job_do_yield           job_timer_not_pending
> >>>>>> job_do_dismiss           job_conclude
> >>>>>> job_update_rc           job_commit
> >>>>>> job_abort           job_clean
> >>>>>> job_finalize_single       job_cancel_async
> >>>>>> job_completed_txn_abort       job_prepare
> >>>>>> job_needs_finalize       job_do_finalize
> >>>>>> job_transition_to_pending  job_completed_txn_success
> >>>>>> job_completed           job_cancel_err
> >>>>>> job_force_cancel_err
> >>>>>>
> >>>>>> Note that "locked" refers to the *nop* job_lock/unlock, and not
> >>>>>> real_job_lock/unlock.
> >>>>>>
> >>>>>> No functional change intended.
> >>>>>>
> >>>>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> >>>>>
> >>>>>
> >>>>> Hmm. Maybe it was already discussed.. But for me it seems, that it
> >>>>> would
> >>>>> be simpler to review previous patches, that fix job_ API users to use
> >>>>> locking properly, if this renaming go earlier.
> >>>>>
> >>>>> Anyway, in this series, we can't update everything at once. So
> >>>>> patch to
> >>>>> patch, we make the code more and more correct. (yes I remember that
> >>>>> lock() is a noop, but I should review thinking that it is real,
> >>>>> otherwise,
> >>>>> how to review?)
> >>>>>
> >>>>> So, I'm talking about the formal correctness of using the lock()/unlock()
> >>>>> functions in connection with the introduced _locked prefixes and in
> >>>>> connection with how it should finally work.
> >>>>>
> >>>>> You do:
> >>>>>
> >>>>> 05. introduce some _locked functions that are just duplicates, and
> >>>>> job_pause_point_locked() is formally inconsistent, as I said.
> >>>>>
> >>>>> 06. Update a lot of places, to give them their final form (but not
> >>>>> final, as some functions will be renamed to _locked, some not, hard to
> >>>>> imagine)
> >>>>>
> >>>>> 07,08,09. Update some more, and even more places. very hard to track
> >>>>> formal correctness of using locks
> >>>>>
> >>>>> 10-...: rename APIs.
> >>>>>
> >>>>>
> >>>>> What do you think about the following:
> >>>>>
> >>>>> 1. Introduce noop lock, and some internal _locked() versions, and keep
> >>>>> formal consistency inside job.c, considering all public interfaces as
> >>>>> unlocked:
> >>>>>
> >>>>>    at this point:
> >>>>>     - everything correct inside job.c
> >>>>>     - no public interfaces with _locked prefix
> >>>>>     - all public interfaces take mutex internally
> >>>>>     - no external user takes the mutex by hand
> >>>>>
> >>>>> We can rename all internal static functions at this step too.
> >>>>>
> >>>>> 2. Introduce some public _locked APIs, that we'll use in next patches
> >>>>>
> >>>>> 3. Now start fixing external users in several patches:
> >>>>>       - protect by mutex direct use of job fields
> >>>>>     - make wider locks and move to _locked APIs inside them where
> >>>>> needed
> >>>>>
> >>>>>
> >>>>> In this scenario, every updated unit becomes formally correct after
> >>>>> update, and after all steps everything is formally correct, and we can
> >>>>> move to turning-on the mutex.
> >>>>>
> >>>>
> >>>> I don't understand your logic also here, sorry :(
> >>>>
> >>>> I assume you want to keep patches 1-4, then the problem is adding
> >>>> job_lock
> >>>> and renaming functions to _locked.
> >>>> So I would say the problem is in patch 5-6-10-11-12-13. All the others
> >>>> should be self contained.
> >>>>
> >>>> I understand patch 5 is a little hard to follow.
> >>>>
> >>>> Now, I am not sure what you propose here but it seems that the end goal
> >>>> is to just have the same result, but with additional intermediate steps
> >>>> that are just "do this just because it will be useful in the next patch".
> >>>> I think the problem is that we are going to miss the "why we need the
> >>>> lock" logic in the patches if we do so.
> >>>>
> >>>> The logic I tried to convey in this order is the following:
> >>>> - job.h: add _locked duplicates for job API functions called with and
> >>>> without job_mutex
> >>>>      Just create duplicates of functions
> >>>>
> >>>> - jobs: protect jobs with job_lock/unlock
> >>>>      QMP and monitor functions call APIs that assume lock is taken,
> >>>>      drivers must take explicitly the lock
> >>>>
> >>>> - jobs: rename static functions called with job_mutex held
> >>>> - job.h: rename job API functions called with job_mutex held
> >>>> - block_job: rename block_job functions called with job_mutex held
> >>>>      *given* that some functions are always under lock, transform
> >>>>      them in _locked. Requires the job_lock/unlock patch
> >>>>
> >>>> - job.h: define unlocked functions
> >>>>      Comments on the public functions that are not _locked
> >>>>
> >>>>
> >>>> @Kevin, since you also had some feedbacks on the patch ordering, do you
> >>>> agree with this ordering or you have some other ideas?
> >>>>
> >>>> Following your suggestion, we could move patches 10-11-12-13 before
> >>>> patch 6 "jobs: protect jobs with job_lock/unlock".
> >>>>
> >>>> (Apologies for changing my mind, but this being the second complaint I am
> >>>> starting to reconsider reordering the patches).
> >>>>
> >>>
> >>> In two words, what I mean: let's keep the following invariant from patch
> >>> to patch:
> >>>
> >>> 1. Function that has _locked() prefix is always called with lock held
> >>> 2. Function that has _locked() prefix never calls functions that take
> >>> the lock by themselves, which would deadlock
> >>> 3. Function that is documented as "called with lock not held" is never
> >>> called with lock held
> >>>
> >>> That's what I mean by "formal correctness": yes, we know that the lock is
> >>> a noop, but still let's keep the code logic corresponding to the function
> >>> naming and comments that we add.
> >>>
> >>
> >> Ok I get what you mean, but then we have useless changes for public
> >> functions that eventually will only be _locked() like job_next_locked:
> >>
> >> The function is always called in a loop, so it is pointless to take the
> >> lock inside. Therefore the patch would be "incorrect" on its own anyways.
> >>
> >> Then, we would have a patch where we add the lock guard inside, and
> >> another one where we remove it and rename to _locked and take the lock
> >> outside. Seems unnecessary to me.
> > 
> > For me it looks a bit simpler than you describe. And anyway, keeping the
> > correctness from patch to patch is worth the complexity. I'll give an
> > argument.
> > 
> > First, what is best practice? Best practice is when every patch is
> > good and absolutely correct. So that you can apply any number of patches
> > from the beginning of the series (01-NN), commit them to master and this
> > will break neither compilation, nor tests, nor readability, nothing.
> > This makes the review process iterable: if I'm OK with patches 01-03, I
> > give them r-b and don't think about them. I don't have to keep in mind
> > any tricky things. And I can review 04 several days later not rereading
> > 01-03 (or at least I can consider applied 01-03 as a good correct base
> > state). This way I'm sure, that if I reviewed all patches one-by-one,
> > each one is correct, then the whole thing is correct.
> > 
> > A lot harder to review when we have only collective correctness: the
> > whole series, once applied, makes a correct thing, but we can't say that
> > about intermediate states. In your series we can't be absolutely correct
> > with each patch, as we have to switch from aio-context lock to mutex in
> > one patch, that's why mutex is added as noop. That's a reasonable and
> > (seems) unsolvable drawback. That's a thing I have to keep in mind
> > during the whole review. But I'd prefer not to add more such things, like
> > comments and _locked suffixes that don't correspond to the code.
> > 
> > With the invariant that I propose, the following logic works:
> > 
> > If
> >    1. we keep the invariant from patch to patch
> >    AND
> >    2. at the end we have updated all users of the internal and external
> > APIs, not missed some file or function
> > Then everything is correct at the end.
> > 
> > Without the invariant I can't prove that everything is correct at the
> > end, as it is hard to follow the degree of correctness from patch to
> > patch. In your way the only invariant that we have from patch to patch,
> > is that mutex is noop, so all changes do nothing, and therefore they are
> > correct. This way I can give an r-b to all such patches not thinking
> > about details, they are noop. But when I finally have to review the
> > patch that turns on the mutex, I'll have to recheck all internal and
> > external API users, which is equivalent to review all the changes merged
> > into one patch.
> > 
> > 
> > 
> > Consider the case with job_next. The most correct way to update it IMHO:
> > 
> > 1. Add lock inside job_next() and add job_next_locked() - in one patch
> > with other similar changes of job.c and job.h.
> > 
> > At this moment we have job_next() calls in a loop, which is not good (we
> > want larger critical section), but that doesn't break the invariant I
> > proposed above.
> 
> The only thing I am pointing here is that this breaks "readability",
> meaning if someone bisects here it will find a very weird situation
> (aside from the fact that there is a noop lock).
> 
> But I guess this is fine, as long as I write it in the commit message.
> 
> And since these patches have been waiting here for more than 3 months now, I
> would say if the others (Kevin?) agree I will change the order with what
> you proposed here.

Yes, I think Vladimir is having the same difficulties with reading the
series as I had. And I believe his suggestion would make the
intermediate states less impossible to review. The question is how much
work it would be and whether you're willing to do this. As I said, if
reorganising is too hard, I'm okay with just ignoring the intermediate
state and reviewing the series as if it were a single patch.

Kevin



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 10/18] jobs: rename static functions called with job_mutex held
  2022-06-24 14:29               ` Kevin Wolf
@ 2022-06-24 15:28                 ` Paolo Bonzini
  2022-06-24 17:20                   ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 48+ messages in thread
From: Paolo Bonzini @ 2022-06-24 15:28 UTC (permalink / raw)
  To: Kevin Wolf, Emanuele Giuseppe Esposito
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, Hanna Reitz, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel

On 6/24/22 16:29, Kevin Wolf wrote:
> Yes, I think Vladimir is having the same difficulties with reading the
> series as I had. And I believe his suggestion would make the
> intermediate states less impossible to review. The question is how much
> work it would be and whether you're willing to do this. As I said, if
> reorganising is too hard, I'm okay with just ignoring the intermediate
> state and reviewing the series as if it were a single patch.

I think we've tried different intermediate states for each of the 
previous 6 versions, and none of them were really satisfactory. :(

Paolo


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 10/18] jobs: rename static functions called with job_mutex held
  2022-06-24 15:28                 ` Paolo Bonzini
@ 2022-06-24 17:20                   ` Emanuele Giuseppe Esposito
  0 siblings, 0 replies; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-24 17:20 UTC (permalink / raw)
  To: Paolo Bonzini, Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, Hanna Reitz, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel



Am 24/06/2022 um 17:28 schrieb Paolo Bonzini:
> On 6/24/22 16:29, Kevin Wolf wrote:
>> Yes, I think Vladimir is having the same difficulties with reading the
>> series as I had. And I believe his suggestion would make the
>> intermediate states less impossible to review. The question is how much
>> work it would be and whether you're willing to do this. As I said, if
>> reorganising is too hard, I'm okay with just ignoring the intermediate
>> state and reviewing the series as if it were a single patch.
> 
> I think we've tried different intermediate states for each of the
> previous 6 versions, and none of them were really satisfactory. :(
> 

Yes. v7 in this case basically means that we tried at least 4-5 times to
reorganize patches.

Nevertheless I could give it a try. I just hope I won't regret it :)

If I don't manage, I will just give up and re-send the series with
Vladimir's nitpicks.

But yeah, I guess we all agree that this is the last time I reorganize
this series.

Feedback is always very welcome, but no more on reordering
please ;)

Thank you,
Emanuele



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 01/18] job.c: make job_mutex and job_lock/unlock() public
  2022-06-16 13:18 ` [PATCH v7 01/18] job.c: make job_mutex and job_lock/unlock() public Emanuele Giuseppe Esposito
  2022-06-21 13:47   ` Vladimir Sementsov-Ogievskiy
@ 2022-06-24 18:22   ` Vladimir Sementsov-Ogievskiy
  2022-06-28 13:08     ` Emanuele Giuseppe Esposito
  1 sibling, 1 reply; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-06-24 18:22 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel

I've already acked this (honestly, because Stefan did), but I still want to clarify:

On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
> job mutex will be used to protect the job struct elements and list,
> replacing AioContext locks.
> 
> Right now use a shared lock for all jobs, in order to keep things
> simple. Once the AioContext lock is gone, we can introduce per-job
> locks.
> 
> To simplify the switch from aiocontext to job lock, introduce
> *nop*  lock/unlock functions and macros.
> We want to always call job_lock/unlock outside the AioContext locks,
> and not vice-versa, otherwise we might get a deadlock.

Could you describe here why we get a deadlock?

As I understand, we'll deadlock if two code paths exist simultaneously:

1. we take the job mutex under the aiocontext lock
2. we take the aiocontext lock under the job mutex

If these paths exist, it's possible that one thread goes through [1] and another through [2]. If thread [1] holds the aiocontext lock and wants to take the job mutex, while at the same time thread [2] holds the job mutex and wants to take the aiocontext lock, that's a deadlock.

If you say that we must avoid [1], do you have in mind that we have [2] somewhere? If so, this should be mentioned here.

If not, could we just make a normal mutex, not a noop?

> This is not
> straightforward to do, and that's why we start with nop functions.
> Once everything is protected by job_lock/unlock, we can change the nop into
> an actual mutex and remove the aiocontext lock.
> 
> Since job_mutex is already being used, add static
> real_job_{lock/unlock} for the existing usage.
> 
> Signed-off-by: Emanuele Giuseppe Esposito<eesposit@redhat.com>
> Reviewed-by: Stefan Hajnoczi<stefanha@redhat.com>


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 10/18] jobs: rename static functions called with job_mutex held
  2022-06-22 18:38       ` Vladimir Sementsov-Ogievskiy
  2022-06-23  9:08         ` Emanuele Giuseppe Esposito
@ 2022-06-28  7:40         ` Emanuele Giuseppe Esposito
  2022-06-28 10:47           ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-28  7:40 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, Kevin Wolf
  Cc: Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel



Am 22/06/2022 um 20:38 schrieb Vladimir Sementsov-Ogievskiy:
> On 6/22/22 17:26, Emanuele Giuseppe Esposito wrote:
>>
>>
>> Am 21/06/2022 um 19:26 schrieb Vladimir Sementsov-Ogievskiy:
>>> On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
>>>> With the *nop* job_lock/unlock placed, rename the static
>>>> functions that are always under job_mutex, adding "_locked" suffix.
>>>>
>>>> List of functions that get this suffix:
>>>> job_txn_ref           job_txn_del_job
>>>> job_txn_apply           job_state_transition
>>>> job_should_pause       job_event_cancelled
>>>> job_event_completed       job_event_pending
>>>> job_event_ready           job_event_idle
>>>> job_do_yield           job_timer_not_pending
>>>> job_do_dismiss           job_conclude
>>>> job_update_rc           job_commit
>>>> job_abort           job_clean
>>>> job_finalize_single       job_cancel_async
>>>> job_completed_txn_abort       job_prepare
>>>> job_needs_finalize       job_do_finalize
>>>> job_transition_to_pending  job_completed_txn_success
>>>> job_completed           job_cancel_err
>>>> job_force_cancel_err
>>>>
>>>> Note that "locked" refers to the *nop* job_lock/unlock, and not
>>>> real_job_lock/unlock.
>>>>
>>>> No functional change intended.
>>>>
>>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>>
>>>
>>> Hmm. Maybe it was already discussed.. But for me it seems, that it would
>>> be simpler to review previous patches, that fix job_ API users to use
>>> locking properly, if this renaming go earlier.
>>>
>>> Anyway, in this series, we can't update everything at once. So patch to
>>> patch, we make the code more and more correct. (yes I remember that
>>> lock() is a noop, but I should review thinking that it is real, otherwise,
>>> how to review?)
>>>
>>> So, I'm talking about the formal correctness of using the lock()/unlock()
>>> functions in connection with the introduced _locked prefixes and in
>>> connection with how it should finally work.
>>>
>>> You do:
>>>
>>> 05. introduce some _locked functions that are just duplicates, and
>>> job_pause_point_locked() is formally inconsistent, as I said.
>>>
>>> 06. Update a lot of places, to give them their final form (but not
>>> final, as some functions will be renamed to _locked, some not, hard to
>>> imagine)
>>>
>>> 07,08,09. Update some more, and even more places. very hard to track
>>> formal correctness of using locks
>>>
>>> 10-...: rename APIs.
>>>
>>>
>>> What do you think about the following:
>>>
>>> 1. Introduce noop lock, and some internal _locked() versions, and keep
>>> formal consistency inside job.c, considering all public interfaces as
>>> unlocked:
>>>
>>>   at this point:
>>>    - everything correct inside job.c
>>>    - no public interfaces with _locked prefix
>>>    - all public interfaces take mutex internally
>>>    - no external user takes the mutex by hand
>>>
>>> We can rename all internal static functions at this step too.
>>>
>>> 2. Introduce some public _locked APIs, that we'll use in next patches
>>>
>>> 3. Now start fixing external users in several patches:
>>>      - protect by mutex direct use of job fields
>>>    - make wider locks and move to _locked APIs inside them where needed
>>>
>>>
>>> In this scenario, every updated unit becomes formally correct after
>>> update, and after all steps everything is formally correct, and we can
>>> move to turning-on the mutex.
>>>
>>
>> I don't understand your logic also here, sorry :(
>>
>> I assume you want to keep patches 1-4, then the problem is adding job_lock
>> and renaming functions to _locked.
>> So I would say the problem is in patch 5-6-10-11-12-13. All the others
>> should be self contained.
>>
>> I understand patch 5 is a little hard to follow.
>>
>> Now, I am not sure what you propose here but it seems that the end goal
>> is to just have the same result, but with additional intermediate steps
>> that are just "do this just because it will be useful in the next patch".
>> I think the problem is that we are going to miss the "why we need the
>> lock" logic in the patches if we do so.
>>
>> The logic I tried to convey in this order is the following:
>> - job.h: add _locked duplicates for job API functions called with and
>> without job_mutex
>>     Just create duplicates of functions
>>
>> - jobs: protect jobs with job_lock/unlock
>>     QMP and monitor functions call APIs that assume lock is taken,
>>     drivers must take explicitly the lock
>>
>> - jobs: rename static functions called with job_mutex held
>> - job.h: rename job API functions called with job_mutex held
>> - block_job: rename block_job functions called with job_mutex held
>>     *given* that some functions are always under lock, transform
>>     them in _locked. Requires the job_lock/unlock patch
>>
>> - job.h: define unlocked functions
>>     Comments on the public functions that are not _locked
>>
>>
>> @Kevin, since you also had some feedbacks on the patch ordering, do you
>> agree with this ordering or you have some other ideas?
>>
>> Following your suggestion, we could move patches 10-11-12-13 before
>> patch 6 "jobs: protect jobs with job_lock/unlock".
>>
>> (Apologies for changing my mind, but this being the second complaint I am
>> starting to reconsider reordering the patches).
>>
> 
> In two words, what I mean: let's keep the following invariant from patch
> to patch:
> 
> 1. Function that has _locked() prefix is always called with lock held
> 2. Function that has _locked() prefix never calls functions that take
> the lock by themselves, which would deadlock
> 3. Function that is documented as "called with lock not held" is never
> called with lock held
> 
> That's what I mean by "formal correctness": yes, we know that the lock is
> a noop, but still let's keep the code logic corresponding to the function
> naming and comments that we add.
> 
> 

Ok so far I did the following:

- duplicated each public function as static {function}_locked()
- made sure all functions in job.c call only _locked() functions, since
the lock is always taken internally

Now, we need to do the same also for the blockjob API in blockjob.h.
The only problem is that in order to use and create functions like
block_job_get_locked(), we need:
- job_get_locked() to be public, and it can't simply replace job_get(),
because job_get() is still used everywhere
- block_job_get_locked() to be public too, since it is used in other
files like blockdev.c

so we will have:
Job *job_get()
Job *job_get_locked()

BlockJob *block_job_get(const char *id)
BlockJob *block_job_get_locked(const char *id)


Therefore with this approach I need to make all _locked() functions
public, duplicating the API. Is that what you want?

Emanuele



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 10/18] jobs: rename static functions called with job_mutex held
  2022-06-28  7:40         ` Emanuele Giuseppe Esposito
@ 2022-06-28 10:47           ` Vladimir Sementsov-Ogievskiy
  2022-06-28 13:04             ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-06-28 10:47 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block, Kevin Wolf
  Cc: Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel

On 6/28/22 10:40, Emanuele Giuseppe Esposito wrote:
> 
> 
> Am 22/06/2022 um 20:38 schrieb Vladimir Sementsov-Ogievskiy:
>> On 6/22/22 17:26, Emanuele Giuseppe Esposito wrote:
>>>
>>>
>>> Am 21/06/2022 um 19:26 schrieb Vladimir Sementsov-Ogievskiy:
>>>> On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
>>>>> With the *nop* job_lock/unlock placed, rename the static
>>>>> functions that are always under job_mutex, adding "_locked" suffix.
>>>>>
>>>>> List of functions that get this suffix:
>>>>> job_txn_ref           job_txn_del_job
>>>>> job_txn_apply           job_state_transition
>>>>> job_should_pause       job_event_cancelled
>>>>> job_event_completed       job_event_pending
>>>>> job_event_ready           job_event_idle
>>>>> job_do_yield           job_timer_not_pending
>>>>> job_do_dismiss           job_conclude
>>>>> job_update_rc           job_commit
>>>>> job_abort           job_clean
>>>>> job_finalize_single       job_cancel_async
>>>>> job_completed_txn_abort       job_prepare
>>>>> job_needs_finalize       job_do_finalize
>>>>> job_transition_to_pending  job_completed_txn_success
>>>>> job_completed           job_cancel_err
>>>>> job_force_cancel_err
>>>>>
>>>>> Note that "locked" refers to the *nop* job_lock/unlock, and not
>>>>> real_job_lock/unlock.
>>>>>
>>>>> No functional change intended.
>>>>>
>>>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>>>
>>>>
>>>> Hmm. Maybe it was already discussed.. But for me it seems, that it would
>>>> be simpler to review previous patches, that fix job_ API users to use
>>>> locking properly, if this renaming go earlier.
>>>>
>>>> Anyway, in this series, we can't update everything at once. So patch to
>>>> patch, we make the code more and more correct. (yes I remember that
>>>> lock() is a noop, but I should review thinking that it's real, otherwise,
>>>> how to review?)
>>>>
>>>> So, I'm talking about the formal correctness of using the lock()/unlock()
>>>> functions in connection with the introduced _locked suffixes and in
>>>> connection with how it should finally work.
>>>>
>>>> You do:
>>>>
>>>> 05. introduce some _locked functions, that just duplicates, and
>>>> job_pause_point_locked() is formally inconsistent, as I said.
>>>>
>>>> 06. Update a lot of places, to give them their final form (but not
>>>> final, as some functions will be renamed to _locked, some not, hard to
>>>> imagine)
>>>>
>>>> 07,08,09. Update some more, and even more places. very hard to track
>>>> formal correctness of using locks
>>>>
>>>> 10-...: rename APIs.
>>>>
>>>>
>>>> What do you think about the following:
>>>>
>>>> 1. Introduce noop lock, and some internal _locked() versions, and keep
>>>> formal consistency inside job.c, considering all public interfaces as
>>>> unlocked:
>>>>
>>>>    at this point:
>>>>     - everything correct inside job.c
>>>>     - no public interfaces with _locked prefix
>>>>     - all public interfaces take mutex internally
>>>>     - no external user take mutex by hand
>>>>
>>>> We can rename all internal static functions at this step too.
>>>>
>>>> 2. Introduce some public _locked APIs, that we'll use in next patches
>>>>
>>>> 3. Now start fixing external users in several patches:
>>>>       - protect by mutex direct use of job fields
>>>>     - make wider locks and move to _locked APIs inside them where needed
>>>>
>>>>
>>>> In this scenario, every updated unit becomes formally correct after
>>>> update, and after all steps everything is formally correct, and we can
>>>> move to turning-on the mutex.
>>>>
>>>
>>> I don't understand your logic here either, sorry :(
>>>
>>> I assume you want to keep patches 1-4, then the problem is adding job_lock
>>> and renaming functions to _locked.
>>> So I would say the problem is in patch 5-6-10-11-12-13. All the others
>>> should be self contained.
>>>
>>> I understand patch 5 is a little hard to follow.
>>>
>>> Now, I am not sure what you propose here but it seems that the end goal
>>> is to just have the same result, but with additional intermediate steps
>>> that are just "do this because it will be useful in the next patch".
>>> I think the problem is that we are going to miss the "why we need the
>>> lock" logic in the patches if we do so.
>>>
>>> The logic I tried to convey in this order is the following:
>>> - job.h: add _locked duplicates for job API functions called with and
>>> without job_mutex
>>>      Just create duplicates of functions
>>>
>>> - jobs: protect jobs with job_lock/unlock
>>>      QMP and monitor functions call APIs that assume lock is taken,
>>>      drivers must take explicitly the lock
>>>
>>> - jobs: rename static functions called with job_mutex held
>>> - job.h: rename job API functions called with job_mutex held
>>> - block_job: rename block_job functions called with job_mutex held
>>>      *given* that some functions are always under lock, transform
>>>      them in _locked. Requires the job_lock/unlock patch
>>>
>>> - job.h: define unlocked functions
>>>      Comments on the public functions that are not _locked
>>>
>>>
>>> @Kevin, since you also had some feedbacks on the patch ordering, do you
>>> agree with this ordering or you have some other ideas?
>>>
>>> Following your suggestion, we could move patches 10-11-12-13 before
>>> patch 6 "jobs: protect jobs with job_lock/unlock".
>>>
>>> (Apologies for changing my mind, but this being the second complaint I am
>>> starting to reconsider reordering the patches).
>>>
>>
>> In two words, what I mean: let's keep the following invariant from patch
>> to patch:
>>
>> 1. A function that has the _locked() suffix is always called with the lock held
>> 2. A function that has the _locked() suffix never calls functions that take the
>> lock by themselves, which would deadlock
>> 3. A function that is documented as "called with lock not held" is never
>> called with lock held
>>
>> That's what I mean by "formal correctness": yes, we know that the lock is
>> a noop, but still, let's keep the code logic consistent with the function
>> naming and the comments that we add.
>>
>>
> 
> Ok so far I did the following:
> 
> - duplicated each public function as static {function}_locked()

They shouldn't be duplicates: function without _locked suffix should take the mutex.

> - made sure all functions in job.c call only _locked() functions, since
> the lock is always taken internally
> 
> Now, we need to do the same also for blockjob API in blockjob.h
> The only problem is that in order to use and create functions like
> block_job_get_locked(), we need:
> - job_get_locked() to be public; it can't simply replace job_get(),
> which is still used everywhere
> - block_job_get_locked() to be public too, since it is used in other
> files like blockdev.c
> 
> so we will have:
> Job *job_get()
> Job *job_get_locked()
> 
> BlockJob *block_job_get(const char *id)
> BlockJob *block_job_get_locked(const char *id)
> 
> 
> Therefore with this approach I need to make all _locked() functions
> public, duplicating the API. Is that what you want?
> 

I don't see any problem with it. After the whole update we can drop public APIs that are unused.


-- 
Best regards,
Vladimir



* Re: [PATCH v7 10/18] jobs: rename static functions called with job_mutex held
  2022-06-28 10:47           ` Vladimir Sementsov-Ogievskiy
@ 2022-06-28 13:04             ` Emanuele Giuseppe Esposito
  2022-06-28 15:22               ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-28 13:04 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, Kevin Wolf
  Cc: Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel



Am 28/06/2022 um 12:47 schrieb Vladimir Sementsov-Ogievskiy:
> On 6/28/22 10:40, Emanuele Giuseppe Esposito wrote:
>>
>>
>> Am 22/06/2022 um 20:38 schrieb Vladimir Sementsov-Ogievskiy:
>>> On 6/22/22 17:26, Emanuele Giuseppe Esposito wrote:
>>>>
>>>>
>>>> Am 21/06/2022 um 19:26 schrieb Vladimir Sementsov-Ogievskiy:
>>>>> On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
>>>>>> With the *nop* job_lock/unlock placed, rename the static
>>>>>> functions that are always under job_mutex, adding "_locked" suffix.
>>>>>>
>>>>>> List of functions that get this suffix:
>>>>>> job_txn_ref           job_txn_del_job
>>>>>> job_txn_apply           job_state_transition
>>>>>> job_should_pause       job_event_cancelled
>>>>>> job_event_completed       job_event_pending
>>>>>> job_event_ready           job_event_idle
>>>>>> job_do_yield           job_timer_not_pending
>>>>>> job_do_dismiss           job_conclude
>>>>>> job_update_rc           job_commit
>>>>>> job_abort           job_clean
>>>>>> job_finalize_single       job_cancel_async
>>>>>> job_completed_txn_abort       job_prepare
>>>>>> job_needs_finalize       job_do_finalize
>>>>>> job_transition_to_pending  job_completed_txn_success
>>>>>> job_completed           job_cancel_err
>>>>>> job_force_cancel_err
>>>>>>
>>>>>> Note that "locked" refers to the *nop* job_lock/unlock, and not
>>>>>> real_job_lock/unlock.
>>>>>>
>>>>>> No functional change intended.
>>>>>>
>>>>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>>>>
>>>>>
>>>>> Hmm. Maybe it was already discussed.. But for me it seems, that it
>>>>> would
>>>>> be simpler to review previous patches, that fix job_ API users to use
>>>>> locking properly, if this renaming go earlier.
>>>>>
>>>>> Anyway, in this series, we can't update everything at once. So
>>>>> patch by
>>>>> patch, we make the code more and more correct. (yes I remember that
>>>>> lock() is a noop, but I should review thinking that it's real,
>>>>> otherwise,
>>>>> how to review?)
>>>>>
>>>>> So, I'm talking about the formal correctness of using the lock()/unlock()
>>>>> functions in connection with the introduced _locked suffixes and in
>>>>> connection with how it should finally work.
>>>>>
>>>>> You do:
>>>>>
>>>>> 05. introduce some _locked functions, that just duplicates, and
>>>>> job_pause_point_locked() is formally inconsistent, as I said.
>>>>>
>>>>> 06. Update a lot of places, to give them their final form (but not
>>>>> final, as some functions will be renamed to _locked, some not, hard to
>>>>> imagine)
>>>>>
>>>>> 07,08,09. Update some more, and even more places. very hard to track
>>>>> formal correctness of using locks
>>>>>
>>>>> 10-...: rename APIs.
>>>>>
>>>>>
>>>>> What do you think about the following:
>>>>>
>>>>> 1. Introduce noop lock, and some internal _locked() versions, and keep
>>>>> formal consistency inside job.c, considering all public interfaces as
>>>>> unlocked:
>>>>>
>>>>>    at this point:
>>>>>     - everything correct inside job.c
>>>>>     - no public interfaces with _locked prefix
>>>>>     - all public interfaces take mutex internally
>>>>>     - no external user take mutex by hand
>>>>>
>>>>> We can rename all internal static functions at this step too.
>>>>>
>>>>> 2. Introduce some public _locked APIs, that we'll use in next patches
>>>>>
>>>>> 3. Now start fixing external users in several patches:
>>>>>       - protect by mutex direct use of job fields
>>>>>     - make wider locks and move to _locked APIs inside them where
>>>>> needed
>>>>>
>>>>>
>>>>> In this scenario, every updated unit becomes formally correct after
>>>>> update, and after all steps everything is formally correct, and we can
>>>>> move to turning-on the mutex.
>>>>>
>>>>
>>>> I don't understand your logic here either, sorry :(
>>>>
>>>> I assume you want to keep patches 1-4, then the problem is adding
>>>> job_lock
>>>> and renaming functions to _locked.
>>>> So I would say the problem is in patch 5-6-10-11-12-13. All the others
>>>> should be self contained.
>>>>
>>>> I understand patch 5 is a little hard to follow.
>>>>
>>>> Now, I am not sure what you propose here but it seems that the end goal
>>>> is to just have the same result, but with additional intermediate steps
>>>> that are just "do this because it will be useful in the next patch".
>>>> I think the problem is that we are going to miss the "why we need the
>>>> lock" logic in the patches if we do so.
>>>>
>>>> The logic I tried to convey in this order is the following:
>>>> - job.h: add _locked duplicates for job API functions called with and
>>>> without job_mutex
>>>>      Just create duplicates of functions
>>>>
>>>> - jobs: protect jobs with job_lock/unlock
>>>>      QMP and monitor functions call APIs that assume lock is taken,
>>>>      drivers must take explicitly the lock
>>>>
>>>> - jobs: rename static functions called with job_mutex held
>>>> - job.h: rename job API functions called with job_mutex held
>>>> - block_job: rename block_job functions called with job_mutex held
>>>>      *given* that some functions are always under lock, transform
>>>>      them in _locked. Requires the job_lock/unlock patch
>>>>
>>>> - job.h: define unlocked functions
>>>>      Comments on the public functions that are not _locked
>>>>
>>>>
>>>> @Kevin, since you also had some feedbacks on the patch ordering, do you
>>>> agree with this ordering or you have some other ideas?
>>>>
>>>> Following your suggestion, we could move patches 10-11-12-13 before
>>>> patch 6 "jobs: protect jobs with job_lock/unlock".
>>>>
>>>> (Apologies for changing my mind, but this being the second complaint I am
>>>> starting to reconsider reordering the patches).
>>>>
>>>
>>> In two words, what I mean: let's keep the following invariant from patch
>>> to patch:
>>>
>>> 1. A function that has the _locked() suffix is always called with the lock held
>>> 2. A function that has the _locked() suffix never calls functions that take the
>>> lock by themselves, which would deadlock
>>> 3. A function that is documented as "called with lock not held" is never
>>> called with lock held
>>>
>>> That's what I mean by "formal correctness": yes, we know that the lock is
>>> a noop, but still, let's keep the code logic consistent with the function
>>> naming and the comments that we add.
>>>
>>>
>>
>> Ok so far I did the following:
>>
>> - duplicated each public function as static {function}_locked()
> 
> They shouldn't be duplicates: function without _locked suffix should
> take the mutex.

By "duplicate" I mean the same function name, just with a _locked suffix.
Maybe there is a better word for it?

Almost done preparing the patches!

Emanuele

> 
>> - made sure all functions in job.c call only _locked() functions, since
>> the lock is always taken internally
>>
>> Now, we need to do the same also for blockjob API in blockjob.h
>> The only problem is that in order to use and create functions like
>> block_job_get_locked(), we need:
>> - job_get_locked() to be public; it can't simply replace job_get(),
>> which is still used everywhere
>> - block_job_get_locked() to be public too, since it is used in other
>> files like blockdev.c
>>
>> so we will have:
>> Job *job_get()
>> Job *job_get_locked()
>>
>> BlockJob *block_job_get(const char *id)
>> BlockJob *block_job_get_locked(const char *id)
>>
>>
>> Therefore with this approach I need to make all _locked() functions
>> public, duplicating the API. Is that what you want?
>>
> 
> I don't see any problem in it. After the whole update we can drop public
> APIs that are unused.
> 
> 




* Re: [PATCH v7 01/18] job.c: make job_mutex and job_lock/unlock() public
  2022-06-24 18:22   ` Vladimir Sementsov-Ogievskiy
@ 2022-06-28 13:08     ` Emanuele Giuseppe Esposito
  2022-06-28 15:20       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-28 13:08 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel



Am 24/06/2022 um 20:22 schrieb Vladimir Sementsov-Ogievskiy:
> I've already acked this (honestly, because Stefan do), but still, want
> to clarify:
> 
> On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
>> job mutex will be used to protect the job struct elements and list,
>> replacing AioContext locks.
>>
>> Right now use a shared lock for all jobs, in order to keep things
>> simple. Once the AioContext lock is gone, we can introduce per-job
>> locks.
>>
>> To simplify the switch from aiocontext to job lock, introduce
>> *nop*  lock/unlock functions and macros.
>> We want to always call job_lock/unlock outside the AioContext locks,
>> and not vice-versa, otherwise we might get a deadlock.
> 
> Could you describe here, why we get a deadlock?
> 
> As I understand, we'll deadlock if two code paths exist simultaneously:
> 
> 1. we take job mutex under aiocontext lock
> 2. we take aiocontex lock under job mutex
> 
> If these paths exist, it's possible that one thread goes through [1]
> and another through [2]. If thread [1] holds job-mutex and want to take
> aiocontext-lock, and in the same time thread [2] holds aiocontext-lock
> and want to take job-mutext, that's a dead-lock.
> 
> If you say, that we must avoid [1], do you have in mind that we have [2]
> somewhere? If so, this should be mentioned here
> 
> If not, could we just make a normal mutex, not a noop?

Of course we have [2] somewhere, otherwise I wouldn't even think about
creating a noop function. This idea came up in v1-v2.

Regarding the specific case, I don't remember. But there are tons of
functions that acquire the AioContext lock and then call the job_*
API, such as job_cancel_sync() in blockdev.c.

I might use job_cancel_sync() as an example and write it in the commit
message though.
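
For illustration, a minimal sketch of the two conflicting lock orderings described above (the call sites and helper names are hypothetical; a plain pthread mutex named ctx_lock stands in for the AioContext lock):

```c
#include <pthread.h>
#include <assert.h>

static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for the AioContext lock */
static pthread_mutex_t job_mutex = PTHREAD_MUTEX_INITIALIZER;
static int entered; /* just to observe that both paths ran */

/* Ordering [1]: job_mutex taken while already holding the AioContext
 * lock, as in blockdev.c paths that end up in job_cancel_sync(). */
void cancel_under_ctx_lock(void)
{
    pthread_mutex_lock(&ctx_lock);
    pthread_mutex_lock(&job_mutex);   /* A then B */
    entered++;                        /* ... cancel the job ... */
    pthread_mutex_unlock(&job_mutex);
    pthread_mutex_unlock(&ctx_lock);
}

/* Ordering [2]: AioContext lock taken while already holding job_mutex.
 * If one thread runs this while another runs cancel_under_ctx_lock(),
 * each can end up holding one lock and blocking forever on the other:
 * a classic ABBA deadlock. Starting with a *nop* job_lock() sidesteps
 * this while the call sites are being converted. */
void touch_ctx_under_job_lock(void)
{
    pthread_mutex_lock(&job_mutex);
    pthread_mutex_lock(&ctx_lock);    /* B then A */
    entered++;                        /* ... use job->aio_context ... */
    pthread_mutex_unlock(&ctx_lock);
    pthread_mutex_unlock(&job_mutex);
}
```

Run sequentially in a single thread the two paths are harmless; the deadlock only needs the two orderings to exist concurrently somewhere in the code base.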

Thank you,
Emanuele
>> This is not
>> straightforward to do, and that's why we start with nop functions.
>> Once everything is protected by job_lock/unlock, we can change the nop
>> into
>> an actual mutex and remove the aiocontext lock.
>>
>> Since job_mutex is already being used, add static
>> real_job_{lock/unlock} for the existing usage.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> 
> 




* Re: [PATCH v7 01/18] job.c: make job_mutex and job_lock/unlock() public
  2022-06-28 13:08     ` Emanuele Giuseppe Esposito
@ 2022-06-28 15:20       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-06-28 15:20 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel

On 6/28/22 16:08, Emanuele Giuseppe Esposito wrote:
> 
> 
> Am 24/06/2022 um 20:22 schrieb Vladimir Sementsov-Ogievskiy:
>> I've already acked this (honestly, because Stefan do), but still, want
>> to clarify:
>>
>> On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
>>> job mutex will be used to protect the job struct elements and list,
>>> replacing AioContext locks.
>>>
>>> Right now use a shared lock for all jobs, in order to keep things
>>> simple. Once the AioContext lock is gone, we can introduce per-job
>>> locks.
>>>
>>> To simplify the switch from aiocontext to job lock, introduce
>>> *nop*  lock/unlock functions and macros.
>>> We want to always call job_lock/unlock outside the AioContext locks,
>>> and not vice-versa, otherwise we might get a deadlock.
>>
>> Could you describe here, why we get a deadlock?
>>
>> As I understand, we'll deadlock if two code paths exist simultaneously:
>>
>> 1. we take job mutex under aiocontext lock
>> 2. we take aiocontex lock under job mutex
>>
>> If these paths exist, it's possible that one thread goes through [1]
>> and another through [2]. If thread [1] holds job-mutex and want to take
>> aiocontext-lock, and in the same time thread [2] holds aiocontext-lock
>> and want to take job-mutext, that's a dead-lock.
>>
>> If you say, that we must avoid [1], do you have in mind that we have [2]
>> somewhere? If so, this should be mentioned here
>>
>> If not, could we just make a normal mutex, not a noop?
> 
> Of course we have [2] somewhere, otherwise I wouldn't even think about
> creating a noop function. This idea came up in v1-v2.
> 
> Regarding the specific case, I don't remember. But there are tons of
> functions that acquire the AioContext lock and then call the job_*
> API, such as job_cancel_sync() in blockdev.c.
> 
> I might use job_cancel_sync() as an example and write it in the commit
> message though.
> 

Yes, it's obvious that we have tons of [1]. That's why an example of [2] would be a lot more valuable.


-- 
Best regards,
Vladimir



* Re: [PATCH v7 10/18] jobs: rename static functions called with job_mutex held
  2022-06-28 13:04             ` Emanuele Giuseppe Esposito
@ 2022-06-28 15:22               ` Vladimir Sementsov-Ogievskiy
  2022-06-28 15:26                 ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-06-28 15:22 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block, Kevin Wolf
  Cc: Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel

On 6/28/22 16:04, Emanuele Giuseppe Esposito wrote:
>>> Ok so far I did the following:
>>>
>>> - duplicated each public function as static {function}_locked()
>> They shouldn't be duplicates: function without _locked suffix should
>> take the mutex.
> By "duplicate" I mean same function name, with just _locked suffix.
> Maybe a better definition?
> 
> Almost done preparing the patches!

Why not just add the _locked() version and rework the version without the suffix to call the _locked() one under the mutex, all in one patch, to keep it meaningful?

-- 
Best regards,
Vladimir



* Re: [PATCH v7 10/18] jobs: rename static functions called with job_mutex held
  2022-06-28 15:22               ` Vladimir Sementsov-Ogievskiy
@ 2022-06-28 15:26                 ` Vladimir Sementsov-Ogievskiy
  2022-06-28 17:28                   ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-06-28 15:26 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block, Kevin Wolf
  Cc: Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel

On 6/28/22 18:22, Vladimir Sementsov-Ogievskiy wrote:
> On 6/28/22 16:04, Emanuele Giuseppe Esposito wrote:
>>>> Ok so far I did the following:
>>>>
>>>> - duplicated each public function as static {function}_locked()
>>> They shouldn't be duplicates: function without _locked suffix should
>>> take the mutex.
>> By "duplicate" I mean same function name, with just _locked suffix.
>> Maybe a better definition?
>>
>> Almost done preparing the patches!
> 
> Why not just add the _locked() version and rework the version without the suffix to call the _locked() one under the mutex, all in one patch, to keep it meaningful?
> 

I mean, instead of:

patch 1: add a _locked() duplicate

   At this point we have a duplicated function that's just bad practice.

patch 2: remake the version without the suffix to call _locked() under the mutex
  
   Now everything is correct. But we have to track the moment when something strange becomes something correct.


do just

patch 1: rename function to _locked() and add a wrapper without suffix, that calls _locked() under mutex
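
For illustration, a sketch of that one-patch transformation, using job_should_pause from the rename list earlier in the thread (the body and the plain pthread mutex are stand-ins; in the actual series the lock is still a nop at this stage):

```c
#include <pthread.h>
#include <stdbool.h>
#include <assert.h>

static pthread_mutex_t job_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool pause_requested; /* hypothetical job state, protected by job_mutex */

/* Renamed from job_should_pause(); caller must hold job_mutex.
 * It must only call other _locked() helpers: re-taking job_mutex here
 * would self-deadlock once the nop lock becomes real. */
static bool job_should_pause_locked(void)
{
    return pause_requested;
}

/* The old name survives as a thin wrapper that takes the mutex, so
 * existing callers keep working and every intermediate state of the
 * series stays formally correct. */
bool job_should_pause(void)
{
    pthread_mutex_lock(&job_mutex);
    bool r = job_should_pause_locked();
    pthread_mutex_unlock(&job_mutex);
    return r;
}
```

With this shape there is never a point where two identical bodies coexist: the rename and the wrapper land together.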



-- 
Best regards,
Vladimir



* Re: [PATCH v7 10/18] jobs: rename static functions called with job_mutex held
  2022-06-28 15:26                 ` Vladimir Sementsov-Ogievskiy
@ 2022-06-28 17:28                   ` Emanuele Giuseppe Esposito
  2022-06-28 19:42                     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 48+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-06-28 17:28 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, Kevin Wolf
  Cc: Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel



Am 28/06/2022 um 17:26 schrieb Vladimir Sementsov-Ogievskiy:
> On 6/28/22 18:22, Vladimir Sementsov-Ogievskiy wrote:
>> On 6/28/22 16:04, Emanuele Giuseppe Esposito wrote:
>>>>> Ok so far I did the following:
>>>>>
>>>>> - duplicated each public function as static {function}_locked()
>>>> They shouldn't be duplicates: function without _locked suffix should
>>>> take the mutex.
>>> By "duplicate" I mean same function name, with just _locked suffix.
>>> Maybe a better definition?
>>>
>>> Almost done preparing the patches!
>>
>> Why not just add the _locked() version and rework the version without
>> the suffix to call the _locked() one under the mutex, all in one patch,
>> to keep it meaningful?
>>
> 
> I mean, instead of:
> 
> patch 1: add a _locked() duplicate
> 
>   At this point we have a duplicated function that's just bad practice.
> 
>> patch 2: remake the version without the suffix to call _locked() under the mutex
>  
>   Now everything is correct. But we have to track the moment when
> something strange becomes something correct.
> 
> 
> do just
> 
> patch 1: rename function to _locked() and add a wrapper without suffix,
> that calls _locked() under mutex
> 
> 

That's what I always intended to do. As I said, I just used the wrong word.

Emanuele




* Re: [PATCH v7 10/18] jobs: rename static functions called with job_mutex held
  2022-06-28 17:28                   ` Emanuele Giuseppe Esposito
@ 2022-06-28 19:42                     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 48+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2022-06-28 19:42 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block, Kevin Wolf
  Cc: Hanna Reitz, Paolo Bonzini, John Snow,
	Vladimir Sementsov-Ogievskiy, Wen Congyang, Xie Changlong,
	Markus Armbruster, Stefan Hajnoczi, Fam Zheng, qemu-devel

On 6/28/22 20:28, Emanuele Giuseppe Esposito wrote:
> 
> 
> Am 28/06/2022 um 17:26 schrieb Vladimir Sementsov-Ogievskiy:
>> On 6/28/22 18:22, Vladimir Sementsov-Ogievskiy wrote:
>>> On 6/28/22 16:04, Emanuele Giuseppe Esposito wrote:
>>>>>> Ok so far I did the following:
>>>>>>
>>>>>> - duplicated each public function as static {function}_locked()
>>>>> They shouldn't be duplicates: function without _locked suffix should
>>>>> take the mutex.
>>>> By "duplicate" I mean same function name, with just _locked suffix.
>>>> Maybe a better definition?
>>>>
>>>> Almost done preparing the patches!
>>>
>>> Why not just add the _locked() version and rework the version without
>>> the suffix to call the _locked() one under the mutex, all in one patch,
>>> to keep it meaningful?
>>>
>>
>> I mean, instead of:
>>
>> patch 1: add a _locked() duplicate
>>
>>    At this point we have a duplicated function that's just bad practice.
>>
>> patch 2: remake the version without the suffix to call _locked() under the mutex
>>   
>>    Now everything is correct. But we have to track the moment when
>> something strange becomes something correct.
>>
>>
>> do just
>>
>> patch 1: rename function to _locked() and add a wrapper without suffix,
>> that calls _locked() under mutex
>>
>>
> 
> That's what I always intended to do. As I said, I just used the wrong word.
> 

Ah, OK then, I misunderstood.


-- 
Best regards,
Vladimir



end of thread, other threads:[~2022-06-28 19:44 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-16 13:18 [PATCH v7 00/18] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
2022-06-16 13:18 ` [PATCH v7 01/18] job.c: make job_mutex and job_lock/unlock() public Emanuele Giuseppe Esposito
2022-06-21 13:47   ` Vladimir Sementsov-Ogievskiy
2022-06-24 18:22   ` Vladimir Sementsov-Ogievskiy
2022-06-28 13:08     ` Emanuele Giuseppe Esposito
2022-06-28 15:20       ` Vladimir Sementsov-Ogievskiy
2022-06-16 13:18 ` [PATCH v7 02/18] job.h: categorize fields in struct Job Emanuele Giuseppe Esposito
2022-06-21 14:29   ` Vladimir Sementsov-Ogievskiy
2022-06-16 13:18 ` [PATCH v7 03/18] job.c: API functions not used outside should be static Emanuele Giuseppe Esposito
2022-06-21 14:34   ` Vladimir Sementsov-Ogievskiy
2022-06-16 13:18 ` [PATCH v7 04/18] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED Emanuele Giuseppe Esposito
2022-06-21 14:40   ` Vladimir Sementsov-Ogievskiy
2022-06-16 13:18 ` [PATCH v7 05/18] job.h: add _locked duplicates for job API functions called with and without job_mutex Emanuele Giuseppe Esposito
2022-06-21 15:03   ` Vladimir Sementsov-Ogievskiy
2022-06-22 14:26     ` Emanuele Giuseppe Esposito
2022-06-22 18:12       ` Vladimir Sementsov-Ogievskiy
2022-06-16 13:18 ` [PATCH v7 06/18] jobs: protect jobs with job_lock/unlock Emanuele Giuseppe Esposito
2022-06-21 16:47   ` Vladimir Sementsov-Ogievskiy
2022-06-21 17:09   ` Vladimir Sementsov-Ogievskiy
2022-06-16 13:18 ` [PATCH v7 07/18] jobs: add job lock in find_* functions Emanuele Giuseppe Esposito
2022-06-16 13:18 ` [PATCH v7 08/18] jobs: use job locks also in the unit tests Emanuele Giuseppe Esposito
2022-06-16 13:18 ` [PATCH v7 09/18] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU Emanuele Giuseppe Esposito
2022-06-16 13:18 ` [PATCH v7 10/18] jobs: rename static functions called with job_mutex held Emanuele Giuseppe Esposito
2022-06-21 17:26   ` Vladimir Sementsov-Ogievskiy
2022-06-22 14:26     ` Emanuele Giuseppe Esposito
2022-06-22 18:38       ` Vladimir Sementsov-Ogievskiy
2022-06-23  9:08         ` Emanuele Giuseppe Esposito
2022-06-23 11:10           ` Vladimir Sementsov-Ogievskiy
2022-06-23 11:19             ` Emanuele Giuseppe Esposito
2022-06-23 11:58               ` Vladimir Sementsov-Ogievskiy
2022-06-24 14:29               ` Kevin Wolf
2022-06-24 15:28                 ` Paolo Bonzini
2022-06-24 17:20                   ` Emanuele Giuseppe Esposito
2022-06-28  7:40         ` Emanuele Giuseppe Esposito
2022-06-28 10:47           ` Vladimir Sementsov-Ogievskiy
2022-06-28 13:04             ` Emanuele Giuseppe Esposito
2022-06-28 15:22               ` Vladimir Sementsov-Ogievskiy
2022-06-28 15:26                 ` Vladimir Sementsov-Ogievskiy
2022-06-28 17:28                   ` Emanuele Giuseppe Esposito
2022-06-28 19:42                     ` Vladimir Sementsov-Ogievskiy
2022-06-16 13:18 ` [PATCH v7 11/18] job.h: rename job API " Emanuele Giuseppe Esposito
2022-06-16 13:18 ` [PATCH v7 12/18] block_job: rename block_job " Emanuele Giuseppe Esposito
2022-06-16 13:18 ` [PATCH v7 13/18] job.h: define unlocked functions Emanuele Giuseppe Esposito
2022-06-16 13:18 ` [PATCH v7 14/18] commit and mirror: create new nodes using bdrv_get_aio_context, and not the job aiocontext Emanuele Giuseppe Esposito
2022-06-16 13:18 ` [PATCH v7 15/18] job: detect change of aiocontext within job coroutine Emanuele Giuseppe Esposito
2022-06-16 13:18 ` [PATCH v7 16/18] jobs: protect job.aio_context with BQL and job_mutex Emanuele Giuseppe Esposito
2022-06-16 13:18 ` [PATCH v7 17/18] job.c: enable job lock/unlock and remove Aiocontext locks Emanuele Giuseppe Esposito
2022-06-16 13:18 ` [PATCH v7 18/18] block_job_query: remove atomic read Emanuele Giuseppe Esposito
