* [RFC PATCH v2 00/14] job: replace AioContext lock with job_mutex
@ 2021-11-04 14:53 Emanuele Giuseppe Esposito
  2021-11-04 14:53 ` [RFC PATCH v2 01/14] job.c: make job_lock/unlock public Emanuele Giuseppe Esposito
                   ` (13 more replies)
  0 siblings, 14 replies; 34+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-11-04 14:53 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

In this series, we want to remove the AioContext lock and instead
use the already existing job_mutex to protect the job structures
and list. This is part of the work to get rid of AioContext lock
usage in favour of finer-grained locks.

To simplify the reviewer's job, the job lock/unlock functions and
macros are added as empty (nop) implementations in patch 1.
They are converted to use the actual job mutex only in the last
patch, 14. This way we can freely create locking sections
without worrying about deadlocks with the AioContext lock.
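
The staging described above can be sketched roughly as follows. This is
only an illustration, not code from the series: the demo_* names and the
DEMO_JOB_LOCK_ENABLED switch are hypothetical, and a pthread mutex stands
in for QEMU's QemuMutex.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical switch: 0 models patches 1-13 (nop), 1 models patch 14. */
#ifndef DEMO_JOB_LOCK_ENABLED
#define DEMO_JOB_LOCK_ENABLED 0
#endif

#if DEMO_JOB_LOCK_ENABLED
/* Stand-in for job_mutex; QEMU itself uses a QemuMutex here. */
static pthread_mutex_t demo_job_mutex = PTHREAD_MUTEX_INITIALIZER;
#endif

void demo_job_lock(void)
{
#if DEMO_JOB_LOCK_ENABLED
    pthread_mutex_lock(&demo_job_mutex);
#endif
    /* Otherwise a nop, so it cannot deadlock against any outer lock. */
}

void demo_job_unlock(void)
{
#if DEMO_JOB_LOCK_ENABLED
    pthread_mutex_unlock(&demo_job_mutex);
#endif
}

/*
 * A critical section can be marked while the lock is still a nop.
 * Its body does not change once the real mutex is enabled.
 */
void demo_bump_refcnt(int *refcnt)
{
    assert(refcnt != NULL);
    demo_job_lock();
    (*refcnt)++;
    demo_job_unlock();
}
```

With the switch at 0 the callers compile and run unchanged, which is what
lets the intermediate patches add locking sections one at a time.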

Patch 2 defines which fields in the job structure need protection,
and patches 3-4 categorize the locked and unlocked functions in the
job API, respectively.

Patches 5-9 prepare for the job locks: they reduce the AioContext
critical sections and make other minor fixes.

Patches 10-13 introduce the (nop) job lock into the job API and
its users, following the comments and categorizations done in
patches 2-4.

Patch 14 makes the functions and macros from patch 1 use the job_mutex
and removes all AioContext locks at the same time.

Tested this series by running unit tests, qemu-iotests and qtests
(x86_64).

This series is based on my previous series "block layer: split
block APIs in global state and I/O".

Based-on: <20211025101735.2060852-1-eesposit@redhat.com>
---
RFC v2:
* use JOB_LOCK_GUARD and WITH_JOB_LOCK_GUARD
* mu(u)ltiple typos in commit messages
* job API split patches are sent separately in another series
* use of empty job_{lock/unlock} and JOB_LOCK_GUARD/WITH_JOB_LOCK_GUARD
  to avoid deadlocks and simplify the reviewer's job

Emanuele Giuseppe Esposito (14):
  job.c: make job_lock/unlock public
  job.h: categorize fields in struct Job
  job.h: define locked functions
  job.h: define unlocked functions
  block/mirror.c: use of job helpers in drivers to avoid TOC/TOU
  job.c: make job_event_* functions static
  job.c: move inner aiocontext lock in callbacks
  aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED
  jobs: remove aiocontext locks since the functions are under BQL
  jobs: protect jobs with job_lock/unlock
  block_job_query: remove atomic read
  jobs: use job locks and helpers also in the unit tests
  jobs: add job lock in find_* functions
  job.c: enable job lock/unlock and remove Aiocontext locks

 include/block/aio-wait.h         |  15 +-
 include/qemu/job.h               | 171 ++++++++++---
 block.c                          |   6 +
 block/mirror.c                   |   8 +-
 block/replication.c              |   6 +
 blockdev.c                       |  88 +++----
 blockjob.c                       |  62 +++--
 job-qmp.c                        |  54 ++--
 job.c                            | 410 ++++++++++++++++++++++---------
 monitor/qmp-cmds.c               |   2 +
 qemu-img.c                       |   8 +-
 tests/unit/test-bdrv-drain.c     |  44 ++--
 tests/unit/test-block-iothread.c |   6 +-
 tests/unit/test-blockjob-txn.c   |  10 +
 tests/unit/test-blockjob.c       |  68 ++---
 15 files changed, 628 insertions(+), 330 deletions(-)

-- 
2.27.0




* [RFC PATCH v2 01/14] job.c: make job_lock/unlock public
  2021-11-04 14:53 [RFC PATCH v2 00/14] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
@ 2021-11-04 14:53 ` Emanuele Giuseppe Esposito
  2021-12-16 16:18   ` Stefan Hajnoczi
  2021-11-04 14:53 ` [RFC PATCH v2 02/14] job.h: categorize fields in struct Job Emanuele Giuseppe Esposito
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 34+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-11-04 14:53 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

The job mutex will be used to protect the job struct elements and list,
replacing AioContext locks.

For now, use a single lock shared by all jobs, to keep things
simple. Once the AioContext lock is gone, we can introduce per-job
locks.

To simplify the switch from the AioContext lock to the job lock,
introduce *nop* lock/unlock functions and macros. Once everything is
protected by the job lock, we can make them take the mutex and remove
the AioContext lock.
Since job_mutex is already in use, keep its existing users on the
static _job_{lock/unlock}.
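
As a rough illustration of how a monitor-style caller is expected to use
the public lock, here is a minimal sketch. It is not taken from the
series: the DemoJob type and all demo_* functions are hypothetical, and a
pthread mutex stands in for QemuMutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced job structure. */
typedef struct DemoJob {
    const char *id;
    bool cancelled;
    struct DemoJob *next;
} DemoJob;

static pthread_mutex_t demo_job_mutex = PTHREAD_MUTEX_INITIALIZER;
static DemoJob *demo_jobs; /* list head, protected by demo_job_mutex */

void demo_job_lock(void)   { pthread_mutex_lock(&demo_job_mutex); }
void demo_job_unlock(void) { pthread_mutex_unlock(&demo_job_mutex); }

/* Called between demo_job_lock and demo_job_unlock. */
void demo_job_add(DemoJob *job)
{
    job->next = demo_jobs;
    demo_jobs = job;
}

/* Called between demo_job_lock and demo_job_unlock. */
DemoJob *demo_job_get(const char *id)
{
    assert(id != NULL);
    for (DemoJob *j = demo_jobs; j; j = j->next) {
        if (j->id && strcmp(j->id, id) == 0) {
            return j;
        }
    }
    return NULL;
}

/*
 * Monitor-style command: the lookup and the operation share one
 * critical section, so the job cannot change in between.
 */
bool demo_cmd_cancel(const char *id)
{
    bool found = false;

    demo_job_lock();
    DemoJob *job = demo_job_get(id);
    if (job) {
        job->cancelled = true;
        found = true;
    }
    demo_job_unlock();
    return found;
}
```

The point of the sketch is only the shape of demo_cmd_cancel(): the
caller, not the callee, brackets the whole lookup-then-act sequence.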

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/qemu/job.h | 18 ++++++++++++++++++
 job.c              | 39 +++++++++++++++++++++++++++------------
 2 files changed, 45 insertions(+), 12 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 7e9e59f4b8..ccf7826426 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -297,6 +297,24 @@ typedef enum JobCreateFlags {
     JOB_MANUAL_DISMISS = 0x04,
 } JobCreateFlags;
 
+/**
+ * job_lock:
+ *
+ * Take the mutex protecting the list of jobs and their status.
+ * Most functions called by the monitor need to call job_lock
+ * and job_unlock manually.  On the other hand, functions called
+ * by the block jobs themselves and by the block layer will take the
+ * lock for you.
+ */
+void job_lock(void);
+
+/**
+ * job_unlock:
+ *
+ * Release the mutex protecting the list of jobs and their status.
+ */
+void job_unlock(void);
+
 /**
  * Allocate and return a new job transaction. Jobs can be added to the
  * transaction using job_txn_add_job().
diff --git a/job.c b/job.c
index 94b142684f..0e4dacf028 100644
--- a/job.c
+++ b/job.c
@@ -32,6 +32,12 @@
 #include "trace/trace-root.h"
 #include "qapi/qapi-events-job.h"
 
+/*
+ * job_mutex protects the jobs list, but also makes the
+ * struct job fields thread-safe.
+ */
+static QemuMutex job_mutex;
+
 static QLIST_HEAD(, Job) jobs = QLIST_HEAD_INITIALIZER(jobs);
 
 /* Job State Transition Table */
@@ -74,17 +80,26 @@ struct JobTxn {
     int refcnt;
 };
 
-/* Right now, this mutex is only needed to synchronize accesses to job->busy
- * and job->sleep_timer, such as concurrent calls to job_do_yield and
- * job_enter. */
-static QemuMutex job_mutex;
+#define JOB_LOCK_GUARD() /* QEMU_LOCK_GUARD(&job_mutex) */
+
+#define WITH_JOB_LOCK_GUARD() /* WITH_QEMU_LOCK_GUARD(&job_mutex) */
+
+void job_lock(void)
+{
+    /* nop */
+}
+
+void job_unlock(void)
+{
+    /* nop */
+}
 
-static void job_lock(void)
+static void _job_lock(void)
 {
     qemu_mutex_lock(&job_mutex);
 }
 
-static void job_unlock(void)
+static void _job_unlock(void)
 {
     qemu_mutex_unlock(&job_mutex);
 }
@@ -449,21 +464,21 @@ void job_enter_cond(Job *job, bool(*fn)(Job *job))
         return;
     }
 
-    job_lock();
+    _job_lock();
     if (job->busy) {
-        job_unlock();
+        _job_unlock();
         return;
     }
 
     if (fn && !fn(job)) {
-        job_unlock();
+        _job_unlock();
         return;
     }
 
     assert(!job->deferred_to_main_loop);
     timer_del(&job->sleep_timer);
     job->busy = true;
-    job_unlock();
+    _job_unlock();
     aio_co_enter(job->aio_context, job->co);
 }
 
@@ -480,13 +495,13 @@ void job_enter(Job *job)
  * called explicitly. */
 static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
 {
-    job_lock();
+    _job_lock();
     if (ns != -1) {
         timer_mod(&job->sleep_timer, ns);
     }
     job->busy = false;
     job_event_idle(job);
-    job_unlock();
+    _job_unlock();
     qemu_coroutine_yield();
 
     /* Set by job_enter_cond() before re-entering the coroutine.  */
-- 
2.27.0




* [RFC PATCH v2 02/14] job.h: categorize fields in struct Job
  2021-11-04 14:53 [RFC PATCH v2 00/14] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
  2021-11-04 14:53 ` [RFC PATCH v2 01/14] job.c: make job_lock/unlock public Emanuele Giuseppe Esposito
@ 2021-11-04 14:53 ` Emanuele Giuseppe Esposito
  2021-12-16 16:21   ` Stefan Hajnoczi
  2021-11-04 14:53 ` [RFC PATCH v2 03/14] job.h: define locked functions Emanuele Giuseppe Esposito
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 34+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-11-04 14:53 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

Categorize the fields in struct Job to understand which ones
need to be protected by the job mutex and which don't.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 include/qemu/job.h | 57 +++++++++++++++++++++++++++-------------------
 1 file changed, 34 insertions(+), 23 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index ccf7826426..f7036ac6b3 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -40,27 +40,52 @@ typedef struct JobTxn JobTxn;
  * Long-running operation.
  */
 typedef struct Job {
+
+    /* Fields set at initialization (job_create), and never modified */
+
     /** The ID of the job. May be NULL for internal jobs. */
     char *id;
 
-    /** The type of this job. */
+    /**
+     * The type of this job.
+     * All callbacks are called with job_mutex *not* held.
+     */
     const JobDriver *driver;
 
-    /** Reference count of the block job */
-    int refcnt;
-
-    /** Current state; See @JobStatus for details. */
-    JobStatus status;
-
     /** AioContext to run the job coroutine in */
     AioContext *aio_context;
 
     /**
      * The coroutine that executes the job.  If not NULL, it is reentered when
      * busy is false and the job is cancelled.
+     * Initialized in job_start()
      */
     Coroutine *co;
 
+    /** True if this job should automatically finalize itself */
+    bool auto_finalize;
+
+    /** True if this job should automatically dismiss itself */
+    bool auto_dismiss;
+
+    /** The completion function that will be called when the job completes.  */
+    BlockCompletionFunc *cb;
+
+    /** The opaque value that is passed to the completion function.  */
+    void *opaque;
+
+    /* ProgressMeter API is thread-safe */
+    ProgressMeter progress;
+
+
+    /** Protected by job_mutex */
+
+    /** Reference count of the block job */
+    int refcnt;
+
+    /** Current state; See @JobStatus for details. */
+    JobStatus status;
+
     /**
      * Timer that is used by @job_sleep_ns. Accessed under job_mutex (in
      * job.c).
@@ -76,7 +101,7 @@ typedef struct Job {
     /**
      * Set to false by the job while the coroutine has yielded and may be
      * re-entered by job_enter(). There may still be I/O or event loop activity
-     * pending. Accessed under block_job_mutex (in blockjob.c).
+     * pending. Accessed under job_mutex.
      *
      * When the job is deferred to the main loop, busy is true as long as the
      * bottom half is still pending.
@@ -112,14 +137,6 @@ typedef struct Job {
     /** Set to true when the job has deferred work to the main loop. */
     bool deferred_to_main_loop;
 
-    /** True if this job should automatically finalize itself */
-    bool auto_finalize;
-
-    /** True if this job should automatically dismiss itself */
-    bool auto_dismiss;
-
-    ProgressMeter progress;
-
     /**
      * Return code from @run and/or @prepare callback(s).
      * Not final until the job has reached the CONCLUDED status.
@@ -134,12 +151,6 @@ typedef struct Job {
      */
     Error *err;
 
-    /** The completion function that will be called when the job completes.  */
-    BlockCompletionFunc *cb;
-
-    /** The opaque value that is passed to the completion function.  */
-    void *opaque;
-
     /** Notifiers called when a cancelled job is finalised */
     NotifierList on_finalize_cancelled;
 
@@ -167,6 +178,7 @@ typedef struct Job {
 
 /**
  * Callbacks and other information about a Job driver.
+ * All callbacks are invoked with job_mutex *not* held.
  */
 struct JobDriver {
 
@@ -460,7 +472,6 @@ void job_yield(Job *job);
  */
 void coroutine_fn job_sleep_ns(Job *job, int64_t ns);
 
-
 /** Returns the JobType of a given Job. */
 JobType job_type(const Job *job);
 
-- 
2.27.0




* [RFC PATCH v2 03/14] job.h: define locked functions
  2021-11-04 14:53 [RFC PATCH v2 00/14] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
  2021-11-04 14:53 ` [RFC PATCH v2 01/14] job.c: make job_lock/unlock public Emanuele Giuseppe Esposito
  2021-11-04 14:53 ` [RFC PATCH v2 02/14] job.h: categorize fields in struct Job Emanuele Giuseppe Esposito
@ 2021-11-04 14:53 ` Emanuele Giuseppe Esposito
  2021-12-16 16:48   ` Stefan Hajnoczi
  2021-11-04 14:53 ` [RFC PATCH v2 04/14] job.h: define unlocked functions Emanuele Giuseppe Esposito
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 34+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-11-04 14:53 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

These functions assume that the job lock is held by the
caller, to avoid TOC/TOU (time-of-check/time-of-use) conditions.

Also introduce additional helpers that define _locked
functions (useful once the job_mutex is globally applied).

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.
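
The _locked/unlocked split can be sketched like this. It is an
illustration under assumptions, not the series' code: DemoJob, DemoStatus
and the demo_* functions are hypothetical, a pthread mutex replaces
QemuMutex, and the demo_mutex_held flag is only a debugging aid added for
the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef enum { DEMO_CREATED, DEMO_RUNNING, DEMO_READY } DemoStatus;

typedef struct DemoJob {
    DemoStatus status; /* protected by demo_mutex */
} DemoJob;

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool demo_mutex_held; /* read and written only under demo_mutex */

static void demo_lock(void)
{
    pthread_mutex_lock(&demo_mutex);
    demo_mutex_held = true;
}

static void demo_unlock(void)
{
    demo_mutex_held = false;
    pthread_mutex_unlock(&demo_mutex);
}

/* Called with demo_mutex held. */
bool demo_is_ready_locked(DemoJob *job)
{
    assert(demo_mutex_held);
    return job->status == DEMO_READY;
}

/* Called with demo_mutex *not* held: takes the lock and delegates. */
bool demo_is_ready(DemoJob *job)
{
    bool ready;

    demo_lock();
    ready = demo_is_ready_locked(job);
    demo_unlock();
    return ready;
}
```

Callers that already sit inside a critical section use the _locked
variant; everyone else uses the plain name and never touches the mutex.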

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 include/qemu/job.h | 66 ++++++++++++++++++++++++++++++++++++++++++----
 job.c              | 18 +++++++++++--
 2 files changed, 77 insertions(+), 7 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index f7036ac6b3..c7a6bcad1b 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -355,6 +355,8 @@ void job_txn_unref(JobTxn *txn);
  * the reference that is automatically grabbed here.
  *
  * If @txn is NULL, the function does nothing.
+ *
+ * Called between job_lock and job_unlock.
  */
 void job_txn_add_job(JobTxn *txn, Job *job);
 
@@ -377,12 +379,16 @@ void *job_create(const char *job_id, const JobDriver *driver, JobTxn *txn,
 /**
  * Add a reference to Job refcnt, it will be decreased with job_unref, and then
  * be freed if it comes to be the last reference.
+ *
+ * Called between job_lock and job_unlock.
  */
 void job_ref(Job *job);
 
 /**
  * Release a reference that was previously acquired with job_ref() or
  * job_create(). If it's the last reference to the object, it will be freed.
+ *
+ * Called between job_lock and job_unlock.
  */
 void job_unref(Job *job);
 
@@ -429,6 +435,8 @@ void job_event_completed(Job *job);
  * Conditionally enter the job coroutine if the job is ready to run, not
  * already busy and fn() returns true. fn() is called while under the job_lock
  * critical section.
+ *
+ * Called between job_lock and job_unlock, but it releases the lock temporarily.
  */
 void job_enter_cond(Job *job, bool(*fn)(Job *job));
 
@@ -490,34 +498,52 @@ bool job_is_cancelled(Job *job);
  */
 bool job_cancel_requested(Job *job);
 
-/** Returns whether the job is in a completed state. */
+/**
+ * Returns whether the job is in a completed state.
+ * Called between job_lock and job_unlock.
+ */
 bool job_is_completed(Job *job);
 
 /** Returns whether the job is ready to be completed. */
 bool job_is_ready(Job *job);
 
+/** Same as job_is_ready(), but assumes job_lock is held. */
+bool job_is_ready_locked(Job *job);
+
 /**
  * Request @job to pause at the next pause point. Must be paired with
  * job_resume(). If the job is supposed to be resumed by user action, call
  * job_user_pause() instead.
+ *
+ * Called between job_lock and job_unlock.
  */
 void job_pause(Job *job);
 
-/** Resumes a @job paused with job_pause. */
+/**
+ * Resumes a @job paused with job_pause.
+ * Called between job_lock and job_unlock.
+ */
 void job_resume(Job *job);
 
 /**
  * Asynchronously pause the specified @job.
  * Do not allow a resume until a matching call to job_user_resume.
+ *
+ * Called between job_lock and job_unlock.
  */
 void job_user_pause(Job *job, Error **errp);
 
-/** Returns true if the job is user-paused. */
+/**
+ * Returns true if the job is user-paused.
+ * Called between job_lock and job_unlock.
+ */
 bool job_user_paused(Job *job);
 
 /**
  * Resume the specified @job.
  * Must be paired with a preceding job_user_pause.
+ *
+ * Called between job_lock and job_unlock.
  */
 void job_user_resume(Job *job, Error **errp);
 
@@ -526,6 +552,8 @@ void job_user_resume(Job *job, Error **errp);
  * first one if @job is %NULL.
  *
  * Returns the requested job, or %NULL if there are no more jobs left.
+ *
+ * Called between job_lock and job_unlock.
  */
 Job *job_next(Job *job);
 
@@ -533,6 +561,8 @@ Job *job_next(Job *job);
  * Get the job identified by @id (which must not be %NULL).
  *
  * Returns the requested job, or %NULL if it doesn't exist.
+ *
+ * Called between job_lock and job_unlock.
  */
 Job *job_get(const char *id);
 
@@ -540,27 +570,39 @@ Job *job_get(const char *id);
  * Check whether the verb @verb can be applied to @job in its current state.
  * Returns 0 if the verb can be applied; otherwise errp is set and -EPERM
  * returned.
+ *
+ * Called between job_lock and job_unlock.
  */
 int job_apply_verb(Job *job, JobVerb verb, Error **errp);
 
 /** The @job could not be started, free it. */
 void job_early_fail(Job *job);
 
+/** Same as job_early_fail(), but assumes job_lock is held. */
+void job_early_fail_locked(Job *job);
+
 /** Moves the @job from RUNNING to READY */
 void job_transition_to_ready(Job *job);
 
-/** Asynchronously complete the specified @job. */
+/**
+ * Asynchronously complete the specified @job.
+ * Called between job_lock and job_unlock, but it releases the lock temporarily.
+ */
 void job_complete(Job *job, Error **errp);
 
 /**
  * Asynchronously cancel the specified @job. If @force is true, the job should
  * be cancelled immediately without waiting for a consistent state.
+ *
+ * Called between job_lock and job_unlock.
  */
 void job_cancel(Job *job, bool force);
 
 /**
  * Cancels the specified job like job_cancel(), but may refuse to do so if the
  * operation isn't meaningful in the current state of the job.
+ *
+ * Called between job_lock and job_unlock.
  */
 void job_user_cancel(Job *job, bool force, Error **errp);
 
@@ -577,7 +619,13 @@ void job_user_cancel(Job *job, bool force, Error **errp);
  */
 int job_cancel_sync(Job *job, bool force);
 
-/** Synchronously force-cancels all jobs using job_cancel_sync(). */
+/**
+ * Synchronously force-cancels all jobs using job_cancel_sync().
+ *
+ * Called with job_lock *not* held, unlike most other APIs consumed
+ * by the monitor! This is primarily to avoid adding unnecessary lock-unlock
+ * patterns in the caller.
+ */
 void job_cancel_sync_all(void);
 
 /**
@@ -593,6 +641,8 @@ void job_cancel_sync_all(void);
  * Returns the return value from the job.
  *
  * Callers must hold the AioContext lock of job->aio_context.
+ *
+ * Called between job_lock and job_unlock.
  */
 int job_complete_sync(Job *job, Error **errp);
 
@@ -603,12 +653,16 @@ int job_complete_sync(Job *job, Error **errp);
  * FIXME: Make the below statement universally true:
  * For jobs that support the manual workflow mode, all graph changes that occur
  * as a result will occur after this command and before a successful reply.
+ *
+ * Called between job_lock and job_unlock.
  */
 void job_finalize(Job *job, Error **errp);
 
 /**
  * Remove the concluded @job from the query list and resets the passed pointer
  * to %NULL. Returns an error if the job is not actually concluded.
+ *
+ * Called between job_lock and job_unlock.
  */
 void job_dismiss(Job **job, Error **errp);
 
@@ -620,6 +674,8 @@ void job_dismiss(Job **job, Error **errp);
  * cancelled before completing, and -errno in other error cases.
  *
  * Callers must hold the AioContext lock of job->aio_context.
+ *
+ * Called between job_lock and job_unlock.
  */
 int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp);
 
diff --git a/job.c b/job.c
index 0e4dacf028..e393c1222f 100644
--- a/job.c
+++ b/job.c
@@ -242,7 +242,8 @@ bool job_cancel_requested(Job *job)
     return job->cancelled;
 }
 
-bool job_is_ready(Job *job)
+/* Called with job_mutex held. */
+bool job_is_ready_locked(Job *job)
 {
     switch (job->status) {
     case JOB_STATUS_UNDEFINED:
@@ -264,6 +265,13 @@ bool job_is_ready(Job *job)
     return false;
 }
 
+/* Called with job_mutex lock *not* held */
+bool job_is_ready(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job_is_ready_locked(job);
+}
+
 bool job_is_completed(Job *job)
 {
     switch (job->status) {
@@ -659,12 +667,18 @@ void job_dismiss(Job **jobptr, Error **errp)
     *jobptr = NULL;
 }
 
-void job_early_fail(Job *job)
+void job_early_fail_locked(Job *job)
 {
     assert(job->status == JOB_STATUS_CREATED);
     job_do_dismiss(job);
 }
 
+void job_early_fail(Job *job)
+{
+    JOB_LOCK_GUARD();
+    job_early_fail_locked(job);
+}
+
 static void job_conclude(Job *job)
 {
     job_state_transition(job, JOB_STATUS_CONCLUDED);
-- 
2.27.0




* [RFC PATCH v2 04/14] job.h: define unlocked functions
  2021-11-04 14:53 [RFC PATCH v2 00/14] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (2 preceding siblings ...)
  2021-11-04 14:53 ` [RFC PATCH v2 03/14] job.h: define locked functions Emanuele Giuseppe Esposito
@ 2021-11-04 14:53 ` Emanuele Giuseppe Esposito
  2021-12-16 16:51   ` Stefan Hajnoczi
  2021-11-04 14:53 ` [RFC PATCH v2 05/14] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU Emanuele Giuseppe Esposito
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 34+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-11-04 14:53 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

All these functions assume that the lock is not held, and acquire
it internally.

These functions will be useful when job_lock is globally applied,
as they will allow callers to access the job struct fields
without worrying about the job lock.

Also update the comments in blockjob.c (and move them to job.c).

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 include/qemu/job.h | 21 +++++++++++
 blockjob.c         | 20 -----------
 job.c              | 88 ++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 107 insertions(+), 22 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index c7a6bcad1b..d34c55dad0 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -679,4 +679,25 @@ void job_dismiss(Job **job, Error **errp);
  */
 int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp);
 
+/** Enters the @job if it is not paused */
+void job_enter_not_paused(Job *job);
+
+/** Returns true if @job->ret is negative, i.e. the job has failed */
+bool job_has_failed(Job *job);
+
+/** Returns the @job->status */
+JobStatus job_get_status(Job *job);
+
+/** Returns the @job->pause_count */
+int job_get_pause_count(Job *job);
+
+/** Returns @job->paused */
+bool job_get_paused(Job *job);
+
+/** Returns @job->busy */
+bool job_get_busy(Job *job);
+
+/** Returns true if @job is neither paused nor cancelled */
+bool job_not_paused_nor_cancelled(Job *job);
+
 #endif
diff --git a/blockjob.c b/blockjob.c
index 4982f6a2b5..53c1e9c406 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -36,21 +36,6 @@
 #include "qemu/main-loop.h"
 #include "qemu/timer.h"
 
-/*
- * The block job API is composed of two categories of functions.
- *
- * The first includes functions used by the monitor.  The monitor is
- * peculiar in that it accesses the block job list with block_job_get, and
- * therefore needs consistency across block_job_get and the actual operation
- * (e.g. block_job_set_speed).  The consistency is achieved with
- * aio_context_acquire/release.  These functions are declared in blockjob.h.
- *
- * The second includes functions used by the block job drivers and sometimes
- * by the core block layer.  These do not care about locking, because the
- * whole coroutine runs under the AioContext lock, and are declared in
- * blockjob_int.h.
- */
-
 static bool is_block_job(Job *job)
 {
     return job_type(job) == JOB_TYPE_BACKUP ||
@@ -433,11 +418,6 @@ static void block_job_event_ready(Notifier *n, void *opaque)
 }
 
 
-/*
- * API for block job drivers and the block layer.  These functions are
- * declared in blockjob_int.h.
- */
-
 void *block_job_create(const char *job_id, const BlockJobDriver *driver,
                        JobTxn *txn, BlockDriverState *bs, uint64_t perm,
                        uint64_t shared_perm, int64_t speed, int flags,
diff --git a/job.c b/job.c
index e393c1222f..bd36207021 100644
--- a/job.c
+++ b/job.c
@@ -32,6 +32,23 @@
 #include "trace/trace-root.h"
 #include "qapi/qapi-events-job.h"
 
+/*
+ * The job API is composed of two categories of functions.
+ *
+ * The first includes functions used by the monitor.  The monitor is
+ * peculiar in that it accesses the job list with job_get, and
+ * therefore needs consistency across job_get and the actual operation
+ * (e.g. job_user_cancel). To achieve this consistency, the caller
+ * calls job_lock/job_unlock itself around the whole operation.
+ * These functions are declared in job-monitor.h.
+ *
+ *
+ * The second includes functions used by the block job drivers and sometimes
+ * by the core block layer. These delegate the locking to the callee instead,
+ * and are declared in job-driver.h.
+ */
+
+
 /*
  * job_mutex protects the jobs list, but also makes the
  * struct job fields thread-safe.
@@ -230,18 +247,70 @@ const char *job_type_str(const Job *job)
     return JobType_str(job_type(job));
 }
 
-bool job_is_cancelled(Job *job)
+JobStatus job_get_status(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job->status;
+}
+
+int job_get_pause_count(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job->pause_count;
+}
+
+bool job_get_paused(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job->paused;
+}
+
+bool job_get_busy(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job->busy;
+}
+
+bool job_has_failed(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job->ret < 0;
+}
+
+/* Called with job_mutex held. */
+static bool job_is_cancelled_locked(Job *job)
 {
     /* force_cancel may be true only if cancelled is true, too */
     assert(job->cancelled || !job->force_cancel);
     return job->force_cancel;
 }
 
-bool job_cancel_requested(Job *job)
+/* Called with job_mutex *not* held. */
+bool job_is_cancelled(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job_is_cancelled_locked(job);
+}
+
+bool job_not_paused_nor_cancelled(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return !job->paused && !job_is_cancelled_locked(job);
+}
+
+/* Called with job_mutex held. */
+static bool job_cancel_requested_locked(Job *job)
 {
     return job->cancelled;
 }
 
+/* Called with job_mutex *not* held. */
+bool job_cancel_requested(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job_cancel_requested_locked(job);
+}
+
 /* Called with job_mutex held. */
 bool job_is_ready_locked(Job *job)
 {
@@ -294,6 +363,13 @@ bool job_is_completed(Job *job)
     return false;
 }
 
+/* Called with job_mutex lock *not* held */
+static bool job_is_completed_unlocked(Job *job)
+{
+    JOB_LOCK_GUARD();
+    return job_is_completed(job);
+}
+
 static bool job_started(Job *job)
 {
     return job->co;
@@ -593,6 +669,14 @@ void job_pause(Job *job)
     }
 }
 
+void job_enter_not_paused(Job *job)
+{
+    JOB_LOCK_GUARD();
+    if (!job->paused) {
+        job_enter_cond(job, NULL);
+    }
+}
+
 void job_resume(Job *job)
 {
     assert(job->pause_count > 0);
-- 
2.27.0




* [RFC PATCH v2 05/14] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU
  2021-11-04 14:53 [RFC PATCH v2 00/14] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (3 preceding siblings ...)
  2021-11-04 14:53 ` [RFC PATCH v2 04/14] job.h: define unlocked functions Emanuele Giuseppe Esposito
@ 2021-11-04 14:53 ` Emanuele Giuseppe Esposito
  2021-12-18 11:53   ` Vladimir Sementsov-Ogievskiy
  2021-11-04 14:53 ` [RFC PATCH v2 06/14] job.c: make job_event_* functions static Emanuele Giuseppe Esposito
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 34+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-11-04 14:53 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

Once the job lock is used and the AioContext lock is removed, mirror
has to perform job operations under the same critical section,
using the helpers prepared in the previous commit.

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.
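
To illustrate why a combined helper matters, here is a minimal sketch.
It is not the series' code: DemoJob and the demo_* functions are
hypothetical, and a pthread mutex stands in for QemuMutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoJob {
    bool paused;    /* protected by demo_mutex */
    bool cancelled; /* protected by demo_mutex */
} DemoJob;

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Each getter is consistent on its own... */
bool demo_get_paused(DemoJob *job)
{
    bool v;

    assert(job != NULL);
    pthread_mutex_lock(&demo_mutex);
    v = job->paused;
    pthread_mutex_unlock(&demo_mutex);
    return v;
}

bool demo_get_cancelled(DemoJob *job)
{
    bool v;

    assert(job != NULL);
    pthread_mutex_lock(&demo_mutex);
    v = job->cancelled;
    pthread_mutex_unlock(&demo_mutex);
    return v;
}

/*
 * ...but "!demo_get_paused(j) && !demo_get_cancelled(j)" drops the lock
 * between the two reads, so another thread may change the job in the
 * gap. Reading both fields in one critical section closes that window.
 */
bool demo_not_paused_nor_cancelled(DemoJob *job)
{
    bool v;

    assert(job != NULL);
    pthread_mutex_lock(&demo_mutex);
    v = !job->paused && !job->cancelled;
    pthread_mutex_unlock(&demo_mutex);
    return v;
}
```

In a single-threaded run both forms return the same value; the
difference only shows up when another thread updates the job between
the two separate getter calls.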

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block/mirror.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 00089e519b..f22fa7da6e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -653,7 +653,7 @@ static int mirror_exit_common(Job *job)
     BlockDriverState *target_bs;
     BlockDriverState *mirror_top_bs;
     Error *local_err = NULL;
-    bool abort = job->ret < 0;
+    bool abort = job_has_failed(job);
     int ret = 0;
 
     if (s->prepared) {
@@ -1161,9 +1161,7 @@ static void mirror_complete(Job *job, Error **errp)
     s->should_complete = true;
 
     /* If the job is paused, it will be re-entered when it is resumed */
-    if (!job->paused) {
-        job_enter(job);
-    }
+    job_enter_not_paused(job);
 }
 
 static void coroutine_fn mirror_pause(Job *job)
@@ -1182,7 +1180,7 @@ static bool mirror_drained_poll(BlockJob *job)
      * from one of our own drain sections, to avoid a deadlock waiting for
      * ourselves.
      */
-    if (!s->common.job.paused && !job_is_cancelled(&job->job) && !s->in_drain) {
+    if (job_not_paused_nor_cancelled(&s->common.job) && !s->in_drain) {
         return true;
     }
 
-- 
2.27.0




* [RFC PATCH v2 06/14] job.c: make job_event_* functions static
  2021-11-04 14:53 [RFC PATCH v2 00/14] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (4 preceding siblings ...)
  2021-11-04 14:53 ` [RFC PATCH v2 05/14] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU Emanuele Giuseppe Esposito
@ 2021-11-04 14:53 ` Emanuele Giuseppe Esposito
  2021-12-16 16:54   ` Stefan Hajnoczi
  2021-11-04 14:53 ` [RFC PATCH v2 07/14] job.c: move inner aiocontext lock in callbacks Emanuele Giuseppe Esposito
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 34+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-11-04 14:53 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

job_event_* functions can all be static, as they are not used
outside job.c.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 include/qemu/job.h |  6 ------
 job.c              | 12 ++++++++++--
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index d34c55dad0..58b3af47e3 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -425,12 +425,6 @@ void job_progress_set_remaining(Job *job, uint64_t remaining);
  */
 void job_progress_increase_remaining(Job *job, uint64_t delta);
 
-/** To be called when a cancelled job is finalised. */
-void job_event_cancelled(Job *job);
-
-/** To be called when a successfully completed job is finalised. */
-void job_event_completed(Job *job);
-
 /**
  * Conditionally enter the job coroutine if the job is ready to run, not
  * already busy and fn() returns true. fn() is called while under the job_lock
diff --git a/job.c b/job.c
index bd36207021..ce5066522f 100644
--- a/job.c
+++ b/job.c
@@ -514,12 +514,20 @@ void job_progress_increase_remaining(Job *job, uint64_t delta)
     progress_increase_remaining(&job->progress, delta);
 }
 
-void job_event_cancelled(Job *job)
+/**
+ * To be called when a cancelled job is finalised.
+ * Called with job_mutex held.
+ */
+static void job_event_cancelled(Job *job)
 {
     notifier_list_notify(&job->on_finalize_cancelled, job);
 }
 
-void job_event_completed(Job *job)
+/**
+ * To be called when a successfully completed job is finalised.
+ * Called with job_mutex held.
+ */
+static void job_event_completed(Job *job)
 {
     notifier_list_notify(&job->on_finalize_completed, job);
 }
-- 
2.27.0




* [RFC PATCH v2 07/14] job.c: move inner aiocontext lock in callbacks
  2021-11-04 14:53 [RFC PATCH v2 00/14] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (5 preceding siblings ...)
  2021-11-04 14:53 ` [RFC PATCH v2 06/14] job.c: make job_event_* functions static Emanuele Giuseppe Esposito
@ 2021-11-04 14:53 ` Emanuele Giuseppe Esposito
  2021-11-04 14:53 ` [RFC PATCH v2 08/14] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED Emanuele Giuseppe Esposito
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-11-04 14:53 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

Instead of taking the lock in job_txn_apply(), move it inside
the callbacks. This will be helpful in the next commits, when
we introduce job_lock/unlock pairs.

job_transition_to_pending() and job_needs_finalize() do not
need to be protected by the aiocontext lock.

No functional change intended.
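
As an illustration only, the reshuffling described above can be
sketched outside QEMU. Everything named toy_* or Toy* below is
hypothetical and is not part of job.c; the "context lock" is a plain
counter rather than a real AioContext. The iterator calls fn() without
taking any lock, and the callback takes the lock only around the part
that stands in for the driver call.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an AioContext: it only tracks acquire/release. */
typedef struct ToyCtx {
    int held;      /* 1 while "locked" */
    int acquires;  /* total number of acquisitions */
} ToyCtx;

static void toy_ctx_acquire(ToyCtx *c)
{
    assert(!c->held);
    c->held = 1;
    c->acquires++;
}

static void toy_ctx_release(ToyCtx *c)
{
    assert(c->held);
    c->held = 0;
}

typedef struct ToyJob {
    ToyCtx *ctx;
    int ret;
    int prepared;
} ToyJob;

/* The callback takes the context lock itself, around the driver step. */
static int toy_job_prepare(ToyJob *job)
{
    if (job->ret == 0) {
        toy_ctx_acquire(job->ctx);
        job->prepared = 1;   /* stands in for driver->prepare() */
        toy_ctx_release(job->ctx);
    }
    return job->ret;
}

/* The iterator takes no lock; it stops at the first non-zero result. */
static int toy_txn_apply(ToyJob **jobs, size_t n, int (*fn)(ToyJob *))
{
    int rc = 0;

    for (size_t i = 0; i < n; i++) {
        rc = fn(jobs[i]);
        if (rc) {
            break;
        }
    }
    return rc;
}
```

With this shape, a later patch could wrap the whole loop in another
lock without that lock ever enclosing the context lock acquisition
done by the iterator itself.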

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 job.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/job.c b/job.c
index ce5066522f..7856fa734b 100644
--- a/job.c
+++ b/job.c
@@ -170,7 +170,6 @@ static void job_txn_del_job(Job *job)
 
 static int job_txn_apply(Job *job, int fn(Job *))
 {
-    AioContext *inner_ctx;
     Job *other_job, *next;
     JobTxn *txn = job->txn;
     int rc = 0;
@@ -185,10 +184,7 @@ static int job_txn_apply(Job *job, int fn(Job *))
     aio_context_release(job->aio_context);
 
     QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
-        inner_ctx = other_job->aio_context;
-        aio_context_acquire(inner_ctx);
         rc = fn(other_job);
-        aio_context_release(inner_ctx);
         if (rc) {
             break;
         }
@@ -820,11 +816,15 @@ static void job_clean(Job *job)
 
 static int job_finalize_single(Job *job)
 {
+    AioContext *ctx = job->aio_context;
+
     assert(job_is_completed(job));
 
     /* Ensure abort is called for late-transactional failures */
     job_update_rc(job);
 
+    aio_context_acquire(ctx);
+
     if (!job->ret) {
         job_commit(job);
     } else {
@@ -832,6 +832,8 @@ static int job_finalize_single(Job *job)
     }
     job_clean(job);
 
+    aio_context_release(ctx);
+
     if (job->cb) {
         job->cb(job->opaque, job->ret);
     }
@@ -952,11 +954,16 @@ static void job_completed_txn_abort(Job *job)
 
 static int job_prepare(Job *job)
 {
+    AioContext *ctx = job->aio_context;
     assert(qemu_in_main_thread());
+
     if (job->ret == 0 && job->driver->prepare) {
+        aio_context_acquire(ctx);
         job->ret = job->driver->prepare(job);
+        aio_context_release(ctx);
         job_update_rc(job);
     }
+
     return job->ret;
 }
 
-- 
2.27.0




* [RFC PATCH v2 08/14] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED
  2021-11-04 14:53 [RFC PATCH v2 00/14] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (6 preceding siblings ...)
  2021-11-04 14:53 ` [RFC PATCH v2 07/14] job.c: move inner aiocontext lock in callbacks Emanuele Giuseppe Esposito
@ 2021-11-04 14:53 ` Emanuele Giuseppe Esposito
  2021-11-04 14:53 ` [RFC PATCH v2 09/14] jobs: remove aiocontext locks since the functions are under BQL Emanuele Giuseppe Esposito
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-11-04 14:53 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

Same as the AIO_WAIT_WHILE macro, but if we are in the main loop,
do not release and then re-acquire ctx_'s AioContext lock.

Once all AioContext locks go away, this macro will replace
AIO_WAIT_WHILE.
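
A rough sketch of the idea, assuming nothing about the real aio-wait.h
beyond what the diff below shows: one shared wait loop takes an
"unlock" flag, and two thin wrappers fix that flag to true or false.
All toy_* and TOY_* names are hypothetical, and the poll step is a
plain counter decrement rather than aio_poll().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyLock {
    bool held;
    int releases;   /* how many times the wait loop dropped the lock */
} ToyLock;

static void toy_lock_acquire(ToyLock *l)
{
    assert(!l->held);
    l->held = true;
}

static void toy_lock_release(ToyLock *l)
{
    assert(l->held);
    l->held = false;
    l->releases++;
}

/* Stand-in for the event loop: each call completes one pending item. */
static void toy_poll(int *pending)
{
    if (*pending > 0) {
        (*pending)--;
    }
}

/* Shared loop: drop and retake the lock around polling only if asked. */
static bool toy_wait_while(ToyLock *lock, int *pending, bool unlock)
{
    bool waited = false;

    while (*pending > 0) {
        if (unlock && lock) {
            toy_lock_release(lock);
        }
        toy_poll(pending);
        if (unlock && lock) {
            toy_lock_acquire(lock);
        }
        waited = true;
    }
    return waited;
}

/* Caller holds the lock: it is released while polling. */
#define TOY_WAIT_WHILE(lock, pending) \
    toy_wait_while((lock), (pending), true)

/* Caller does not hold the lock: it is never touched. */
#define TOY_WAIT_WHILE_UNLOCKED(lock, pending) \
    toy_wait_while((lock), (pending), false)
```

The sketch uses a function plus wrapper macros instead of a GNU
statement expression so that it stays within standard C; the actual
patch keeps the statement-expression form.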

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 include/block/aio-wait.h | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index b39eefb38d..ff27fe4eab 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -59,10 +59,11 @@ typedef struct {
 extern AioWait global_aio_wait;
 
 /**
- * AIO_WAIT_WHILE:
+ * _AIO_WAIT_WHILE:
  * @ctx: the aio context, or NULL if multiple aio contexts (for which the
  *       caller does not hold a lock) are involved in the polling condition.
  * @cond: wait while this conditional expression is true
+ * @unlock: whether to unlock and then lock again @ctx
  *
  * Wait while a condition is true.  Use this to implement synchronous
  * operations that require event loop activity.
@@ -75,7 +76,7 @@ extern AioWait global_aio_wait;
  * wait on conditions between two IOThreads since that could lead to deadlock,
  * go via the main loop instead.
  */
-#define AIO_WAIT_WHILE(ctx, cond) ({                               \
+#define _AIO_WAIT_WHILE(ctx, cond, unlock) ({                      \
     bool waited_ = false;                                          \
     AioWait *wait_ = &global_aio_wait;                             \
     AioContext *ctx_ = (ctx);                                      \
@@ -90,11 +91,11 @@ extern AioWait global_aio_wait;
         assert(qemu_get_current_aio_context() ==                   \
                qemu_get_aio_context());                            \
         while ((cond)) {                                           \
-            if (ctx_) {                                            \
+            if (unlock && ctx_) {                                  \
                 aio_context_release(ctx_);                         \
             }                                                      \
             aio_poll(qemu_get_aio_context(), true);                \
-            if (ctx_) {                                            \
+            if (unlock && ctx_) {                                  \
                 aio_context_acquire(ctx_);                         \
             }                                                      \
             waited_ = true;                                        \
@@ -103,6 +104,12 @@ extern AioWait global_aio_wait;
     qatomic_dec(&wait_->num_waiters);                              \
     waited_; })
 
+#define AIO_WAIT_WHILE(ctx, cond)                                  \
+    _AIO_WAIT_WHILE(ctx, cond, true)
+
+#define AIO_WAIT_WHILE_UNLOCKED(ctx, cond)                         \
+    _AIO_WAIT_WHILE(ctx, cond, false)
+
 /**
  * aio_wait_kick:
  * Wake up the main thread if it is waiting on AIO_WAIT_WHILE().  During
-- 
2.27.0




* [RFC PATCH v2 09/14] jobs: remove aiocontext locks since the functions are under BQL
  2021-11-04 14:53 [RFC PATCH v2 00/14] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (7 preceding siblings ...)
  2021-11-04 14:53 ` [RFC PATCH v2 08/14] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED Emanuele Giuseppe Esposito
@ 2021-11-04 14:53 ` Emanuele Giuseppe Esposito
  2021-11-04 14:53 ` [RFC PATCH v2 10/14] jobs: protect jobs with job_lock/unlock Emanuele Giuseppe Esposito
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-11-04 14:53 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

In preparation for the job_lock/unlock patch, remove these
aiocontext locks.
The main reason these two locks are removed here is that
they are inside a loop iterating over the jobs list. Once the
job_lock is added, it will have to protect the whole loop,
also wrapping the aiocontext acquire/release.

We don't want this, as job_lock can only be *wrapped by*
the aiocontext lock, and not vice-versa, to avoid deadlocks.
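
To make the ordering rule concrete, here is a small sketch with
hypothetical toy_* helpers (not QEMU code). The two "locks" are plain
flags, and the outer acquire refuses to proceed when the inner lock is
already held, which is the nesting this patch avoids creating.

```c
#include <assert.h>
#include <stdbool.h>

static bool toy_ctx_held;   /* outer lock: stands in for the aiocontext */
static bool toy_job_held;   /* inner lock: stands in for job_lock */

/*
 * Take the outer lock. Returns false, without locking, if the inner
 * lock is already held, since that would invert the allowed order.
 */
static bool toy_ctx_acquire(void)
{
    if (toy_job_held) {
        return false;
    }
    toy_ctx_held = true;
    return true;
}

static void toy_ctx_release(void)
{
    toy_ctx_held = false;
}

static void toy_job_lock(void)
{
    assert(!toy_job_held);
    toy_job_held = true;
}

static void toy_job_unlock(void)
{
    assert(toy_job_held);
    toy_job_held = false;
}
```

Taking toy_ctx_acquire() first and toy_job_lock() second is accepted;
the reverse order is rejected, mirroring a loop that holds job_lock
and then tries to acquire the aiocontext on every iteration.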

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 blockdev.c | 4 ----
 job-qmp.c  | 4 ----
 2 files changed, 8 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index d9bf842a81..67b55eec11 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3719,15 +3719,11 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 
     for (job = block_job_next(NULL); job; job = block_job_next(job)) {
         BlockJobInfo *value;
-        AioContext *aio_context;
 
         if (block_job_is_internal(job)) {
             continue;
         }
-        aio_context = blk_get_aio_context(job->blk);
-        aio_context_acquire(aio_context);
         value = block_job_query(job, errp);
-        aio_context_release(aio_context);
         if (!value) {
             qapi_free_BlockJobInfoList(head);
             return NULL;
diff --git a/job-qmp.c b/job-qmp.c
index 829a28aa70..a6774aaaa5 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -173,15 +173,11 @@ JobInfoList *qmp_query_jobs(Error **errp)
 
     for (job = job_next(NULL); job; job = job_next(job)) {
         JobInfo *value;
-        AioContext *aio_context;
 
         if (job_is_internal(job)) {
             continue;
         }
-        aio_context = job->aio_context;
-        aio_context_acquire(aio_context);
         value = job_query_single(job, errp);
-        aio_context_release(aio_context);
         if (!value) {
             qapi_free_JobInfoList(head);
             return NULL;
-- 
2.27.0




* [RFC PATCH v2 10/14] jobs: protect jobs with job_lock/unlock
  2021-11-04 14:53 [RFC PATCH v2 00/14] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (8 preceding siblings ...)
  2021-11-04 14:53 ` [RFC PATCH v2 09/14] jobs: remove aiocontext locks since the functions are under BQL Emanuele Giuseppe Esposito
@ 2021-11-04 14:53 ` Emanuele Giuseppe Esposito
  2021-12-18 11:57   ` Vladimir Sementsov-Ogievskiy
  2021-11-04 14:53 ` [RFC PATCH v2 11/14] block_job_query: remove atomic read Emanuele Giuseppe Esposito
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 34+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-11-04 14:53 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

Introduce the job locking mechanism throughout the job API,
following the comments and requirements of the job-monitor
functions (which assume the lock is held) and the job-driver
functions (which assume the lock is not held).

job_{lock/unlock} is independent from _job_{lock/unlock}.

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.
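
The recurring pattern in the diff below is "called with the lock held,
but drop it around the driver callback". A minimal sketch of that
pattern, using hypothetical toy_* names and a boolean flag in place of
the real job_mutex:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool toy_job_mutex_held;   /* stands in for job_mutex */

static void toy_job_lock(void)
{
    assert(!toy_job_mutex_held);
    toy_job_mutex_held = true;
}

static void toy_job_unlock(void)
{
    assert(toy_job_mutex_held);
    toy_job_mutex_held = false;
}

typedef struct ToyJob ToyJob;

typedef struct ToyDriver {
    void (*commit)(ToyJob *job);
} ToyDriver;

struct ToyJob {
    const ToyDriver *driver;
    int ret;
    bool committed;
    bool lock_seen_in_cb;   /* records whether the callback saw the lock */
};

/* Driver side: expected to run with the lock not held. */
static void toy_driver_commit(ToyJob *job)
{
    job->lock_seen_in_cb = toy_job_mutex_held;
    job->committed = true;
}

/* Monitor side: called with the lock held, releases it temporarily. */
static void toy_job_commit_locked(ToyJob *job)
{
    assert(toy_job_mutex_held);
    if (job->driver->commit) {
        toy_job_unlock();
        job->driver->commit(job);
        toy_job_lock();
    }
}

/* Wrapper for callers that do not hold the lock. */
static void toy_job_commit(ToyJob *job)
{
    toy_job_lock();
    toy_job_commit_locked(job);
    toy_job_unlock();
}
```

Any state read before the unlock may be stale after the relock, which
is presumably why the patch copies job->ret into a local variable
before calling job->cb() without the lock.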

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block.c             |   6 ++
 block/replication.c |   4 +
 blockdev.c          |  13 +++
 blockjob.c          |  37 +++++++-
 job-qmp.c           |   4 +
 job.c               | 201 ++++++++++++++++++++++++++++++++++----------
 monitor/qmp-cmds.c  |   2 +
 qemu-img.c          |   8 +-
 8 files changed, 229 insertions(+), 46 deletions(-)

diff --git a/block.c b/block.c
index da80e89ad4..a6dcd9eb36 100644
--- a/block.c
+++ b/block.c
@@ -4826,7 +4826,9 @@ static void bdrv_close(BlockDriverState *bs)
 
 void bdrv_close_all(void)
 {
+    job_lock();
     assert(job_next(NULL) == NULL);
+    job_unlock();
     assert(qemu_in_main_thread());
 
     /* Drop references from requests still in flight, such as canceled block
@@ -5965,6 +5967,8 @@ XDbgBlockGraph *bdrv_get_xdbg_block_graph(Error **errp)
         }
     }
 
+    job_lock();
+
     for (job = block_job_next(NULL); job; job = block_job_next(job)) {
         GSList *el;
 
@@ -5975,6 +5979,8 @@ XDbgBlockGraph *bdrv_get_xdbg_block_graph(Error **errp)
         }
     }
 
+    job_unlock();
+
     QTAILQ_FOREACH(bs, &graph_bdrv_states, node_list) {
         xdbg_graph_add_node(gr, bs, X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_DRIVER,
                            bs->node_name);
diff --git a/block/replication.c b/block/replication.c
index 55c8f894aa..0f487cc215 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -149,7 +149,9 @@ static void replication_close(BlockDriverState *bs)
     if (s->stage == BLOCK_REPLICATION_FAILOVER) {
         commit_job = &s->commit_job->job;
         assert(commit_job->aio_context == qemu_get_current_aio_context());
+        job_lock();
         job_cancel_sync(commit_job, false);
+        job_unlock();
     }
 
     if (s->mode == REPLICATION_MODE_SECONDARY) {
@@ -726,7 +728,9 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
          * disk, secondary disk in backup_job_completed().
          */
         if (s->backup_job) {
+            job_lock();
             job_cancel_sync(&s->backup_job->job, true);
+            job_unlock();
         }
 
         if (!failover) {
diff --git a/blockdev.c b/blockdev.c
index 67b55eec11..c5a835d9ed 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -150,6 +150,8 @@ void blockdev_mark_auto_del(BlockBackend *blk)
         return;
     }
 
+    job_lock();
+
     for (job = block_job_next(NULL); job; job = block_job_next(job)) {
         if (block_job_has_bdrv(job, blk_bs(blk))) {
             AioContext *aio_context = job->job.aio_context;
@@ -161,6 +163,8 @@ void blockdev_mark_auto_del(BlockBackend *blk)
         }
     }
 
+    job_unlock();
+
     dinfo->auto_del = 1;
 }
 
@@ -1844,7 +1848,9 @@ static void drive_backup_abort(BlkActionState *common)
         aio_context = bdrv_get_aio_context(state->bs);
         aio_context_acquire(aio_context);
 
+        job_lock();
         job_cancel_sync(&state->job->job, true);
+        job_unlock();
 
         aio_context_release(aio_context);
     }
@@ -1945,7 +1951,9 @@ static void blockdev_backup_abort(BlkActionState *common)
         aio_context = bdrv_get_aio_context(state->bs);
         aio_context_acquire(aio_context);
 
+        job_lock();
         job_cancel_sync(&state->job->job, true);
+        job_unlock();
 
         aio_context_release(aio_context);
     }
@@ -2394,7 +2402,9 @@ exit:
     if (!has_props) {
         qapi_free_TransactionProperties(props);
     }
+    job_lock();
     job_txn_unref(block_job_txn);
+    job_unlock();
 }
 
 BlockDirtyBitmapSha256 *qmp_x_debug_block_dirty_bitmap_sha256(const char *node,
@@ -3717,6 +3727,7 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
     BlockJobInfoList *head = NULL, **tail = &head;
     BlockJob *job;
 
+    job_lock();
     for (job = block_job_next(NULL); job; job = block_job_next(job)) {
         BlockJobInfo *value;
 
@@ -3726,10 +3737,12 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
         value = block_job_query(job, errp);
         if (!value) {
             qapi_free_BlockJobInfoList(head);
+            job_unlock();
             return NULL;
         }
         QAPI_LIST_APPEND(tail, value);
     }
+    job_unlock();
 
     return head;
 }
diff --git a/blockjob.c b/blockjob.c
index 53c1e9c406..dcc13dc336 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -88,19 +88,25 @@ static char *child_job_get_parent_desc(BdrvChild *c)
 static void child_job_drained_begin(BdrvChild *c)
 {
     BlockJob *job = c->opaque;
+    job_lock();
     job_pause(&job->job);
+    job_unlock();
 }
 
 static bool child_job_drained_poll(BdrvChild *c)
 {
     BlockJob *bjob = c->opaque;
     Job *job = &bjob->job;
+    bool inactive_incomplete;
     const BlockJobDriver *drv = block_job_driver(bjob);
 
     /* An inactive or completed job doesn't have any pending requests. Jobs
      * with !job->busy are either already paused or have a pause point after
      * being reentered, so no job driver code will run before they pause. */
-    if (!job->busy || job_is_completed(job)) {
+    job_lock();
+    inactive_incomplete = !job->busy || job_is_completed(job);
+    job_unlock();
+    if (inactive_incomplete) {
         return false;
     }
 
@@ -116,7 +122,9 @@ static bool child_job_drained_poll(BdrvChild *c)
 static void child_job_drained_end(BdrvChild *c, int *drained_end_counter)
 {
     BlockJob *job = c->opaque;
+    job_lock();
     job_resume(&job->job);
+    job_unlock();
 }
 
 static bool child_job_can_set_aio_ctx(BdrvChild *c, AioContext *ctx,
@@ -236,9 +244,16 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
     return 0;
 }
 
+/* Called with job_mutex lock held. */
 static void block_job_on_idle(Notifier *n, void *opaque)
 {
+    /*
+     * we can't kick with job_mutex held, but we also want
+     * to protect the notifier list.
+     */
+    job_unlock();
     aio_wait_kick();
+    job_lock();
 }
 
 bool block_job_is_internal(BlockJob *job)
@@ -257,6 +272,7 @@ static bool job_timer_pending(Job *job)
     return timer_pending(&job->sleep_timer);
 }
 
+/* Called with job_mutex held. May temporarily release the lock. */
 bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
 {
     const BlockJobDriver *drv = block_job_driver(job);
@@ -278,7 +294,9 @@ bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
     job->speed = speed;
 
     if (drv->set_speed) {
+        job_unlock();
         drv->set_speed(job, speed);
+        job_lock();
     }
 
     if (speed && speed <= old_speed) {
@@ -341,6 +359,7 @@ static void block_job_iostatus_set_err(BlockJob *job, int error)
     }
 }
 
+/* Called with job_mutex lock held. */
 static void block_job_event_cancelled(Notifier *n, void *opaque)
 {
     BlockJob *job = opaque;
@@ -360,6 +379,7 @@ static void block_job_event_cancelled(Notifier *n, void *opaque)
                                         job->speed);
 }
 
+/* Called with job_mutex lock held. */
 static void block_job_event_completed(Notifier *n, void *opaque)
 {
     BlockJob *job = opaque;
@@ -386,6 +406,7 @@ static void block_job_event_completed(Notifier *n, void *opaque)
                                         msg);
 }
 
+/* Called with job_mutex lock held. */
 static void block_job_event_pending(Notifier *n, void *opaque)
 {
     BlockJob *job = opaque;
@@ -398,6 +419,7 @@ static void block_job_event_pending(Notifier *n, void *opaque)
                                       job->job.id);
 }
 
+/* Called with job_mutex lock held. */
 static void block_job_event_ready(Notifier *n, void *opaque)
 {
     BlockJob *job = opaque;
@@ -458,6 +480,7 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     job->ready_notifier.notify = block_job_event_ready;
     job->idle_notifier.notify = block_job_on_idle;
 
+    job_lock();
     notifier_list_add(&job->job.on_finalize_cancelled,
                       &job->finalize_cancelled_notifier);
     notifier_list_add(&job->job.on_finalize_completed,
@@ -465,6 +488,7 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     notifier_list_add(&job->job.on_pending, &job->pending_notifier);
     notifier_list_add(&job->job.on_ready, &job->ready_notifier);
     notifier_list_add(&job->job.on_idle, &job->idle_notifier);
+    job_unlock();
 
     error_setg(&job->blocker, "block device is in use by block job: %s",
                job_type_str(&job->job));
@@ -477,14 +501,19 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     blk_set_disable_request_queuing(blk, true);
     blk_set_allow_aio_context_change(blk, true);
 
+    job_lock();
     if (!block_job_set_speed(job, speed, errp)) {
-        job_early_fail(&job->job);
+        job_early_fail_locked(&job->job);
+        job_unlock();
         return NULL;
     }
+    job_unlock();
+
 
     return job;
 }
 
+/* Called with job_mutex lock held. */
 void block_job_iostatus_reset(BlockJob *job)
 {
     assert(qemu_in_main_thread());
@@ -499,7 +528,9 @@ void block_job_user_resume(Job *job)
 {
     assert(qemu_in_main_thread());
     BlockJob *bjob = container_of(job, BlockJob, job);
+    job_lock();
     block_job_iostatus_reset(bjob);
+    job_unlock();
 }
 
 BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
@@ -532,11 +563,13 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
                                         action);
     }
     if (action == BLOCK_ERROR_ACTION_STOP) {
+        job_lock();
         if (!job->job.user_paused) {
             job_pause(&job->job);
             /* make the pause user visible, which will be resumed from QMP. */
             job->job.user_paused = true;
         }
+        job_unlock();
         block_job_iostatus_set_err(job, error);
     }
     return action;
diff --git a/job-qmp.c b/job-qmp.c
index a6774aaaa5..a355dc2954 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -171,6 +171,8 @@ JobInfoList *qmp_query_jobs(Error **errp)
     JobInfoList *head = NULL, **tail = &head;
     Job *job;
 
+    job_lock();
+
     for (job = job_next(NULL); job; job = job_next(job)) {
         JobInfo *value;
 
@@ -180,10 +182,12 @@ JobInfoList *qmp_query_jobs(Error **errp)
         value = job_query_single(job, errp);
         if (!value) {
             qapi_free_JobInfoList(head);
+            job_unlock();
             return NULL;
         }
         QAPI_LIST_APPEND(tail, value);
     }
+    job_unlock();
 
     return head;
 }
diff --git a/job.c b/job.c
index 7856fa734b..5efbf38a72 100644
--- a/job.c
+++ b/job.c
@@ -55,6 +55,7 @@
  */
 static QemuMutex job_mutex;
 
+/* Protected by job_mutex */
 static QLIST_HEAD(, Job) jobs = QLIST_HEAD_INITIALIZER(jobs);
 
 /* Job State Transition Table */
@@ -134,6 +135,7 @@ JobTxn *job_txn_new(void)
     return txn;
 }
 
+/* Called with job_mutex held. */
 static void job_txn_ref(JobTxn *txn)
 {
     txn->refcnt++;
@@ -159,6 +161,7 @@ void job_txn_add_job(JobTxn *txn, Job *job)
     job_txn_ref(txn);
 }
 
+/* Called with job_mutex held. */
 static void job_txn_del_job(Job *job)
 {
     if (job->txn) {
@@ -168,6 +171,7 @@ static void job_txn_del_job(Job *job)
     }
 }
 
+/* Called with job_mutex held. */
 static int job_txn_apply(Job *job, int fn(Job *))
 {
     Job *other_job, *next;
@@ -204,6 +208,7 @@ bool job_is_internal(Job *job)
     return (job->id == NULL);
 }
 
+/* Called with job_mutex held. */
 static void job_state_transition(Job *job, JobStatus s1)
 {
     JobStatus s0 = job->status;
@@ -371,6 +376,7 @@ static bool job_started(Job *job)
     return job->co;
 }
 
+/* Called with job_mutex held. */
 static bool job_should_pause(Job *job)
 {
     return job->pause_count > 0;
@@ -397,6 +403,7 @@ Job *job_get(const char *id)
     return NULL;
 }
 
+/* Called with job_mutex *not* held. */
 static void job_sleep_timer_cb(void *opaque)
 {
     Job *job = opaque;
@@ -404,12 +411,15 @@ static void job_sleep_timer_cb(void *opaque)
     job_enter(job);
 }
 
+/* Called with job_mutex *not* held. */
 void *job_create(const char *job_id, const JobDriver *driver, JobTxn *txn,
                  AioContext *ctx, int flags, BlockCompletionFunc *cb,
                  void *opaque, Error **errp)
 {
     Job *job;
 
+    JOB_LOCK_GUARD();
+
     if (job_id) {
         if (flags & JOB_INTERNAL) {
             error_setg(errp, "Cannot specify job ID for internal job");
@@ -483,7 +493,9 @@ void job_unref(Job *job)
         assert(!job->txn);
 
         if (job->driver->free) {
+            job_unlock();
             job->driver->free(job);
+            job_lock();
         }
 
         QLIST_REMOVE(job, job_list);
@@ -495,16 +507,19 @@ void job_unref(Job *job)
     }
 }
 
+/* Progress API is thread safe */
 void job_progress_update(Job *job, uint64_t done)
 {
     progress_work_done(&job->progress, done);
 }
 
+/* Progress API is thread safe */
 void job_progress_set_remaining(Job *job, uint64_t remaining)
 {
     progress_set_remaining(&job->progress, remaining);
 }
 
+/* Progress API is thread safe */
 void job_progress_increase_remaining(Job *job, uint64_t delta)
 {
     progress_increase_remaining(&job->progress, delta);
@@ -528,16 +543,19 @@ static void job_event_completed(Job *job)
     notifier_list_notify(&job->on_finalize_completed, job);
 }
 
+/* Called with job_mutex held. */
 static void job_event_pending(Job *job)
 {
     notifier_list_notify(&job->on_pending, job);
 }
 
+/* Called with job_mutex held. */
 static void job_event_ready(Job *job)
 {
     notifier_list_notify(&job->on_ready, job);
 }
 
+/* Called with job_mutex held. */
 static void job_event_idle(Job *job)
 {
     notifier_list_notify(&job->on_idle, job);
@@ -567,11 +585,15 @@ void job_enter_cond(Job *job, bool(*fn)(Job *job))
     timer_del(&job->sleep_timer);
     job->busy = true;
     _job_unlock();
+    job_unlock();
     aio_co_enter(job->aio_context, job->co);
+    job_lock();
 }
 
+/* Called with job_mutex *not* held. */
 void job_enter(Job *job)
 {
+    JOB_LOCK_GUARD();
     job_enter_cond(job, NULL);
 }
 
@@ -580,7 +602,10 @@ void job_enter(Job *job)
  * is allowed and cancels the timer.
  *
  * If @ns is (uint64_t) -1, no timer is scheduled and job_enter() must be
- * called explicitly. */
+ * called explicitly.
+ *
+ * Called with job_mutex held, but releases it temporarily.
+ */
 static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
 {
     _job_lock();
@@ -590,28 +615,39 @@ static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
     job->busy = false;
     job_event_idle(job);
     _job_unlock();
+    job_unlock();
     qemu_coroutine_yield();
+    job_lock();
 
     /* Set by job_enter_cond() before re-entering the coroutine.  */
     assert(job->busy);
 }
 
+/*
+ * Called with job_mutex *not* held (we don't want the coroutine
+ * to yield with the lock held!).
+ */
 void coroutine_fn job_pause_point(Job *job)
 {
     assert(job && job_started(job));
 
+    job_lock();
     if (!job_should_pause(job)) {
+        job_unlock();
         return;
     }
-    if (job_is_cancelled(job)) {
+    if (job_is_cancelled_locked(job)) {
+        job_unlock();
         return;
     }
 
     if (job->driver->pause) {
+        job_unlock();
         job->driver->pause(job);
+        job_lock();
     }
 
-    if (job_should_pause(job) && !job_is_cancelled(job)) {
+    if (job_should_pause(job) && !job_is_cancelled_locked(job)) {
         JobStatus status = job->status;
         job_state_transition(job, status == JOB_STATUS_READY
                                   ? JOB_STATUS_STANDBY
@@ -621,45 +657,58 @@ void coroutine_fn job_pause_point(Job *job)
         job->paused = false;
         job_state_transition(job, status);
     }
+    job_unlock();
 
     if (job->driver->resume) {
         job->driver->resume(job);
     }
 }
 
+/*
+ * Called with job_mutex *not* held (we don't want the coroutine
+ * to yield with the lock held!).
+ */
 void job_yield(Job *job)
 {
-    assert(job->busy);
+    WITH_JOB_LOCK_GUARD() {
+        assert(job->busy);
 
-    /* Check cancellation *before* setting busy = false, too!  */
-    if (job_is_cancelled(job)) {
-        return;
-    }
+        /* Check cancellation *before* setting busy = false, too!  */
+        if (job_is_cancelled_locked(job)) {
+            return;
+        }
 
-    if (!job_should_pause(job)) {
-        job_do_yield(job, -1);
+        if (!job_should_pause(job)) {
+            job_do_yield(job, -1);
+        }
     }
 
     job_pause_point(job);
 }
 
+/*
+ * Called with job_mutex *not* held (we don't want the coroutine
+ * to yield with the lock held!).
+ */
 void coroutine_fn job_sleep_ns(Job *job, int64_t ns)
 {
-    assert(job->busy);
+    WITH_JOB_LOCK_GUARD() {
+        assert(job->busy);
 
-    /* Check cancellation *before* setting busy = false, too!  */
-    if (job_is_cancelled(job)) {
-        return;
-    }
+        /* Check cancellation *before* setting busy = false, too!  */
+        if (job_is_cancelled_locked(job)) {
+            return;
+        }
 
-    if (!job_should_pause(job)) {
-        job_do_yield(job, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + ns);
+        if (!job_should_pause(job)) {
+            job_do_yield(job, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + ns);
+        }
     }
 
     job_pause_point(job);
 }
 
-/* Assumes the block_job_mutex is held */
+/* Assumes the job_mutex is held */
 static bool job_timer_not_pending(Job *job)
 {
     return !timer_pending(&job->sleep_timer);
@@ -669,7 +718,7 @@ void job_pause(Job *job)
 {
     job->pause_count++;
     if (!job->paused) {
-        job_enter(job);
+        job_enter_cond(job, NULL);
     }
 }
 
@@ -723,12 +772,15 @@ void job_user_resume(Job *job, Error **errp)
         return;
     }
     if (job->driver->user_resume) {
+        job_unlock();
         job->driver->user_resume(job);
+        job_lock();
     }
     job->user_paused = false;
     job_resume(job);
 }
 
+/* Called with job_mutex held. */
 static void job_do_dismiss(Job *job)
 {
     assert(job);
@@ -767,6 +819,7 @@ void job_early_fail(Job *job)
     job_early_fail_locked(job);
 }
 
+/* Called with job_mutex held. */
 static void job_conclude(Job *job)
 {
     job_state_transition(job, JOB_STATUS_CONCLUDED);
@@ -775,9 +828,10 @@ static void job_conclude(Job *job)
     }
 }
 
+/* Called with job_mutex held. */
 static void job_update_rc(Job *job)
 {
-    if (!job->ret && job_is_cancelled(job)) {
+    if (!job->ret && job_is_cancelled_locked(job)) {
         job->ret = -ECANCELED;
     }
     if (job->ret) {
@@ -788,34 +842,45 @@ static void job_update_rc(Job *job)
     }
 }
 
+/* Called with job_mutex held, but releases it temporarily */
 static void job_commit(Job *job)
 {
     assert(!job->ret);
     assert(qemu_in_main_thread());
     if (job->driver->commit) {
+        job_unlock();
         job->driver->commit(job);
+        job_lock();
     }
 }
 
+/* Called with job_mutex held, but releases it temporarily */
 static void job_abort(Job *job)
 {
     assert(job->ret);
     assert(qemu_in_main_thread());
     if (job->driver->abort) {
+        job_unlock();
         job->driver->abort(job);
+        job_lock();
     }
 }
 
+/* Called with job_mutex held, but releases it temporarily */
 static void job_clean(Job *job)
 {
     assert(qemu_in_main_thread());
     if (job->driver->clean) {
+        job_unlock();
         job->driver->clean(job);
+        job_lock();
     }
 }
 
+/* Called with job_mutex held, but releases it temporarily. */
 static int job_finalize_single(Job *job)
 {
+    int job_ret;
     AioContext *ctx = job->aio_context;
 
     assert(job_is_completed(job));
@@ -835,12 +900,15 @@ static int job_finalize_single(Job *job)
     aio_context_release(ctx);
 
     if (job->cb) {
-        job->cb(job->opaque, job->ret);
+        job_ret = job->ret;
+        job_unlock();
+        job->cb(job->opaque, job_ret);
+        job_lock();
     }
 
     /* Emit events only if we actually started */
     if (job_started(job)) {
-        if (job_is_cancelled(job)) {
+        if (job_is_cancelled_locked(job)) {
             job_event_cancelled(job);
         } else {
             job_event_completed(job);
@@ -852,11 +920,14 @@ static int job_finalize_single(Job *job)
     return 0;
 }
 
+/* Called with job_mutex held, but releases it temporarily. */
 static void job_cancel_async(Job *job, bool force)
 {
     assert(qemu_in_main_thread());
     if (job->driver->cancel) {
+        job_unlock();
         force = job->driver->cancel(job, force);
+        job_lock();
     } else {
         /* No .cancel() means the job will behave as if force-cancelled */
         force = true;
@@ -865,7 +936,9 @@ static void job_cancel_async(Job *job, bool force)
     if (job->user_paused) {
         /* Do not call job_enter here, the caller will handle it.  */
         if (job->driver->user_resume) {
+            job_unlock();
             job->driver->user_resume(job);
+            job_lock();
         }
         job->user_paused = false;
         assert(job->pause_count > 0);
@@ -886,6 +959,7 @@ static void job_cancel_async(Job *job, bool force)
     }
 }
 
+/* Called with job_mutex held. */
 static void job_completed_txn_abort(Job *job)
 {
     AioContext *ctx;
@@ -935,7 +1009,7 @@ static void job_completed_txn_abort(Job *job)
         ctx = other_job->aio_context;
         aio_context_acquire(ctx);
         if (!job_is_completed(other_job)) {
-            assert(job_cancel_requested(other_job));
+            assert(job_cancel_requested_locked(other_job));
             job_finish_sync(other_job, NULL, NULL);
         }
         job_finalize_single(other_job);
@@ -952,26 +1026,33 @@ static void job_completed_txn_abort(Job *job)
     job_txn_unref(txn);
 }
 
+/* Called with job_mutex held, but releases it temporarily. */
 static int job_prepare(Job *job)
 {
+    int ret;
     AioContext *ctx = job->aio_context;
     assert(qemu_in_main_thread());
 
     if (job->ret == 0 && job->driver->prepare) {
+        job_unlock();
         aio_context_acquire(ctx);
-        job->ret = job->driver->prepare(job);
+        ret = job->driver->prepare(job);
         aio_context_release(ctx);
+        job_lock();
+        job->ret = ret;
         job_update_rc(job);
     }
 
     return job->ret;
 }
 
+/* Called with job_mutex held. */
 static int job_needs_finalize(Job *job)
 {
     return !job->auto_finalize;
 }
 
+/* Called with job_mutex held. */
 static void job_do_finalize(Job *job)
 {
     int rc;
@@ -995,6 +1076,7 @@ void job_finalize(Job *job, Error **errp)
     job_do_finalize(job);
 }
 
+/* Called with job_mutex held. */
 static int job_transition_to_pending(Job *job)
 {
     job_state_transition(job, JOB_STATUS_PENDING);
@@ -1004,12 +1086,15 @@ static int job_transition_to_pending(Job *job)
     return 0;
 }
 
+/* Called with job_mutex *not* held. */
 void job_transition_to_ready(Job *job)
 {
+    JOB_LOCK_GUARD();
     job_state_transition(job, JOB_STATUS_READY);
     job_event_ready(job);
 }
 
+/* Called with job_mutex held. */
 static void job_completed_txn_success(Job *job)
 {
     JobTxn *txn = job->txn;
@@ -1036,6 +1121,7 @@ static void job_completed_txn_success(Job *job)
     }
 }
 
+/* Called with job_mutex held. */
 static void job_completed(Job *job)
 {
     assert(job && job->txn && !job_is_completed(job));
@@ -1049,12 +1135,16 @@ static void job_completed(Job *job)
     }
 }
 
-/** Useful only as a type shim for aio_bh_schedule_oneshot. */
+/**
+ * Useful only as a type shim for aio_bh_schedule_oneshot.
+ * Called with job_mutex *not* held.
+ */
 static void job_exit(void *opaque)
 {
     Job *job = (Job *)opaque;
     AioContext *ctx;
 
+    JOB_LOCK_GUARD();
     job_ref(job);
     aio_context_acquire(job->aio_context);
 
@@ -1081,28 +1171,36 @@ static void job_exit(void *opaque)
 /**
  * All jobs must allow a pause point before entering their job proper. This
  * ensures that jobs can be paused prior to being started, then resumed later.
+ *
+ * Called with job_mutex *not* held.
  */
 static void coroutine_fn job_co_entry(void *opaque)
 {
     Job *job = opaque;
-
+    int ret;
     assert(job && job->driver && job->driver->run);
     job_pause_point(job);
-    job->ret = job->driver->run(job, &job->err);
-    job->deferred_to_main_loop = true;
-    job->busy = true;
+    ret = job->driver->run(job, &job->err);
+    WITH_JOB_LOCK_GUARD() {
+        job->ret = ret;
+        job->deferred_to_main_loop = true;
+        job->busy = true;
+    }
     aio_bh_schedule_oneshot(qemu_get_aio_context(), job_exit, job);
 }
 
+/* Called with job_mutex *not* held. */
 void job_start(Job *job)
 {
-    assert(job && !job_started(job) && job->paused &&
-           job->driver && job->driver->run);
-    job->co = qemu_coroutine_create(job_co_entry, job);
-    job->pause_count--;
-    job->busy = true;
-    job->paused = false;
-    job_state_transition(job, JOB_STATUS_RUNNING);
+    WITH_JOB_LOCK_GUARD() {
+        assert(job && !job_started(job) && job->paused &&
+            job->driver && job->driver->run);
+        job->co = qemu_coroutine_create(job_co_entry, job);
+        job->pause_count--;
+        job->busy = true;
+        job->paused = false;
+        job_state_transition(job, JOB_STATUS_RUNNING);
+    }
     aio_co_enter(job->aio_context, job->co);
 }
 
@@ -1126,11 +1224,11 @@ void job_cancel(Job *job, bool force)
          * choose to call job_is_cancelled() to show that we invoke
          * job_completed_txn_abort() only for force-cancelled jobs.)
          */
-        if (job_is_cancelled(job)) {
+        if (job_is_cancelled_locked(job)) {
             job_completed_txn_abort(job);
         }
     } else {
-        job_enter(job);
+        job_enter_cond(job, NULL);
     }
 }
 
@@ -1142,9 +1240,13 @@ void job_user_cancel(Job *job, bool force, Error **errp)
     job_cancel(job, force);
 }
 
-/* A wrapper around job_cancel() taking an Error ** parameter so it may be
+/*
+ * A wrapper around job_cancel() taking an Error ** parameter so it may be
  * used with job_finish_sync() without the need for (rather nasty) function
- * pointer casts there. */
+ * pointer casts there.
+ *
+ * Called with job_mutex held.
+ */
 static void job_cancel_err(Job *job, Error **errp)
 {
     job_cancel(job, false);
@@ -1152,6 +1254,8 @@ static void job_cancel_err(Job *job, Error **errp)
 
 /**
  * Same as job_cancel_err(), but force-cancel.
+ *
+ * Called with job_mutex held.
  */
 static void job_force_cancel_err(Job *job, Error **errp)
 {
@@ -1167,11 +1271,17 @@ int job_cancel_sync(Job *job, bool force)
     }
 }
 
+/*
+ * Called with job_mutex *not* held, unlike most other APIs consumed
+ * by the monitor! This is primarily to avoid adding lock-unlock
+ * patterns in the caller.
+ */
 void job_cancel_sync_all(void)
 {
     Job *job;
     AioContext *aio_context;
 
+    JOB_LOCK_GUARD();
     while ((job = job_next(NULL))) {
         aio_context = job->aio_context;
         aio_context_acquire(aio_context);
@@ -1193,13 +1303,15 @@ void job_complete(Job *job, Error **errp)
     if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) {
         return;
     }
-    if (job_cancel_requested(job) || !job->driver->complete) {
+    if (job_cancel_requested_locked(job) || !job->driver->complete) {
         error_setg(errp, "The active block job '%s' cannot be completed",
                    job->id);
         return;
     }
 
+    job_unlock();
     job->driver->complete(job, errp);
+    job_lock();
 }
 
 int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
@@ -1218,10 +1330,13 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
         return -EBUSY;
     }
 
+    job_unlock();
     AIO_WAIT_WHILE(job->aio_context,
-                   (job_enter(job), !job_is_completed(job)));
+                   (job_enter(job), !job_is_completed_unlocked(job)));
+    job_lock();
 
-    ret = (job_is_cancelled(job) && job->ret == 0) ? -ECANCELED : job->ret;
+    ret = (job_is_cancelled_locked(job) && job->ret == 0) ?
+           -ECANCELED : job->ret;
     job_unref(job);
     return ret;
 }
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index 5c0d5e116b..a0b023cac1 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -129,9 +129,11 @@ void qmp_cont(Error **errp)
         blk_iostatus_reset(blk);
     }
 
+    job_lock();
     for (job = block_job_next(NULL); job; job = block_job_next(job)) {
         block_job_iostatus_reset(job);
     }
+    job_unlock();
 
     /* Continuing after completed migration. Images have been inactivated to
      * allow the destination to take control. Need to get control back now.
diff --git a/qemu-img.c b/qemu-img.c
index f036a1d428..170c65b1b7 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -906,9 +906,11 @@ static void run_block_job(BlockJob *job, Error **errp)
     int ret = 0;
 
     aio_context_acquire(aio_context);
+    job_lock();
     job_ref(&job->job);
     do {
         float progress = 0.0f;
+        job_unlock();
         aio_poll(aio_context, true);
 
         progress_get_snapshot(&job->job.progress, &progress_current,
@@ -917,7 +919,8 @@ static void run_block_job(BlockJob *job, Error **errp)
             progress = (float)progress_current / progress_total * 100.f;
         }
         qemu_progress_print(progress, 0);
-    } while (!job_is_ready(&job->job) && !job_is_completed(&job->job));
+        job_lock();
+    } while (!job_is_ready_locked(&job->job) && !job_is_completed(&job->job));
 
     if (!job_is_completed(&job->job)) {
         ret = job_complete_sync(&job->job, errp);
@@ -925,6 +928,7 @@ static void run_block_job(BlockJob *job, Error **errp)
         ret = job->job.ret;
     }
     job_unref(&job->job);
+    job_unlock();
     aio_context_release(aio_context);
 
     /* publish completion progress only when success */
@@ -1077,7 +1081,9 @@ static int img_commit(int argc, char **argv)
         bdrv_ref(bs);
     }
 
+    job_lock();
     job = block_job_get("commit");
+    job_unlock();
     assert(job);
     run_block_job(job, &local_err);
     if (local_err) {
-- 
2.27.0




* [RFC PATCH v2 11/14] block_job_query: remove atomic read
  2021-11-04 14:53 [RFC PATCH v2 00/14] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (9 preceding siblings ...)
  2021-11-04 14:53 ` [RFC PATCH v2 10/14] jobs: protect jobs with job_lock/unlock Emanuele Giuseppe Esposito
@ 2021-11-04 14:53 ` Emanuele Giuseppe Esposito
  2021-12-18 12:07   ` Vladimir Sementsov-Ogievskiy
  2021-11-04 14:53 ` [RFC PATCH v2 12/14] jobs: use job locks and helpers also in the unit tests Emanuele Giuseppe Esposito
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 34+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-11-04 14:53 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

It is unclear what the atomic read here was meant to achieve, since
job.busy is protected by the job lock. Because the whole function is
called with job_mutex held, just remove the atomic read.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
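As a rough illustration of the reasoning above (not QEMU code): when
every writer and every reader of a field takes the same mutex, a plain
load inside the critical section is already ordered against the
writers, so an extra atomic read adds nothing. QueryJob, QueryInfo,
query_mutex and the query_* functions are hypothetical names invented
for this sketch, and a plain pthread mutex stands in for job_mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in QEMU. */
typedef struct QueryJob {
    bool busy;          /* protected by query_mutex */
    int pause_count;    /* protected by query_mutex */
} QueryJob;

typedef struct QueryInfo {
    bool busy;
    bool paused;
} QueryInfo;

static pthread_mutex_t query_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Writer: updates the field only while holding the mutex. */
static void query_set_busy(QueryJob *job, bool busy)
{
    pthread_mutex_lock(&query_mutex);
    job->busy = busy;
    pthread_mutex_unlock(&query_mutex);
}

/* Called with query_mutex held: plain reads are sufficient here. */
static QueryInfo query_info_locked(const QueryJob *job)
{
    QueryInfo info;

    assert(job);
    info.busy = job->busy;
    info.paused = job->pause_count > 0;
    return info;
}

/* Called with query_mutex *not* held. */
static QueryInfo query_info(QueryJob *job)
{
    QueryInfo info;

    pthread_mutex_lock(&query_mutex);
    info = query_info_locked(job);
    pthread_mutex_unlock(&query_mutex);
    return info;
}
```

A caller that already holds the mutex would use query_info_locked()
directly, which corresponds to how block_job_query() is annotated in
the diff below.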
 blockjob.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index dcc13dc336..426dcddcc1 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -314,6 +314,7 @@ int64_t block_job_ratelimit_get_delay(BlockJob *job, uint64_t n)
     return ratelimit_calculate_delay(&job->limit, n);
 }
 
+/* Called with job_mutex held */
 BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
 {
     BlockJobInfo *info;
@@ -332,13 +333,13 @@ BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
     info = g_new0(BlockJobInfo, 1);
     info->type      = g_strdup(job_type_str(&job->job));
     info->device    = g_strdup(job->job.id);
-    info->busy      = qatomic_read(&job->job.busy);
+    info->busy      = job->job.busy;
     info->paused    = job->job.pause_count > 0;
     info->offset    = progress_current;
     info->len       = progress_total;
     info->speed     = job->speed;
     info->io_status = job->iostatus;
-    info->ready     = job_is_ready(&job->job),
+    info->ready     = job_is_ready_locked(&job->job);
     info->status    = job->job.status;
     info->auto_finalize = job->job.auto_finalize;
     info->auto_dismiss  = job->job.auto_dismiss;
-- 
2.27.0




* [RFC PATCH v2 12/14] jobs: use job locks and helpers also in the unit tests
  2021-11-04 14:53 [RFC PATCH v2 00/14] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (10 preceding siblings ...)
  2021-11-04 14:53 ` [RFC PATCH v2 11/14] block_job_query: remove atomic read Emanuele Giuseppe Esposito
@ 2021-11-04 14:53 ` Emanuele Giuseppe Esposito
  2021-11-04 14:53 ` [RFC PATCH v2 13/14] jobs: add job lock in find_* functions Emanuele Giuseppe Esposito
  2021-11-04 14:53 ` [RFC PATCH v2 14/14] job.c: enable job lock/unlock and remove Aiocontext locks Emanuele Giuseppe Esposito
  13 siblings, 0 replies; 34+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-11-04 14:53 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

Add missing job synchronization in the unit tests, with
both explicit locks and helpers.

Note: at this stage, job_{lock/unlock} and the job lock guard macros
are still no-ops.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
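The two styles mentioned above can be sketched roughly as follows.
This is not the code from this series: SketchJob, sketch_mutex and the
sketch_* functions are hypothetical names, a pthread mutex stands in
for job_mutex, and the real job_get_*() helpers are defined elsewhere
in the series and may differ. The assumption is only that a helper
takes the lock for a single field read, while an explicit lock/unlock
pair wraps calls into an API that expects the lock to be held.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins; none of these names exist in QEMU. */
typedef enum {
    SKETCH_RUNNING,
    SKETCH_READY,
    SKETCH_CONCLUDED
} SketchStatus;

typedef struct SketchJob {
    SketchStatus status;    /* protected by sketch_mutex */
} SketchJob;

static pthread_mutex_t sketch_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Helper style: takes and drops the lock around one field read. */
static SketchStatus sketch_get_status(SketchJob *job)
{
    SketchStatus status;

    assert(job);
    pthread_mutex_lock(&sketch_mutex);
    status = job->status;
    pthread_mutex_unlock(&sketch_mutex);
    return status;
}

/*
 * Called with sketch_mutex held.
 * Returns 0 on success, -1 if the job is not READY.
 */
static int sketch_complete_locked(SketchJob *job)
{
    if (job->status != SKETCH_READY) {
        return -1;
    }
    job->status = SKETCH_CONCLUDED;
    return 0;
}

/* Explicit style: the caller brackets the locked API itself. */
static int sketch_complete(SketchJob *job)
{
    int ret;

    pthread_mutex_lock(&sketch_mutex);
    ret = sketch_complete_locked(job);
    pthread_mutex_unlock(&sketch_mutex);
    return ret;
}
```

The helper style keeps assertions on a single field short, whereas the
explicit style is needed when several steps must happen in one
critical section.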
 tests/unit/test-bdrv-drain.c     | 40 +++++++++++-----------
 tests/unit/test-block-iothread.c |  4 +++
 tests/unit/test-blockjob-txn.c   | 10 ++++++
 tests/unit/test-blockjob.c       | 57 +++++++++++++++++++++-----------
 4 files changed, 72 insertions(+), 39 deletions(-)

diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 2d3c17e566..535c39b5a8 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -941,61 +941,63 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
         }
     }
 
-    g_assert_cmpint(job->job.pause_count, ==, 0);
-    g_assert_false(job->job.paused);
+    g_assert_cmpint(job_get_pause_count(&job->job), ==, 0);
+    g_assert_false(job_get_paused(&job->job));
     g_assert_true(tjob->running);
-    g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
+    g_assert_true(job_get_busy(&job->job)); /* We're in qemu_co_sleep_ns() */
 
     do_drain_begin_unlocked(drain_type, drain_bs);
 
     if (drain_type == BDRV_DRAIN_ALL) {
         /* bdrv_drain_all() drains both src and target */
-        g_assert_cmpint(job->job.pause_count, ==, 2);
+        g_assert_cmpint(job_get_pause_count(&job->job), ==, 2);
     } else {
-        g_assert_cmpint(job->job.pause_count, ==, 1);
+        g_assert_cmpint(job_get_pause_count(&job->job), ==, 1);
     }
-    g_assert_true(job->job.paused);
-    g_assert_false(job->job.busy); /* The job is paused */
+    g_assert_true(job_get_paused(&job->job));
+    g_assert_false(job_get_busy(&job->job)); /* The job is paused */
 
     do_drain_end_unlocked(drain_type, drain_bs);
 
     if (use_iothread) {
         /* paused is reset in the I/O thread, wait for it */
-        while (job->job.paused) {
+        while (job_get_paused(&job->job)) {
             aio_poll(qemu_get_aio_context(), false);
         }
     }
 
-    g_assert_cmpint(job->job.pause_count, ==, 0);
-    g_assert_false(job->job.paused);
-    g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
+    g_assert_cmpint(job_get_pause_count(&job->job), ==, 0);
+    g_assert_false(job_get_paused(&job->job));
+    g_assert_true(job_get_busy(&job->job)); /* We're in qemu_co_sleep_ns() */
 
     do_drain_begin_unlocked(drain_type, target);
 
     if (drain_type == BDRV_DRAIN_ALL) {
         /* bdrv_drain_all() drains both src and target */
-        g_assert_cmpint(job->job.pause_count, ==, 2);
+        g_assert_cmpint(job_get_pause_count(&job->job), ==, 2);
     } else {
-        g_assert_cmpint(job->job.pause_count, ==, 1);
+        g_assert_cmpint(job_get_pause_count(&job->job), ==, 1);
     }
-    g_assert_true(job->job.paused);
-    g_assert_false(job->job.busy); /* The job is paused */
+    g_assert_true(job_get_paused(&job->job));
+    g_assert_false(job_get_busy(&job->job)); /* The job is paused */
 
     do_drain_end_unlocked(drain_type, target);
 
     if (use_iothread) {
         /* paused is reset in the I/O thread, wait for it */
-        while (job->job.paused) {
+        while (job_get_paused(&job->job)) {
             aio_poll(qemu_get_aio_context(), false);
         }
     }
 
-    g_assert_cmpint(job->job.pause_count, ==, 0);
-    g_assert_false(job->job.paused);
-    g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
+    g_assert_cmpint(job_get_pause_count(&job->job), ==, 0);
+    g_assert_false(job_get_paused(&job->job));
+    g_assert_true(job_get_busy(&job->job)); /* We're in qemu_co_sleep_ns() */
 
     aio_context_acquire(ctx);
+    job_lock();
     ret = job_complete_sync(&job->job, &error_abort);
+    job_unlock();
     g_assert_cmpint(ret, ==, (result == TEST_JOB_SUCCESS ? 0 : -EIO));
 
     if (use_iothread) {
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index aea660aeed..f39cb8b7ef 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -456,7 +456,9 @@ static void test_attach_blockjob(void)
     }
 
     aio_context_acquire(ctx);
+    job_lock();
     job_complete_sync(&tjob->common.job, &error_abort);
+    job_unlock();
     blk_set_aio_context(blk, qemu_get_aio_context(), &error_abort);
     aio_context_release(ctx);
 
@@ -630,7 +632,9 @@ static void test_propagate_mirror(void)
                  BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
                  false, "filter_node", MIRROR_COPY_MODE_BACKGROUND,
                  &error_abort);
+    job_lock();
     job = job_get("job0");
+    job_unlock();
     filter = bdrv_find_node("filter_node");
 
     /* Change the AioContext of src */
diff --git a/tests/unit/test-blockjob-txn.c b/tests/unit/test-blockjob-txn.c
index 8bd13b9949..1ae3a9d443 100644
--- a/tests/unit/test-blockjob-txn.c
+++ b/tests/unit/test-blockjob-txn.c
@@ -124,16 +124,20 @@ static void test_single_job(int expected)
     job = test_block_job_start(1, true, expected, &result, txn);
     job_start(&job->job);
 
+    job_lock();
     if (expected == -ECANCELED) {
         job_cancel(&job->job, false);
     }
+    job_unlock();
 
     while (result == -EINPROGRESS) {
         aio_poll(qemu_get_aio_context(), true);
     }
     g_assert_cmpint(result, ==, expected);
 
+    job_lock();
     job_txn_unref(txn);
+    job_unlock();
 }
 
 static void test_single_job_success(void)
@@ -168,6 +172,7 @@ static void test_pair_jobs(int expected1, int expected2)
     /* Release our reference now to trigger as many nice
      * use-after-free bugs as possible.
      */
+    job_lock();
     job_txn_unref(txn);
 
     if (expected1 == -ECANCELED) {
@@ -176,6 +181,7 @@ static void test_pair_jobs(int expected1, int expected2)
     if (expected2 == -ECANCELED) {
         job_cancel(&job2->job, false);
     }
+    job_unlock();
 
     while (result1 == -EINPROGRESS || result2 == -EINPROGRESS) {
         aio_poll(qemu_get_aio_context(), true);
@@ -227,7 +233,9 @@ static void test_pair_jobs_fail_cancel_race(void)
     job_start(&job1->job);
     job_start(&job2->job);
 
+    job_lock();
     job_cancel(&job1->job, false);
+    job_unlock();
 
     /* Now make job2 finish before the main loop kicks jobs.  This simulates
      * the race between a pending kick and another job completing.
@@ -242,7 +250,9 @@ static void test_pair_jobs_fail_cancel_race(void)
     g_assert_cmpint(result1, ==, -ECANCELED);
     g_assert_cmpint(result2, ==, -ECANCELED);
 
+    job_lock();
     job_txn_unref(txn);
+    job_unlock();
 }
 
 int main(int argc, char **argv)
diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c
index 4c9e1bf1e5..b94e1510c9 100644
--- a/tests/unit/test-blockjob.c
+++ b/tests/unit/test-blockjob.c
@@ -211,8 +211,11 @@ static CancelJob *create_common(Job **pjob)
     bjob = mk_job(blk, "Steve", &test_cancel_driver, true,
                   JOB_MANUAL_FINALIZE | JOB_MANUAL_DISMISS);
     job = &bjob->job;
+    job_lock();
     job_ref(job);
     assert(job->status == JOB_STATUS_CREATED);
+    job_unlock();
+
     s = container_of(bjob, CancelJob, common);
     s->blk = blk;
 
@@ -230,6 +233,7 @@ static void cancel_common(CancelJob *s)
     ctx = job->job.aio_context;
     aio_context_acquire(ctx);
 
+    job_lock();
     job_cancel_sync(&job->job, true);
     if (sts != JOB_STATUS_CREATED && sts != JOB_STATUS_CONCLUDED) {
         Job *dummy = &job->job;
@@ -237,6 +241,7 @@ static void cancel_common(CancelJob *s)
     }
     assert(job->job.status == JOB_STATUS_NULL);
     job_unref(&job->job);
+    job_unlock();
     destroy_blk(blk);
 
     aio_context_release(ctx);
@@ -259,7 +264,7 @@ static void test_cancel_running(void)
     s = create_common(&job);
 
     job_start(job);
-    assert(job->status == JOB_STATUS_RUNNING);
+    assert(job_get_status(job) == JOB_STATUS_RUNNING);
 
     cancel_common(s);
 }
@@ -272,11 +277,13 @@ static void test_cancel_paused(void)
     s = create_common(&job);
 
     job_start(job);
-    assert(job->status == JOB_STATUS_RUNNING);
+    assert(job_get_status(job) == JOB_STATUS_RUNNING);
 
+    job_lock();
     job_user_pause(job, &error_abort);
+    job_unlock();
     job_enter(job);
-    assert(job->status == JOB_STATUS_PAUSED);
+    assert(job_get_status(job) == JOB_STATUS_PAUSED);
 
     cancel_common(s);
 }
@@ -289,11 +296,11 @@ static void test_cancel_ready(void)
     s = create_common(&job);
 
     job_start(job);
-    assert(job->status == JOB_STATUS_RUNNING);
+    assert(job_get_status(job) == JOB_STATUS_RUNNING);
 
     s->should_converge = true;
     job_enter(job);
-    assert(job->status == JOB_STATUS_READY);
+    assert(job_get_status(job) == JOB_STATUS_READY);
 
     cancel_common(s);
 }
@@ -306,15 +313,17 @@ static void test_cancel_standby(void)
     s = create_common(&job);
 
     job_start(job);
-    assert(job->status == JOB_STATUS_RUNNING);
+    assert(job_get_status(job) == JOB_STATUS_RUNNING);
 
     s->should_converge = true;
     job_enter(job);
-    assert(job->status == JOB_STATUS_READY);
+    assert(job_get_status(job) == JOB_STATUS_READY);
 
+    job_lock();
     job_user_pause(job, &error_abort);
+    job_unlock();
     job_enter(job);
-    assert(job->status == JOB_STATUS_STANDBY);
+    assert(job_get_status(job) == JOB_STATUS_STANDBY);
 
     cancel_common(s);
 }
@@ -327,20 +336,22 @@ static void test_cancel_pending(void)
     s = create_common(&job);
 
     job_start(job);
-    assert(job->status == JOB_STATUS_RUNNING);
+    assert(job_get_status(job) == JOB_STATUS_RUNNING);
 
     s->should_converge = true;
     job_enter(job);
-    assert(job->status == JOB_STATUS_READY);
+    assert(job_get_status(job) == JOB_STATUS_READY);
 
+    job_lock();
     job_complete(job, &error_abort);
+    job_unlock();
     job_enter(job);
     while (!job->deferred_to_main_loop) {
         aio_poll(qemu_get_aio_context(), true);
     }
-    assert(job->status == JOB_STATUS_READY);
+    assert(job_get_status(job) == JOB_STATUS_READY);
     aio_poll(qemu_get_aio_context(), true);
-    assert(job->status == JOB_STATUS_PENDING);
+    assert(job_get_status(job) == JOB_STATUS_PENDING);
 
     cancel_common(s);
 }
@@ -353,25 +364,29 @@ static void test_cancel_concluded(void)
     s = create_common(&job);
 
     job_start(job);
-    assert(job->status == JOB_STATUS_RUNNING);
+    assert(job_get_status(job) == JOB_STATUS_RUNNING);
 
     s->should_converge = true;
     job_enter(job);
-    assert(job->status == JOB_STATUS_READY);
+    assert(job_get_status(job) == JOB_STATUS_READY);
 
+    job_lock();
     job_complete(job, &error_abort);
+    job_unlock();
     job_enter(job);
     while (!job->deferred_to_main_loop) {
         aio_poll(qemu_get_aio_context(), true);
     }
-    assert(job->status == JOB_STATUS_READY);
+    assert(job_get_status(job) == JOB_STATUS_READY);
     aio_poll(qemu_get_aio_context(), true);
-    assert(job->status == JOB_STATUS_PENDING);
+    assert(job_get_status(job) == JOB_STATUS_PENDING);
 
     aio_context_acquire(job->aio_context);
+    job_lock();
     job_finalize(job, &error_abort);
+    job_unlock();
     aio_context_release(job->aio_context);
-    assert(job->status == JOB_STATUS_CONCLUDED);
+    assert(job_get_status(job) == JOB_STATUS_CONCLUDED);
 
     cancel_common(s);
 }
@@ -459,22 +474,23 @@ static void test_complete_in_standby(void)
     bjob = mk_job(blk, "job", &test_yielding_driver, true,
                   JOB_MANUAL_FINALIZE | JOB_MANUAL_DISMISS);
     job = &bjob->job;
-    assert(job->status == JOB_STATUS_CREATED);
+    assert(job_get_status(job) == JOB_STATUS_CREATED);
 
     /* Wait for the job to become READY */
     job_start(job);
     aio_context_acquire(ctx);
-    AIO_WAIT_WHILE(ctx, job->status != JOB_STATUS_READY);
+    AIO_WAIT_WHILE(ctx, job_get_status(job) != JOB_STATUS_READY);
     aio_context_release(ctx);
 
     /* Begin the drained section, pausing the job */
     bdrv_drain_all_begin();
-    assert(job->status == JOB_STATUS_STANDBY);
+    assert(job_get_status(job) == JOB_STATUS_STANDBY);
     /* Lock the IO thread to prevent the job from being run */
     aio_context_acquire(ctx);
     /* This will schedule the job to resume it */
     bdrv_drain_all_end();
 
+    job_lock();
     /* But the job cannot run, so it will remain on standby */
     assert(job->status == JOB_STATUS_STANDBY);
 
@@ -489,6 +505,7 @@ static void test_complete_in_standby(void)
     assert(job->status == JOB_STATUS_CONCLUDED);
 
     job_dismiss(&job, &error_abort);
+    job_unlock();
 
     destroy_blk(blk);
     aio_context_release(ctx);
-- 
2.27.0




* [RFC PATCH v2 13/14] jobs: add job lock in find_* functions
  2021-11-04 14:53 [RFC PATCH v2 00/14] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (11 preceding siblings ...)
  2021-11-04 14:53 ` [RFC PATCH v2 12/14] jobs: use job locks and helpers also in the unit tests Emanuele Giuseppe Esposito
@ 2021-11-04 14:53 ` Emanuele Giuseppe Esposito
  2021-12-18 12:11   ` Vladimir Sementsov-Ogievskiy
  2021-11-04 14:53 ` [RFC PATCH v2 14/14] job.c: enable job lock/unlock and remove Aiocontext locks Emanuele Giuseppe Esposito
  13 siblings, 1 reply; 34+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-11-04 14:53 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

Both blockdev.c and job-qmp.c have TOCTOU (time-of-check to
time-of-use) races, because they first search for the job and then
perform an action on it. Therefore, the search and the action must
happen in the same job mutex critical section.

Note: at this stage, job_{lock/unlock} and the job lock guard macros
are still no-ops.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
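A minimal sketch of the pattern described above, under stated
assumptions: DemoJob, demo_mutex, demo_find() and demo_pause() are
hypothetical names, and a pthread mutex stands in for job_mutex. As in
the find_* functions changed below, the lookup returns with the lock
still held on success, so that the job cannot change or disappear
between the search and the action, and it drops the lock itself on
failure.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names exist in QEMU. */
typedef struct DemoJob {
    const char *id;
    int pause_count;    /* protected by demo_mutex */
} DemoJob;

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static DemoJob demo_jobs[] = { { "job0", 0 }, { "job1", 0 } };

/*
 * On success, returns the job with demo_mutex held.
 * On failure, returns NULL with demo_mutex released.
 */
static DemoJob *demo_find(const char *id)
{
    size_t i;

    assert(id != NULL);
    pthread_mutex_lock(&demo_mutex);
    for (i = 0; i < sizeof(demo_jobs) / sizeof(demo_jobs[0]); i++) {
        if (strcmp(demo_jobs[i].id, id) == 0) {
            return &demo_jobs[i];
        }
    }
    pthread_mutex_unlock(&demo_mutex);
    return NULL;
}

/* Search and action share one critical section. */
static int demo_pause(const char *id)
{
    DemoJob *job = demo_find(id);

    if (!job) {
        return -1;      /* lock already released by demo_find() */
    }
    job->pause_count++;
    pthread_mutex_unlock(&demo_mutex);
    return 0;
}
```

The asymmetric lock ownership (held on success, released on failure)
is the part that every caller has to get right, which is why each QMP
handler below gains a matching job_unlock() on its exit path.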
 blockdev.c | 9 +++++++++
 job-qmp.c  | 8 ++++++++
 2 files changed, 17 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index c5a835d9ed..0bd79757fc 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3327,12 +3327,14 @@ static BlockJob *find_block_job(const char *id, AioContext **aio_context,
     assert(id != NULL);
 
     *aio_context = NULL;
+    job_lock();
 
     job = block_job_get(id);
 
     if (!job) {
         error_set(errp, ERROR_CLASS_DEVICE_NOT_ACTIVE,
                   "Block job '%s' not found", id);
+        job_unlock();
         return NULL;
     }
 
@@ -3353,6 +3355,7 @@ void qmp_block_job_set_speed(const char *device, int64_t speed, Error **errp)
 
     block_job_set_speed(job, speed, errp);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_block_job_cancel(const char *device,
@@ -3379,6 +3382,7 @@ void qmp_block_job_cancel(const char *device,
     job_user_cancel(&job->job, force, errp);
 out:
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_block_job_pause(const char *device, Error **errp)
@@ -3393,6 +3397,7 @@ void qmp_block_job_pause(const char *device, Error **errp)
     trace_qmp_block_job_pause(job);
     job_user_pause(&job->job, errp);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_block_job_resume(const char *device, Error **errp)
@@ -3407,6 +3412,7 @@ void qmp_block_job_resume(const char *device, Error **errp)
     trace_qmp_block_job_resume(job);
     job_user_resume(&job->job, errp);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_block_job_complete(const char *device, Error **errp)
@@ -3421,6 +3427,7 @@ void qmp_block_job_complete(const char *device, Error **errp)
     trace_qmp_block_job_complete(job);
     job_complete(&job->job, errp);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_block_job_finalize(const char *id, Error **errp)
@@ -3444,6 +3451,7 @@ void qmp_block_job_finalize(const char *id, Error **errp)
     aio_context = blk_get_aio_context(job->blk);
     job_unref(&job->job);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_block_job_dismiss(const char *id, Error **errp)
@@ -3460,6 +3468,7 @@ void qmp_block_job_dismiss(const char *id, Error **errp)
     job = &bjob->job;
     job_dismiss(&job, errp);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_change_backing_file(const char *device,
diff --git a/job-qmp.c b/job-qmp.c
index a355dc2954..8f07c51db8 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -35,10 +35,12 @@ static Job *find_job(const char *id, AioContext **aio_context, Error **errp)
     Job *job;
 
     *aio_context = NULL;
+    job_lock();
 
     job = job_get(id);
     if (!job) {
         error_setg(errp, "Job not found");
+        job_unlock();
         return NULL;
     }
 
@@ -60,6 +62,7 @@ void qmp_job_cancel(const char *id, Error **errp)
     trace_qmp_job_cancel(job);
     job_user_cancel(job, true, errp);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_job_pause(const char *id, Error **errp)
@@ -74,6 +77,7 @@ void qmp_job_pause(const char *id, Error **errp)
     trace_qmp_job_pause(job);
     job_user_pause(job, errp);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_job_resume(const char *id, Error **errp)
@@ -88,6 +92,7 @@ void qmp_job_resume(const char *id, Error **errp)
     trace_qmp_job_resume(job);
     job_user_resume(job, errp);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_job_complete(const char *id, Error **errp)
@@ -102,6 +107,7 @@ void qmp_job_complete(const char *id, Error **errp)
     trace_qmp_job_complete(job);
     job_complete(job, errp);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_job_finalize(const char *id, Error **errp)
@@ -125,6 +131,7 @@ void qmp_job_finalize(const char *id, Error **errp)
     aio_context = job->aio_context;
     job_unref(job);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 void qmp_job_dismiss(const char *id, Error **errp)
@@ -139,6 +146,7 @@ void qmp_job_dismiss(const char *id, Error **errp)
     trace_qmp_job_dismiss(job);
     job_dismiss(&job, errp);
     aio_context_release(aio_context);
+    job_unlock();
 }
 
 static JobInfo *job_query_single(Job *job, Error **errp)
-- 
2.27.0




* [RFC PATCH v2 14/14] job.c: enable job lock/unlock and remove Aiocontext locks
  2021-11-04 14:53 [RFC PATCH v2 00/14] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
                   ` (12 preceding siblings ...)
  2021-11-04 14:53 ` [RFC PATCH v2 13/14] jobs: add job lock in find_* functions Emanuele Giuseppe Esposito
@ 2021-11-04 14:53 ` Emanuele Giuseppe Esposito
  2021-12-18 12:24   ` Vladimir Sementsov-Ogievskiy
  13 siblings, 1 reply; 34+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-11-04 14:53 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Wen Congyang, Xie Changlong, Emanuele Giuseppe Esposito,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow

Change job_{lock/unlock} and the lock guard macros to use job_mutex.

Now that they are no longer nops, remove the aiocontext lock
to avoid deadlocks.

Therefore:
- when possible, remove the aiocontext lock/unlock pair completely
- if it is also used by other functions, reduce the locking
section as much as possible, leaving the job API outside.

Only one JobDriver callback, ->free(), assumes that the
aiocontext lock is held (because it calls bdrv_unref), so for
now keep that callback under the aiocontext lock.

Also remove _job_{lock/unlock}, as they are replaced by the
public functions.
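
The ->free() exception described above can be pictured with a minimal
sketch. It is not the QEMU implementation: plain pthread mutexes stand in
for job_mutex and the AioContext lock, and ctx_acquire(), ctx_release(),
example_free() and the *_held flags are hypothetical names used only for
illustration. The point is the hand-off: the job lock is dropped before the
coarser lock is taken, so the two are never held at the same time.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: pthread mutexes stand in for job_mutex and the AioContext lock. */
static pthread_mutex_t job_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t ctx_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Illustrative bookkeeping so the callback can observe which locks are held. */
static bool job_mutex_held;
static bool ctx_mutex_held;

typedef struct Job {
    int refcnt;
    void (*free_cb)(struct Job *job);   /* stand-in for JobDriver ->free() */
    bool freed_under_ctx_only;          /* set by the example callback */
} Job;

static void job_lock(void)
{
    pthread_mutex_lock(&job_mutex);
    job_mutex_held = true;
}

static void job_unlock(void)
{
    job_mutex_held = false;
    pthread_mutex_unlock(&job_mutex);
}

static void ctx_acquire(void)
{
    pthread_mutex_lock(&ctx_mutex);
    ctx_mutex_held = true;
}

static void ctx_release(void)
{
    ctx_mutex_held = false;
    pthread_mutex_unlock(&ctx_mutex);
}

/* Called with the job lock held; returns with the job lock held. */
static void job_unref(Job *job)
{
    assert(job_mutex_held);
    if (--job->refcnt == 0 && job->free_cb != NULL) {
        job_unlock();          /* never hold both locks at once */
        ctx_acquire();         /* the callback still expects the coarse lock */
        job->free_cb(job);
        ctx_release();
        job_lock();            /* restore the caller's locking state */
    }
}

/* Hypothetical callback: records whether only the coarse lock was held. */
static void example_free(Job *job)
{
    job->freed_under_ctx_only = ctx_mutex_held && !job_mutex_held;
}
```

A caller would take job_lock(), call job_unref() on a job whose refcnt is 1,
and then job_unlock(); the callback runs in the window where only the
coarse lock is held.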

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 include/qemu/job.h               |  7 ---
 block/replication.c              |  2 +
 blockdev.c                       | 62 ++++--------------------
 job-qmp.c                        | 38 ++++-----------
 job.c                            | 81 ++++----------------------------
 tests/unit/test-bdrv-drain.c     |  4 +-
 tests/unit/test-block-iothread.c |  2 +-
 tests/unit/test-blockjob.c       | 13 ++---
 8 files changed, 34 insertions(+), 175 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 58b3af47e3..d417e1b601 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -608,8 +608,6 @@ void job_user_cancel(Job *job, bool force, Error **errp);
  *
  * Returns the return value from the job if the job actually completed
  * during the call, or -ECANCELED if it was canceled.
- *
- * Callers must hold the AioContext lock of job->aio_context.
  */
 int job_cancel_sync(Job *job, bool force);
 
@@ -633,9 +631,6 @@ void job_cancel_sync_all(void);
  * function).
  *
  * Returns the return value from the job.
- *
- * Callers must hold the AioContext lock of job->aio_context.
- *
  * Called between job_lock and job_unlock.
  */
 int job_complete_sync(Job *job, Error **errp);
@@ -667,8 +662,6 @@ void job_dismiss(Job **job, Error **errp);
  * Returns 0 if the job is successfully completed, -ECANCELED if the job was
  * cancelled before completing, and -errno in other error cases.
  *
- * Callers must hold the AioContext lock of job->aio_context.
- *
  * Called between job_lock and job_unlock.
  */
 int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp);
diff --git a/block/replication.c b/block/replication.c
index 0f487cc215..6a60c1af1a 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -728,9 +728,11 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
          * disk, secondary disk in backup_job_completed().
          */
         if (s->backup_job) {
+            aio_context_release(aio_context);
             job_lock();
             job_cancel_sync(&s->backup_job->job, true);
             job_unlock();
+            aio_context_acquire(aio_context);
         }
 
         if (!failover) {
diff --git a/blockdev.c b/blockdev.c
index 0bd79757fc..dfc73ef1bf 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -154,12 +154,7 @@ void blockdev_mark_auto_del(BlockBackend *blk)
 
     for (job = block_job_next(NULL); job; job = block_job_next(job)) {
         if (block_job_has_bdrv(job, blk_bs(blk))) {
-            AioContext *aio_context = job->job.aio_context;
-            aio_context_acquire(aio_context);
-
             job_cancel(&job->job, false);
-
-            aio_context_release(aio_context);
         }
     }
 
@@ -1843,16 +1838,9 @@ static void drive_backup_abort(BlkActionState *common)
     DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
 
     if (state->job) {
-        AioContext *aio_context;
-
-        aio_context = bdrv_get_aio_context(state->bs);
-        aio_context_acquire(aio_context);
-
         job_lock();
         job_cancel_sync(&state->job->job, true);
         job_unlock();
-
-        aio_context_release(aio_context);
     }
 }
 
@@ -1946,16 +1934,9 @@ static void blockdev_backup_abort(BlkActionState *common)
     BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, common);
 
     if (state->job) {
-        AioContext *aio_context;
-
-        aio_context = bdrv_get_aio_context(state->bs);
-        aio_context_acquire(aio_context);
-
         job_lock();
         job_cancel_sync(&state->job->job, true);
         job_unlock();
-
-        aio_context_release(aio_context);
     }
 }
 
@@ -3318,15 +3299,13 @@ out:
     aio_context_release(aio_context);
 }
 
-/* Get a block job using its ID and acquire its AioContext */
-static BlockJob *find_block_job(const char *id, AioContext **aio_context,
-                                Error **errp)
+/* Get a block job using its ID. Returns with job_lock held on success */
+static BlockJob *find_block_job(const char *id, Error **errp)
 {
     BlockJob *job;
 
     assert(id != NULL);
 
-    *aio_context = NULL;
     job_lock();
 
     job = block_job_get(id);
@@ -3338,31 +3317,25 @@ static BlockJob *find_block_job(const char *id, AioContext **aio_context,
         return NULL;
     }
 
-    *aio_context = blk_get_aio_context(job->blk);
-    aio_context_acquire(*aio_context);
-
     return job;
 }
 
 void qmp_block_job_set_speed(const char *device, int64_t speed, Error **errp)
 {
-    AioContext *aio_context;
-    BlockJob *job = find_block_job(device, &aio_context, errp);
+    BlockJob *job = find_block_job(device, errp);
 
     if (!job) {
         return;
     }
 
     block_job_set_speed(job, speed, errp);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_block_job_cancel(const char *device,
                           bool has_force, bool force, Error **errp)
 {
-    AioContext *aio_context;
-    BlockJob *job = find_block_job(device, &aio_context, errp);
+    BlockJob *job = find_block_job(device, errp);
 
     if (!job) {
         return;
@@ -3381,14 +3354,12 @@ void qmp_block_job_cancel(const char *device,
     trace_qmp_block_job_cancel(job);
     job_user_cancel(&job->job, force, errp);
 out:
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_block_job_pause(const char *device, Error **errp)
 {
-    AioContext *aio_context;
-    BlockJob *job = find_block_job(device, &aio_context, errp);
+    BlockJob *job = find_block_job(device, errp);
 
     if (!job) {
         return;
@@ -3396,14 +3367,12 @@ void qmp_block_job_pause(const char *device, Error **errp)
 
     trace_qmp_block_job_pause(job);
     job_user_pause(&job->job, errp);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_block_job_resume(const char *device, Error **errp)
 {
-    AioContext *aio_context;
-    BlockJob *job = find_block_job(device, &aio_context, errp);
+    BlockJob *job = find_block_job(device, errp);
 
     if (!job) {
         return;
@@ -3411,14 +3380,12 @@ void qmp_block_job_resume(const char *device, Error **errp)
 
     trace_qmp_block_job_resume(job);
     job_user_resume(&job->job, errp);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_block_job_complete(const char *device, Error **errp)
 {
-    AioContext *aio_context;
-    BlockJob *job = find_block_job(device, &aio_context, errp);
+    BlockJob *job = find_block_job(device, errp);
 
     if (!job) {
         return;
@@ -3426,14 +3393,12 @@ void qmp_block_job_complete(const char *device, Error **errp)
 
     trace_qmp_block_job_complete(job);
     job_complete(&job->job, errp);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_block_job_finalize(const char *id, Error **errp)
 {
-    AioContext *aio_context;
-    BlockJob *job = find_block_job(id, &aio_context, errp);
+    BlockJob *job = find_block_job(id, errp);
 
     if (!job) {
         return;
@@ -3443,21 +3408,13 @@ void qmp_block_job_finalize(const char *id, Error **errp)
     job_ref(&job->job);
     job_finalize(&job->job, errp);
 
-    /*
-     * Job's context might have changed via job_finalize (and job_txn_apply
-     * automatically acquires the new one), so make sure we release the correct
-     * one.
-     */
-    aio_context = blk_get_aio_context(job->blk);
     job_unref(&job->job);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_block_job_dismiss(const char *id, Error **errp)
 {
-    AioContext *aio_context;
-    BlockJob *bjob = find_block_job(id, &aio_context, errp);
+    BlockJob *bjob = find_block_job(id, errp);
     Job *job;
 
     if (!bjob) {
@@ -3467,7 +3424,6 @@ void qmp_block_job_dismiss(const char *id, Error **errp)
     trace_qmp_block_job_dismiss(bjob);
     job = &bjob->job;
     job_dismiss(&job, errp);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
diff --git a/job-qmp.c b/job-qmp.c
index 8f07c51db8..d592780953 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -29,12 +29,11 @@
 #include "qapi/error.h"
 #include "trace/trace-root.h"
 
-/* Get a job using its ID and acquire its AioContext */
-static Job *find_job(const char *id, AioContext **aio_context, Error **errp)
+/* Get a job using its ID. Returns with job_lock held on success. */
+static Job *find_job(const char *id, Error **errp)
 {
     Job *job;
 
-    *aio_context = NULL;
     job_lock();
 
     job = job_get(id);
@@ -44,16 +43,12 @@ static Job *find_job(const char *id, AioContext **aio_context, Error **errp)
         return NULL;
     }
 
-    *aio_context = job->aio_context;
-    aio_context_acquire(*aio_context);
-
     return job;
 }
 
 void qmp_job_cancel(const char *id, Error **errp)
 {
-    AioContext *aio_context;
-    Job *job = find_job(id, &aio_context, errp);
+    Job *job = find_job(id, errp);
 
     if (!job) {
         return;
@@ -61,14 +56,12 @@ void qmp_job_cancel(const char *id, Error **errp)
 
     trace_qmp_job_cancel(job);
     job_user_cancel(job, true, errp);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_job_pause(const char *id, Error **errp)
 {
-    AioContext *aio_context;
-    Job *job = find_job(id, &aio_context, errp);
+    Job *job = find_job(id, errp);
 
     if (!job) {
         return;
@@ -76,14 +69,12 @@ void qmp_job_pause(const char *id, Error **errp)
 
     trace_qmp_job_pause(job);
     job_user_pause(job, errp);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_job_resume(const char *id, Error **errp)
 {
-    AioContext *aio_context;
-    Job *job = find_job(id, &aio_context, errp);
+    Job *job = find_job(id, errp);
 
     if (!job) {
         return;
@@ -91,14 +82,12 @@ void qmp_job_resume(const char *id, Error **errp)
 
     trace_qmp_job_resume(job);
     job_user_resume(job, errp);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_job_complete(const char *id, Error **errp)
 {
-    AioContext *aio_context;
-    Job *job = find_job(id, &aio_context, errp);
+    Job *job = find_job(id, errp);
 
     if (!job) {
         return;
@@ -106,14 +95,12 @@ void qmp_job_complete(const char *id, Error **errp)
 
     trace_qmp_job_complete(job);
     job_complete(job, errp);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_job_finalize(const char *id, Error **errp)
 {
-    AioContext *aio_context;
-    Job *job = find_job(id, &aio_context, errp);
+    Job *job = find_job(id, errp);
 
     if (!job) {
         return;
@@ -123,21 +110,13 @@ void qmp_job_finalize(const char *id, Error **errp)
     job_ref(job);
     job_finalize(job, errp);
 
-    /*
-     * Job's context might have changed via job_finalize (and job_txn_apply
-     * automatically acquires the new one), so make sure we release the correct
-     * one.
-     */
-    aio_context = job->aio_context;
     job_unref(job);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
 void qmp_job_dismiss(const char *id, Error **errp)
 {
-    AioContext *aio_context;
-    Job *job = find_job(id, &aio_context, errp);
+    Job *job = find_job(id, errp);
 
     if (!job) {
         return;
@@ -145,7 +124,6 @@ void qmp_job_dismiss(const char *id, Error **errp)
 
     trace_qmp_job_dismiss(job);
     job_dismiss(&job, errp);
-    aio_context_release(aio_context);
     job_unlock();
 }
 
diff --git a/job.c b/job.c
index 5efbf38a72..f225e770f5 100644
--- a/job.c
+++ b/job.c
@@ -98,26 +98,16 @@ struct JobTxn {
     int refcnt;
 };
 
-#define JOB_LOCK_GUARD() /* QEMU_LOCK_GUARD(&job_mutex) */
+#define JOB_LOCK_GUARD() QEMU_LOCK_GUARD(&job_mutex)
 
-#define WITH_JOB_LOCK_GUARD() /* WITH_QEMU_LOCK_GUARD(&job_mutex) */
+#define WITH_JOB_LOCK_GUARD() WITH_QEMU_LOCK_GUARD(&job_mutex)
 
 void job_lock(void)
-{
-    /* nop */
-}
-
-void job_unlock(void)
-{
-    /* nop */
-}
-
-static void _job_lock(void)
 {
     qemu_mutex_lock(&job_mutex);
 }
 
-static void _job_unlock(void)
+void job_unlock(void)
 {
     qemu_mutex_unlock(&job_mutex);
 }
@@ -185,7 +175,6 @@ static int job_txn_apply(Job *job, int fn(Job *))
      * break AIO_WAIT_WHILE from within fn.
      */
     job_ref(job);
-    aio_context_release(job->aio_context);
 
     QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
         rc = fn(other_job);
@@ -194,11 +183,6 @@ static int job_txn_apply(Job *job, int fn(Job *))
         }
     }
 
-    /*
-     * Note that job->aio_context might have been changed by calling fn, so we
-     * can't use a local variable to cache it.
-     */
-    aio_context_acquire(job->aio_context);
     job_unref(job);
     return rc;
 }
@@ -494,7 +478,10 @@ void job_unref(Job *job)
 
         if (job->driver->free) {
             job_unlock();
+            /* FIXME: aiocontext lock is required because cb calls blk_unref */
+            aio_context_acquire(job->aio_context);
             job->driver->free(job);
+            aio_context_release(job->aio_context);
             job_lock();
         }
 
@@ -570,21 +557,17 @@ void job_enter_cond(Job *job, bool(*fn)(Job *job))
         return;
     }
 
-    _job_lock();
     if (job->busy) {
-        _job_unlock();
         return;
     }
 
     if (fn && !fn(job)) {
-        _job_unlock();
         return;
     }
 
     assert(!job->deferred_to_main_loop);
     timer_del(&job->sleep_timer);
     job->busy = true;
-    _job_unlock();
     job_unlock();
     aio_co_enter(job->aio_context, job->co);
     job_lock();
@@ -608,13 +591,11 @@ void job_enter(Job *job)
  */
 static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
 {
-    _job_lock();
     if (ns != -1) {
         timer_mod(&job->sleep_timer, ns);
     }
     job->busy = false;
     job_event_idle(job);
-    _job_unlock();
     job_unlock();
     qemu_coroutine_yield();
     job_lock();
@@ -962,7 +943,6 @@ static void job_cancel_async(Job *job, bool force)
 /* Called with job_mutex held. */
 static void job_completed_txn_abort(Job *job)
 {
-    AioContext *ctx;
     JobTxn *txn = job->txn;
     Job *other_job;
 
@@ -975,54 +955,28 @@ static void job_completed_txn_abort(Job *job)
     txn->aborting = true;
     job_txn_ref(txn);
 
-    /*
-     * We can only hold the single job's AioContext lock while calling
-     * job_finalize_single() because the finalization callbacks can involve
-     * calls of AIO_WAIT_WHILE(), which could deadlock otherwise.
-     * Note that the job's AioContext may change when it is finalized.
-     */
-    job_ref(job);
-    aio_context_release(job->aio_context);
-
     /* Other jobs are effectively cancelled by us, set the status for
      * them; this job, however, may or may not be cancelled, depending
      * on the caller, so leave it. */
     QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
         if (other_job != job) {
-            ctx = other_job->aio_context;
-            aio_context_acquire(ctx);
             /*
              * This is a transaction: If one job failed, no result will matter.
              * Therefore, pass force=true to terminate all other jobs as quickly
              * as possible.
              */
             job_cancel_async(other_job, true);
-            aio_context_release(ctx);
         }
     }
     while (!QLIST_EMPTY(&txn->jobs)) {
         other_job = QLIST_FIRST(&txn->jobs);
-        /*
-         * The job's AioContext may change, so store it in @ctx so we
-         * release the same context that we have acquired before.
-         */
-        ctx = other_job->aio_context;
-        aio_context_acquire(ctx);
         if (!job_is_completed(other_job)) {
             assert(job_cancel_requested_locked(other_job));
             job_finish_sync(other_job, NULL, NULL);
         }
         job_finalize_single(other_job);
-        aio_context_release(ctx);
     }
 
-    /*
-     * Use job_ref()/job_unref() so we can read the AioContext here
-     * even if the job went away during job_finalize_single().
-     */
-    aio_context_acquire(job->aio_context);
-    job_unref(job);
-
     job_txn_unref(txn);
 }
 
@@ -1142,11 +1096,7 @@ static void job_completed(Job *job)
 static void job_exit(void *opaque)
 {
     Job *job = (Job *)opaque;
-    AioContext *ctx;
-
     JOB_LOCK_GUARD();
-    job_ref(job);
-    aio_context_acquire(job->aio_context);
 
     /* This is a lie, we're not quiescent, but still doing the completion
      * callbacks. However, completion callbacks tend to involve operations that
@@ -1156,16 +1106,6 @@ static void job_exit(void *opaque)
     job_event_idle(job);
 
     job_completed(job);
-
-    /*
-     * Note that calling job_completed can move the job to a different
-     * aio_context, so we cannot cache from above. job_txn_apply takes care of
-     * acquiring the new lock, and we ref/unref to avoid job_completed freeing
-     * the job underneath us.
-     */
-    ctx = job->aio_context;
-    job_unref(job);
-    aio_context_release(ctx);
 }
 
 /**
@@ -1279,14 +1219,10 @@ int job_cancel_sync(Job *job, bool force)
 void job_cancel_sync_all(void)
 {
     Job *job;
-    AioContext *aio_context;
 
     JOB_LOCK_GUARD();
     while ((job = job_next(NULL))) {
-        aio_context = job->aio_context;
-        aio_context_acquire(aio_context);
         job_cancel_sync(job, true);
-        aio_context_release(aio_context);
     }
 }
 
@@ -1331,8 +1267,9 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
     }
 
     job_unlock();
-    AIO_WAIT_WHILE(job->aio_context,
-                   (job_enter(job), !job_is_completed_unlocked(job)));
+    AIO_WAIT_WHILE_UNLOCKED(job->aio_context,
+                            (job_enter(job),
+                            !job_is_completed_unlocked(job)));
     job_lock();
 
     ret = (job_is_cancelled_locked(job) && job->ret == 0) ?
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 535c39b5a8..83485a33aa 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -928,9 +928,9 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
         tjob->prepare_ret = -EIO;
         break;
     }
+    aio_context_release(ctx);
 
     job_start(&job->job);
-    aio_context_release(ctx);
 
     if (use_iothread) {
         /* job_co_entry() is run in the I/O thread, wait for the actual job
@@ -994,12 +994,12 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
     g_assert_false(job_get_paused(&job->job));
     g_assert_true(job_get_busy(&job->job)); /* We're in qemu_co_sleep_ns() */
 
-    aio_context_acquire(ctx);
     job_lock();
     ret = job_complete_sync(&job->job, &error_abort);
     job_unlock();
     g_assert_cmpint(ret, ==, (result == TEST_JOB_SUCCESS ? 0 : -EIO));
 
+    aio_context_acquire(ctx);
     if (use_iothread) {
         blk_set_aio_context(blk_src, qemu_get_aio_context(), &error_abort);
         assert(blk_get_aio_context(blk_target) == qemu_get_aio_context());
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index f39cb8b7ef..cf197347b7 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -455,10 +455,10 @@ static void test_attach_blockjob(void)
         aio_poll(qemu_get_aio_context(), false);
     }
 
-    aio_context_acquire(ctx);
     job_lock();
     job_complete_sync(&tjob->common.job, &error_abort);
     job_unlock();
+    aio_context_acquire(ctx);
     blk_set_aio_context(blk, qemu_get_aio_context(), &error_abort);
     aio_context_release(ctx);
 
diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c
index b94e1510c9..11cff70a6b 100644
--- a/tests/unit/test-blockjob.c
+++ b/tests/unit/test-blockjob.c
@@ -228,10 +228,6 @@ static void cancel_common(CancelJob *s)
     BlockJob *job = &s->common;
     BlockBackend *blk = s->blk;
     JobStatus sts = job->job.status;
-    AioContext *ctx;
-
-    ctx = job->job.aio_context;
-    aio_context_acquire(ctx);
 
     job_lock();
     job_cancel_sync(&job->job, true);
@@ -244,7 +240,6 @@ static void cancel_common(CancelJob *s)
     job_unlock();
     destroy_blk(blk);
 
-    aio_context_release(ctx);
 }
 
 static void test_cancel_created(void)
@@ -381,11 +376,9 @@ static void test_cancel_concluded(void)
     aio_poll(qemu_get_aio_context(), true);
     assert(job_get_status(job) == JOB_STATUS_PENDING);
 
-    aio_context_acquire(job->aio_context);
     job_lock();
     job_finalize(job, &error_abort);
     job_unlock();
-    aio_context_release(job->aio_context);
     assert(job_get_status(job) == JOB_STATUS_CONCLUDED);
 
     cancel_common(s);
@@ -478,9 +471,7 @@ static void test_complete_in_standby(void)
 
     /* Wait for the job to become READY */
     job_start(job);
-    aio_context_acquire(ctx);
-    AIO_WAIT_WHILE(ctx, job_get_status(job) != JOB_STATUS_READY);
-    aio_context_release(ctx);
+    AIO_WAIT_WHILE_UNLOCKED(ctx, job_get_status(job) != JOB_STATUS_READY);
 
     /* Begin the drained section, pausing the job */
     bdrv_drain_all_begin();
@@ -498,6 +489,7 @@ static void test_complete_in_standby(void)
     job_complete(job, &error_abort);
 
     /* The test is done now, clean up. */
+    aio_context_release(ctx);
     job_finish_sync(job, NULL, &error_abort);
     assert(job->status == JOB_STATUS_PENDING);
 
@@ -507,6 +499,7 @@ static void test_complete_in_standby(void)
     job_dismiss(&job, &error_abort);
     job_unlock();
 
+    aio_context_acquire(ctx);
     destroy_blk(blk);
     aio_context_release(ctx);
     iothread_join(iothread);
-- 
2.27.0




* Re: [RFC PATCH v2 01/14] job.c: make job_lock/unlock public
  2021-11-04 14:53 ` [RFC PATCH v2 01/14] job.c: make job_lock/unlock public Emanuele Giuseppe Esposito
@ 2021-12-16 16:18   ` Stefan Hajnoczi
  0 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2021-12-16 16:18 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Paolo Bonzini, John Snow

[-- Attachment #1: Type: text/plain, Size: 3030 bytes --]

On Thu, Nov 04, 2021 at 10:53:21AM -0400, Emanuele Giuseppe Esposito wrote:
> job mutex will be used to protect the job struct elements and list,
> replacing AioContext locks.
> 
> Right now use a shared lock for all jobs, in order to keep things
> simple. Once the AioContext lock is gone, we can introduce per-job
> locks.
> 
> To simplify the switch from aiocontext to job lock, introduce
> *nop* lock/unlock functions and macros. Once everything is protected
> by jobs, we can add the mutex and remove the aiocontext.
> Since job_mutex is already being used, add static
> _job_{lock/unlock}.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/qemu/job.h | 18 ++++++++++++++++++
>  job.c              | 39 +++++++++++++++++++++++++++------------
>  2 files changed, 45 insertions(+), 12 deletions(-)
> 
> diff --git a/include/qemu/job.h b/include/qemu/job.h
> index 7e9e59f4b8..ccf7826426 100644
> --- a/include/qemu/job.h
> +++ b/include/qemu/job.h
> @@ -297,6 +297,24 @@ typedef enum JobCreateFlags {
>      JOB_MANUAL_DISMISS = 0x04,
>  } JobCreateFlags;
>  
> +/**
> + * job_lock:
> + *
> + * Take the mutex protecting the list of jobs and their status.
> + * Most functions called by the monitor need to call job_lock
> + * and job_unlock manually.  On the other hand, function called
> + * by the block jobs themselves and by the block layer will take the
> + * lock for you.
> + */
> +void job_lock(void);
> +
> +/**
> + * job_unlock:
> + *
> + * Release the mutex protecting the list of jobs and their status.
> + */
> +void job_unlock(void);
> +
>  /**
>   * Allocate and return a new job transaction. Jobs can be added to the
>   * transaction using job_txn_add_job().
> diff --git a/job.c b/job.c
> index 94b142684f..0e4dacf028 100644
> --- a/job.c
> +++ b/job.c
> @@ -32,6 +32,12 @@
>  #include "trace/trace-root.h"
>  #include "qapi/qapi-events-job.h"
>  
> +/*
> + * job_mutex protects the jobs list, but also makes the
> + * struct job fields thread-safe.
> + */
> +static QemuMutex job_mutex;
> +
>  static QLIST_HEAD(, Job) jobs = QLIST_HEAD_INITIALIZER(jobs);
>  
>  /* Job State Transition Table */
> @@ -74,17 +80,26 @@ struct JobTxn {
>      int refcnt;
>  };
>  
> -/* Right now, this mutex is only needed to synchronize accesses to job->busy
> - * and job->sleep_timer, such as concurrent calls to job_do_yield and
> - * job_enter. */
> -static QemuMutex job_mutex;
> +#define JOB_LOCK_GUARD() /* QEMU_LOCK_GUARD(&job_mutex) */
> +
> +#define WITH_JOB_LOCK_GUARD() /* WITH_QEMU_LOCK_GUARD(&job_mutex) */
> +
> +void job_lock(void)
> +{
> +    /* nop */
> +}
> +
> +void job_unlock(void)
> +{
> +    /* nop */
> +}
>  
> -static void job_lock(void)
> +static void _job_lock(void)

QEMU code does not use leading underscores because the C standard
reserves them. real_job_lock()?

See "7.1.3 Reserved identifiers" in C99.
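
As a rough illustration of the rename suggested here (a sketch only, with
a pthread mutex standing in for QEMU's QemuMutex and a hypothetical
lock_depth counter added so the effect is observable): C99 7.1.3 reserves
every identifier that begins with an underscore for use at file scope, so a
static helper named _job_lock() falls into the reserved space, while
real_job_lock() does not.

```c
#include <assert.h>
#include <pthread.h>

/* Assumption: a pthread mutex stands in for QEMU's job_mutex. */
static pthread_mutex_t job_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Illustrative counter, only touched while job_mutex is held. */
static int lock_depth;

/*
 * Named real_job_lock() rather than _job_lock(): a file-scope identifier
 * starting with an underscore is reserved by C99 7.1.3.
 */
static void real_job_lock(void)
{
    pthread_mutex_lock(&job_mutex);
    lock_depth++;
}

static void real_job_unlock(void)
{
    assert(lock_depth > 0);
    lock_depth--;
    pthread_mutex_unlock(&job_mutex);
}

/* Takes the lock, samples the counter, and drops the lock again. */
static int demo_depth_while_locked(void)
{
    int depth;

    real_job_lock();
    depth = lock_depth;
    real_job_unlock();
    return depth;
}
```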


* Re: [RFC PATCH v2 02/14] job.h: categorize fields in struct Job
  2021-11-04 14:53 ` [RFC PATCH v2 02/14] job.h: categorize fields in struct Job Emanuele Giuseppe Esposito
@ 2021-12-16 16:21   ` Stefan Hajnoczi
  2021-12-21 14:23     ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 34+ messages in thread
From: Stefan Hajnoczi @ 2021-12-16 16:21 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Paolo Bonzini, John Snow

[-- Attachment #1: Type: text/plain, Size: 1443 bytes --]

On Thu, Nov 04, 2021 at 10:53:22AM -0400, Emanuele Giuseppe Esposito wrote:
> Categorize the fields in struct Job to understand which ones
> need to be protected by the job mutex and which don't.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>  include/qemu/job.h | 57 +++++++++++++++++++++++++++-------------------
>  1 file changed, 34 insertions(+), 23 deletions(-)
> 
> diff --git a/include/qemu/job.h b/include/qemu/job.h
> index ccf7826426..f7036ac6b3 100644
> --- a/include/qemu/job.h
> +++ b/include/qemu/job.h
> @@ -40,27 +40,52 @@ typedef struct JobTxn JobTxn;
>   * Long-running operation.
>   */
>  typedef struct Job {
> +
> +    /* Fields set at initialization (job_create), and never modified */
> +
>      /** The ID of the job. May be NULL for internal jobs. */
>      char *id;
>  
> -    /** The type of this job. */
> +    /**
> +     * The type of this job.
> +     * All callbacks are called with job_mutex *not* held.
> +     */
>      const JobDriver *driver;
>  
> -    /** Reference count of the block job */
> -    int refcnt;
> -
> -    /** Current state; See @JobStatus for details. */
> -    JobStatus status;
> -
>      /** AioContext to run the job coroutine in */
>      AioContext *aio_context;

"Fields set at initialization (job_create), and never modified" does not
apply here. blockjob.c:child_job_set_aio_ctx() changes it at runtime.


* Re: [RFC PATCH v2 03/14] job.h: define locked functions
  2021-11-04 14:53 ` [RFC PATCH v2 03/14] job.h: define locked functions Emanuele Giuseppe Esposito
@ 2021-12-16 16:48   ` Stefan Hajnoczi
  2021-12-16 17:11     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 34+ messages in thread
From: Stefan Hajnoczi @ 2021-12-16 16:48 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Paolo Bonzini, John Snow

[-- Attachment #1: Type: text/plain, Size: 2491 bytes --]

On Thu, Nov 04, 2021 at 10:53:23AM -0400, Emanuele Giuseppe Esposito wrote:
>  /** Returns whether the job is ready to be completed. */
>  bool job_is_ready(Job *job);
>  
> +/** Same as job_is_ready(), but assumes job_lock is held. */
> +bool job_is_ready_locked(Job *job);

What I see here is that some functions assume job_lock is held but don't
have _locked in their name (job_ref()), some assume job_lock is held and
have _locked in their name (job_is_ready_locked()), and some assume
job_lock is not held (job_is_ready()).

That means when _locked is not in the name I don't know whether this
function requires job_lock or will deadlock if called under job_lock.

Two ways to make it obvious:

1. Always have _locked in the name if the function requires job_lock.
   Functions without _locked must not be called under job_lock.

2. Don't change the name but use the type system instead:

   /*
    * Define a unique type so the compiler warns us. It's just a pointer
    * so it can be efficiently passed by value.
    */
   typedef struct { Job *job; } LockedJob;

   LockedJob job_lock(Job *job);
   Job *job_unlock(LockedJob job);

   Now the compiler catches mistakes:

   bool job_is_completed(LockedJob job);
   bool job_is_ready(Job *job);

   Job *j;
   LockedJob l;
   job_is_completed(j) -> compiler error
   job_is_completed(l) -> ok
   job_is_ready(j) -> ok
   job_is_ready(l) -> compiler error

   This approach assumes per-Job locks but a similar API is possible
   with a global job_mutex too. There just needs to be a function to
   turn Job * into LockedJob and LockedJob back into Job*.

   This is slightly exotic. It's not an approach I've seen used in C, so
   it's not idiomatic and people might find it unfamiliar.

These are just ideas. If you want to keep it the way it is, that's okay
too (although a little confusing).
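
A minimal, self-contained sketch of option 2 with a single global mutex. The names LockedJob, job_lock(), job_unlock(), job_is_completed() and job_is_ready() come from the text above; the Job fields and the use of a pthread mutex are assumptions made only so the example compiles on its own, not QEMU's actual code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Simplified stand-in for struct Job (hypothetical fields). */
typedef struct Job {
    bool ready;
    bool completed;
} Job;

/* Distinct wrapper type: only job_lock() is meant to produce one. */
typedef struct { Job *job; } LockedJob;

static pthread_mutex_t job_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Take the global mutex and hand back a token showing it is held. */
static LockedJob job_lock(Job *job)
{
    pthread_mutex_lock(&job_mutex);
    return (LockedJob){ .job = job };
}

/* Release the mutex and turn the token back into a plain pointer. */
static Job *job_unlock(LockedJob l)
{
    pthread_mutex_unlock(&job_mutex);
    return l.job;
}

/* Requires the lock: accepts only the token type. */
static bool job_is_completed(LockedJob l)
{
    return l.job->completed;
}

/* Must be called without the lock: takes it internally. */
static bool job_is_ready(Job *job)
{
    LockedJob l = job_lock(job);
    bool ready = l.job->ready;
    job_unlock(l);
    return ready;
}
```

Passing a Job * to job_is_completed(), or a LockedJob to job_is_ready(), is rejected by the compiler because the two types are incompatible. Nothing stops a caller from building a LockedJob by hand, so the check is a convention aid rather than a guarantee.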

> diff --git a/job.c b/job.c
> index 0e4dacf028..e393c1222f 100644
> --- a/job.c
> +++ b/job.c
> @@ -242,7 +242,8 @@ bool job_cancel_requested(Job *job)
>      return job->cancelled;
>  }
>  
> -bool job_is_ready(Job *job)
> +/* Called with job_mutex held. */

This information should go with the doc comments (and it's already there
in job.h!). There is no rule on where to put doc comments but in this
case you already added them to job.h, so they are not needed here in
job.c. Leaving them could confuse other people into adding doc comments
into job.c instead of job.h.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v2 04/14] job.h: define unlocked functions
  2021-11-04 14:53 ` [RFC PATCH v2 04/14] job.h: define unlocked functions Emanuele Giuseppe Esposito
@ 2021-12-16 16:51   ` Stefan Hajnoczi
  0 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2021-12-16 16:51 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Paolo Bonzini, John Snow

On Thu, Nov 04, 2021 at 10:53:24AM -0400, Emanuele Giuseppe Esposito wrote:
> All these functions assume that the lock is not held, and acquire
> it internally.
> 
> These functions will be useful when job_lock is globally applied,
> as they will allow callers to access the job struct fields
> without worrying about the job lock.
> 
> Update also the comments in blockjob.c (and move them in job.c).
> 
> Note: at this stage, job_{lock/unlock} and job lock guard macros
> are *nop*.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>  include/qemu/job.h | 21 +++++++++++
>  blockjob.c         | 20 -----------
>  job.c              | 88 ++++++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 107 insertions(+), 22 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v2 06/14] job.c: make job_event_* functions static
  2021-11-04 14:53 ` [RFC PATCH v2 06/14] job.c: make job_event_* functions static Emanuele Giuseppe Esposito
@ 2021-12-16 16:54   ` Stefan Hajnoczi
  0 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2021-12-16 16:54 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Paolo Bonzini, John Snow

On Thu, Nov 04, 2021 at 10:53:26AM -0400, Emanuele Giuseppe Esposito wrote:
> job_event_* functions can all be static, as they are not used
> outside job.c.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>  include/qemu/job.h |  6 ------
>  job.c              | 12 ++++++++++--
>  2 files changed, 10 insertions(+), 8 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v2 03/14] job.h: define locked functions
  2021-12-16 16:48   ` Stefan Hajnoczi
@ 2021-12-16 17:11     ` Vladimir Sementsov-Ogievskiy
  2021-12-20 10:15       ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 34+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-16 17:11 UTC (permalink / raw)
  To: Stefan Hajnoczi, Emanuele Giuseppe Esposito
  Cc: qemu-block, Kevin Wolf, Hanna Reitz, John Snow, Wen Congyang,
	Xie Changlong, Paolo Bonzini, Markus Armbruster, Fam Zheng,
	qemu-devel

16.12.2021 19:48, Stefan Hajnoczi wrote:
> On Thu, Nov 04, 2021 at 10:53:23AM -0400, Emanuele Giuseppe Esposito wrote:
>>   /** Returns whether the job is ready to be completed. */
>>   bool job_is_ready(Job *job);
>>   
>> +/** Same as job_is_ready(), but assumes job_lock is held. */
>> +bool job_is_ready_locked(Job *job);
> 
> What I see here is that some functions assume job_lock is held but don't
> have _locked in their name (job_ref()), some assume job_lock is held and
> have _locked in their name (job_is_ready_locked()), and some assume
> job_lock is not held (job_is_ready()).
> 
> That means when _locked is not in the name I don't know whether this
> function requires job_lock or will deadlock if called under job_lock.
> 
> Two ways to make it obvious:
> 
> 1. Always have _locked in the name if the function requires job_lock.
>     Functions without _locked must not be called under job_lock.
> 
> 2. Don't change the name but use the type system instead:
> 
>     /*
>      * Define a unique type so the compiler warns us. It's just a pointer
>      * so it can be efficiently passed by value.
>      */
>     typedef struct { Job *job; } LockedJob;
> 
>     LockedJob job_lock(Job *job);
>     Job *job_unlock(LockedJob job);
> 
>     Now the compiler catches mistakes:
> 
>     bool job_is_completed(LockedJob job);
>     bool job_is_ready(Job *job);
> 
>     Job *j;
>     LockedJob l;
>     job_is_completed(j) -> compiler error
>     job_is_completed(l) -> ok
>     job_is_ready(j) -> ok
>     job_is_ready(l) -> compiler error
> 
>     This approach assumes per-Job locks but a similar API is possible
>     with a global job_mutex too. There just needs to be a function to
>     turn Job * into LockedJob and LockedJob back into Job*.
> 
>     This is slightly exotic. It's not an approach I've seen used in C, so
>     it's not idiomatic and people might find it unfamiliar.

Oh yes. If we need something, I'd prefer function renaming.

> 
> These are just ideas. If you want to keep it the way it is, that's okay
> too (although a little confusing).
> 
>> diff --git a/job.c b/job.c
>> index 0e4dacf028..e393c1222f 100644
>> --- a/job.c
>> +++ b/job.c
>> @@ -242,7 +242,8 @@ bool job_cancel_requested(Job *job)
>>       return job->cancelled;
>>   }
>>   
>> -bool job_is_ready(Job *job)
>> +/* Called with job_mutex held. */
> 
> This information should go with the doc comments (and it's already there
> in job.h!). There is no rule on where to put doc comments but in this
> case you already added them to job.h, so they are not needed here in
> job.c. Leaving them could confuse other people into adding doc comments
> into job.c instead of job.h.
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v2 05/14] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU
  2021-11-04 14:53 ` [RFC PATCH v2 05/14] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU Emanuele Giuseppe Esposito
@ 2021-12-18 11:53   ` Vladimir Sementsov-Ogievskiy
  2021-12-20 10:34     ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 34+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-18 11:53 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Hanna Reitz, John Snow, Wen Congyang, Xie Changlong,
	Paolo Bonzini, Markus Armbruster, Stefan Hajnoczi, Fam Zheng,
	qemu-devel

04.11.2021 17:53, Emanuele Giuseppe Esposito wrote:
> Once job lock is used and aiocontext is removed, mirror has
> to perform job operations under the same critical section,
> using the helpers prepared in previous commit.
> 
> Note: at this stage, job_{lock/unlock} and job lock guard macros
> are *nop*.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>   block/mirror.c | 8 +++-----
>   1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 00089e519b..f22fa7da6e 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -653,7 +653,7 @@ static int mirror_exit_common(Job *job)
>       BlockDriverState *target_bs;
>       BlockDriverState *mirror_top_bs;
>       Error *local_err = NULL;
> -    bool abort = job->ret < 0;
> +    bool abort = job_has_failed(job);
>       int ret = 0;
>   
>       if (s->prepared) {
> @@ -1161,9 +1161,7 @@ static void mirror_complete(Job *job, Error **errp)
>       s->should_complete = true;
>   
>       /* If the job is paused, it will be re-entered when it is resumed */
> -    if (!job->paused) {
> -        job_enter(job);
> -    }
> +    job_enter_not_paused(job);
>   }
>   
>   static void coroutine_fn mirror_pause(Job *job)
> @@ -1182,7 +1180,7 @@ static bool mirror_drained_poll(BlockJob *job)
>        * from one of our own drain sections, to avoid a deadlock waiting for
>        * ourselves.
>        */
> -    if (!s->common.job.paused && !job_is_cancelled(&job->job) && !s->in_drain) {
> +    if (job_not_paused_nor_cancelled(&s->common.job) && !s->in_drain) {
>           return true;
>       }
>   
> 

Why introduce a separate API function for every use case?

Could we instead just use WITH_JOB_LOCK_GUARD() ?
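
A rough sketch of the idea, for illustration only: a scoped guard lets the driver do the check and the action in one critical section without a dedicated helper per use case. WITH_JOB_LOCK_GUARD_SKETCH(), job_guard_acquire(), job_guard_release(), job_enter_if_not_paused() and the Job fields are hypothetical names; the sketch assumes a pthread mutex and the GCC/Clang cleanup attribute, and is not the actual macro from this series.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct Job (hypothetical fields). */
typedef struct Job {
    bool paused;
    bool cancelled;
    int enter_count;
} Job;

static pthread_mutex_t job_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Cleanup handler: runs when the guard variable goes out of scope. */
static void job_guard_release(pthread_mutex_t **m)
{
    if (*m) {
        pthread_mutex_unlock(*m);
    }
}

static pthread_mutex_t *job_guard_acquire(pthread_mutex_t *m)
{
    pthread_mutex_lock(m);
    return m;
}

/* Hypothetical scoped guard: the body runs once with job_mutex held. */
#define WITH_JOB_LOCK_GUARD_SKETCH()                                   \
    for (pthread_mutex_t *guard_                                       \
             __attribute__((cleanup(job_guard_release))) =             \
             job_guard_acquire(&job_mutex);                            \
         guard_;                                                       \
         pthread_mutex_unlock(guard_), guard_ = NULL)

/*
 * Check and act inside one critical section, so the state cannot
 * change between the test and the action.
 */
static bool job_enter_if_not_paused(Job *job)
{
    WITH_JOB_LOCK_GUARD_SKETCH() {
        if (job->paused || job->cancelled) {
            return false;   /* the cleanup handler unlocks here */
        }
        job->enter_count++;
    }
    return true;
}
```

On the normal path the loop's increment expression unlocks and clears the guard, so the cleanup handler sees NULL and does nothing; on an early return the handler performs the unlock.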


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v2 10/14] jobs: protect jobs with job_lock/unlock
  2021-11-04 14:53 ` [RFC PATCH v2 10/14] jobs: protect jobs with job_lock/unlock Emanuele Giuseppe Esposito
@ 2021-12-18 11:57   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 34+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-18 11:57 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Hanna Reitz, John Snow, Wen Congyang, Xie Changlong,
	Paolo Bonzini, Markus Armbruster, Stefan Hajnoczi, Fam Zheng,
	qemu-devel

04.11.2021 17:53, Emanuele Giuseppe Esposito wrote:
> Introduce the job locking mechanism through the whole job API,
> following the comments and requirements of job-monitor (assume
> lock is held) and job-driver (lock is not held).
> 
> job_{lock/unlock} is independent from _job_{lock/unlock}.
> 
> Note: at this stage, job_{lock/unlock} and job lock guard macros
> are *nop*.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>

JOB_LOCK_GUARD / WITH_JOB_LOCK_GUARD may be used in many cases

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v2 11/14] block_job_query: remove atomic read
  2021-11-04 14:53 ` [RFC PATCH v2 11/14] block_job_query: remove atomic read Emanuele Giuseppe Esposito
@ 2021-12-18 12:07   ` Vladimir Sementsov-Ogievskiy
  2021-12-23 11:37     ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 34+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-18 12:07 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Hanna Reitz, John Snow, Wen Congyang, Xie Changlong,
	Paolo Bonzini, Markus Armbruster, Stefan Hajnoczi, Fam Zheng,
	qemu-devel

04.11.2021 17:53, Emanuele Giuseppe Esposito wrote:
> Not sure what the atomic here was supposed to do, since job.busy
> is protected by the job lock.

In block_job_query() it has been protected only since the previous commit. So, before the previous commit, the atomic read makes sense.

Hmm, but job_lock() is still a no-op at this point. So, actually, it would be more correct to drop this qatomic_read after patch 14.
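
To illustrate the distinction, here is a small sketch using C11 atomics and a pthread mutex in place of QEMU's qatomic_read() and job_mutex. All names and fields are hypothetical. The point is that a plain read is only safe once the lock is real; while the lock is a no-op, the atomic is what protects the reader.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-in for struct Job (hypothetical fields). */
typedef struct Job {
    atomic_bool busy_atomic;  /* may be read without any lock */
    bool busy;                /* protected by job_mutex */
} Job;

static pthread_mutex_t job_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Lock-free reader: safe only because the field is atomic. */
static bool job_busy_unlocked(Job *job)
{
    return atomic_load(&job->busy_atomic);
}

/* Caller holds job_mutex, so a plain read is enough. */
static bool job_busy_locked(Job *job)
{
    return job->busy;
}

/* Caller does not hold job_mutex; take it around the plain read. */
static bool job_busy(Job *job)
{
    pthread_mutex_lock(&job_mutex);
    bool busy = job_busy_locked(job);
    pthread_mutex_unlock(&job_mutex);
    return busy;
}
```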

> Since the whole function will
> be called under job_mutex, just remove the atomic.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>   blockjob.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/blockjob.c b/blockjob.c
> index dcc13dc336..426dcddcc1 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -314,6 +314,7 @@ int64_t block_job_ratelimit_get_delay(BlockJob *job, uint64_t n)
>       return ratelimit_calculate_delay(&job->limit, n);
>   }
>   
> +/* Called with job_mutex held */
>   BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
>   {
>       BlockJobInfo *info;
> @@ -332,13 +333,13 @@ BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
>       info = g_new0(BlockJobInfo, 1);
>       info->type      = g_strdup(job_type_str(&job->job));
>       info->device    = g_strdup(job->job.id);
> -    info->busy      = qatomic_read(&job->job.busy);
> +    info->busy      = job->job.busy;
>       info->paused    = job->job.pause_count > 0;
>       info->offset    = progress_current;
>       info->len       = progress_total;
>       info->speed     = job->speed;
>       info->io_status = job->iostatus;
> -    info->ready     = job_is_ready(&job->job),
> +    info->ready     = job_is_ready_locked(&job->job),
>       info->status    = job->job.status;
>       info->auto_finalize = job->job.auto_finalize;
>       info->auto_dismiss  = job->job.auto_dismiss;
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v2 13/14] jobs: add job lock in find_* functions
  2021-11-04 14:53 ` [RFC PATCH v2 13/14] jobs: add job lock in find_* functions Emanuele Giuseppe Esposito
@ 2021-12-18 12:11   ` Vladimir Sementsov-Ogievskiy
  2021-12-18 12:22     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 34+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-18 12:11 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Hanna Reitz, John Snow, Wen Congyang, Xie Changlong,
	Paolo Bonzini, Markus Armbruster, Stefan Hajnoczi, Fam Zheng,
	qemu-devel

04.11.2021 17:53, Emanuele Giuseppe Esposito wrote:
> Both blockdev.c and job-qmp.c have TOC/TOU conditions, because
> they first search for the job and then perform an action on it.
> Therefore, we need to do the search + action under the same
> job mutex critical section.
> 
> Note: at this stage, job_{lock/unlock} and job lock guard macros
> are *nop*.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>   blockdev.c | 9 +++++++++
>   job-qmp.c  | 8 ++++++++
>   2 files changed, 17 insertions(+)
> 
> diff --git a/blockdev.c b/blockdev.c
> index c5a835d9ed..0bd79757fc 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3327,12 +3327,14 @@ static BlockJob *find_block_job(const char *id, AioContext **aio_context,
>       assert(id != NULL);
>   
>       *aio_context = NULL;
> +    job_lock();

JOB_LOCK_GUARD() will look better in this case

>   
>       job = block_job_get(id);
>   
>       if (!job) {
>           error_set(errp, ERROR_CLASS_DEVICE_NOT_ACTIVE,
>                     "Block job '%s' not found", id);
> +        job_unlock();
>           return NULL;
>       }
>   
> @@ -3353,6 +3355,7 @@ void qmp_block_job_set_speed(const char *device, int64_t speed, Error **errp)
>   
>       block_job_set_speed(job, speed, errp);
>       aio_context_release(aio_context);
> +    job_unlock();

You add job_unlock(), but not job_lock()? Something is wrong. And anyway, I think JOB_LOCK_GUARD / WITH_JOB_LOCK_GUARD are generally safer.

>   }
>   
>   void qmp_block_job_cancel(const char *device,
> @@ -3379,6 +3382,7 @@ void qmp_block_job_cancel(const char *device,
>       job_user_cancel(&job->job, force, errp);
>   out:
>       aio_context_release(aio_context);
> +    job_unlock();
>   }
>   
>   void qmp_block_job_pause(const char *device, Error **errp)
> @@ -3393,6 +3397,7 @@ void qmp_block_job_pause(const char *device, Error **errp)
>       trace_qmp_block_job_pause(job);
>       job_user_pause(&job->job, errp);
>       aio_context_release(aio_context);
> +    job_unlock();
>   }
>   
>   void qmp_block_job_resume(const char *device, Error **errp)
> @@ -3407,6 +3412,7 @@ void qmp_block_job_resume(const char *device, Error **errp)
>       trace_qmp_block_job_resume(job);
>       job_user_resume(&job->job, errp);
>       aio_context_release(aio_context);
> +    job_unlock();
>   }
>   
>   void qmp_block_job_complete(const char *device, Error **errp)
> @@ -3421,6 +3427,7 @@ void qmp_block_job_complete(const char *device, Error **errp)
>       trace_qmp_block_job_complete(job);
>       job_complete(&job->job, errp);
>       aio_context_release(aio_context);
> +    job_unlock();
>   }
>   
>   void qmp_block_job_finalize(const char *id, Error **errp)
> @@ -3444,6 +3451,7 @@ void qmp_block_job_finalize(const char *id, Error **errp)
>       aio_context = blk_get_aio_context(job->blk);
>       job_unref(&job->job);
>       aio_context_release(aio_context);
> +    job_unlock();
>   }
>   
>   void qmp_block_job_dismiss(const char *id, Error **errp)
> @@ -3460,6 +3468,7 @@ void qmp_block_job_dismiss(const char *id, Error **errp)
>       job = &bjob->job;
>       job_dismiss(&job, errp);
>       aio_context_release(aio_context);
> +    job_unlock();
>   }
>   
>   void qmp_change_backing_file(const char *device,
> diff --git a/job-qmp.c b/job-qmp.c
> index a355dc2954..8f07c51db8 100644
> --- a/job-qmp.c
> +++ b/job-qmp.c
> @@ -35,10 +35,12 @@ static Job *find_job(const char *id, AioContext **aio_context, Error **errp)
>       Job *job;
>   
>       *aio_context = NULL;
> +    job_lock();
>   
>       job = job_get(id);
>       if (!job) {
>           error_setg(errp, "Job not found");
> +        job_unlock();
>           return NULL;
>       }
>   
> @@ -60,6 +62,7 @@ void qmp_job_cancel(const char *id, Error **errp)
>       trace_qmp_job_cancel(job);
>       job_user_cancel(job, true, errp);
>       aio_context_release(aio_context);
> +    job_unlock();
>   }
>   
>   void qmp_job_pause(const char *id, Error **errp)
> @@ -74,6 +77,7 @@ void qmp_job_pause(const char *id, Error **errp)
>       trace_qmp_job_pause(job);
>       job_user_pause(job, errp);
>       aio_context_release(aio_context);
> +    job_unlock();
>   }
>   
>   void qmp_job_resume(const char *id, Error **errp)
> @@ -88,6 +92,7 @@ void qmp_job_resume(const char *id, Error **errp)
>       trace_qmp_job_resume(job);
>       job_user_resume(job, errp);
>       aio_context_release(aio_context);
> +    job_unlock();
>   }
>   
>   void qmp_job_complete(const char *id, Error **errp)
> @@ -102,6 +107,7 @@ void qmp_job_complete(const char *id, Error **errp)
>       trace_qmp_job_complete(job);
>       job_complete(job, errp);
>       aio_context_release(aio_context);
> +    job_unlock();
>   }
>   
>   void qmp_job_finalize(const char *id, Error **errp)
> @@ -125,6 +131,7 @@ void qmp_job_finalize(const char *id, Error **errp)
>       aio_context = job->aio_context;
>       job_unref(job);
>       aio_context_release(aio_context);
> +    job_unlock();
>   }
>   
>   void qmp_job_dismiss(const char *id, Error **errp)
> @@ -139,6 +146,7 @@ void qmp_job_dismiss(const char *id, Error **errp)
>       trace_qmp_job_dismiss(job);
>       job_dismiss(&job, errp);
>       aio_context_release(aio_context);
> +    job_unlock();
>   }
>   
>   static JobInfo *job_query_single(Job *job, Error **errp)
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v2 13/14] jobs: add job lock in find_* functions
  2021-12-18 12:11   ` Vladimir Sementsov-Ogievskiy
@ 2021-12-18 12:22     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 34+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-18 12:22 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Hanna Reitz, John Snow, Wen Congyang, Xie Changlong,
	Paolo Bonzini, Markus Armbruster, Stefan Hajnoczi, Fam Zheng,
	qemu-devel

18.12.2021 15:11, Vladimir Sementsov-Ogievskiy wrote:
> 04.11.2021 17:53, Emanuele Giuseppe Esposito wrote:
>> Both blockdev.c and job-qmp.c have TOC/TOU conditions, because
>> they first search for the job and then perform an action on it.
>> Therefore, we need to do the search + action under the same
>> job mutex critical section.
>>
>> Note: at this stage, job_{lock/unlock} and job lock guard macros
>> are *nop*.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> ---
>>   blockdev.c | 9 +++++++++
>>   job-qmp.c  | 8 ++++++++
>>   2 files changed, 17 insertions(+)
>>
>> diff --git a/blockdev.c b/blockdev.c
>> index c5a835d9ed..0bd79757fc 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -3327,12 +3327,14 @@ static BlockJob *find_block_job(const char *id, AioContext **aio_context,
>>       assert(id != NULL);
>>       *aio_context = NULL;
>> +    job_lock();
> 
> JOB_LOCK_GUARD() will look better in this case
> 
>>       job = block_job_get(id);
>>       if (!job) {
>>           error_set(errp, ERROR_CLASS_DEVICE_NOT_ACTIVE,
>>                     "Block job '%s' not found", id);
>> +        job_unlock();
>>           return NULL;
>>       }
>> @@ -3353,6 +3355,7 @@ void qmp_block_job_set_speed(const char *device, int64_t speed, Error **errp)
>>       block_job_set_speed(job, speed, errp);
>>       aio_context_release(aio_context);
>> +    job_unlock();
> 
> You add job_unlock(), but not job_lock()? Something is wrong. And anyway, I think JOB_LOCK_GUARD / WITH_JOB_LOCK_GUARD are generally safer.

Ah, I understand now what's going on. If a comment such as "Returns with job_lock held on success" appeared in this patch, it would be more obvious.
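
The pattern in question, sketched with a pthread mutex and a static job table so that it compiles on its own. find_job_sketch(), job_pause_by_id() and the Job fields are hypothetical and only mirror the shape of find_block_job() / find_job() in the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct Job (hypothetical fields). */
typedef struct Job {
    const char *id;
    int pause_count;
} Job;

static pthread_mutex_t job_mutex = PTHREAD_MUTEX_INITIALIZER;
static Job jobs[] = { { "job0", 0 }, { "job1", 0 } };

/*
 * Returns with job_mutex held on success.
 * Returns NULL with job_mutex released on failure.
 */
static Job *find_job_sketch(const char *id)
{
    pthread_mutex_lock(&job_mutex);
    for (size_t i = 0; i < sizeof(jobs) / sizeof(jobs[0]); i++) {
        if (strcmp(jobs[i].id, id) == 0) {
            return &jobs[i];
        }
    }
    pthread_mutex_unlock(&job_mutex);
    return NULL;
}

/* Search and action share one critical section. */
static int job_pause_by_id(const char *id)
{
    Job *job = find_job_sketch(id);
    if (!job) {
        return -1;              /* lock already released by the finder */
    }
    job->pause_count++;
    pthread_mutex_unlock(&job_mutex);
    return 0;
}
```

Because the lock and the unlock live in different functions, the contract comment on the finder is the only thing that tells a reader why the caller unlocks without locking.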

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v2 14/14] job.c: enable job lock/unlock and remove Aiocontext locks
  2021-11-04 14:53 ` [RFC PATCH v2 14/14] job.c: enable job lock/unlock and remove Aiocontext locks Emanuele Giuseppe Esposito
@ 2021-12-18 12:24   ` Vladimir Sementsov-Ogievskiy
  2021-12-23 14:59     ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 34+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-18 12:24 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Hanna Reitz, John Snow, Wen Congyang, Xie Changlong,
	Paolo Bonzini, Markus Armbruster, Stefan Hajnoczi, Fam Zheng,
	qemu-devel

04.11.2021 17:53, Emanuele Giuseppe Esposito wrote:
> --- a/block/replication.c
> +++ b/block/replication.c
> @@ -728,9 +728,11 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
>            * disk, secondary disk in backup_job_completed().
>            */
>           if (s->backup_job) {
> +            aio_context_release(aio_context);
>               job_lock();
>               job_cancel_sync(&s->backup_job->job, true);
>               job_unlock();
> +            aio_context_acquire(aio_context);


Why do we need this? If we never acquire the aio context under job_lock, it should be safe to have a job-mutex critical section inside an aio-context critical section.
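
The ordering argument can be sketched with two pthread mutexes. Here outer_lock stands in for the AioContext lock and every function name is hypothetical. The sketch deliberately ignores whether job_cancel_sync() itself waits on something that needs the outer lock to be released, which would be a separate reason for the release/acquire pair.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-ins: outer_lock plays the AioContext lock, job_mutex the job lock. */
static pthread_mutex_t outer_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t job_mutex = PTHREAD_MUTEX_INITIALIZER;
static int cancelled;

/* Leaf lock: nothing else is acquired while job_mutex is held. */
static void job_cancel_sketch(void)
{
    pthread_mutex_lock(&job_mutex);
    cancelled++;
    pthread_mutex_unlock(&job_mutex);
}

/*
 * The only nesting order ever used is outer_lock, then job_mutex,
 * so the two locks cannot form a cycle.
 */
static void replication_stop_sketch(void)
{
    pthread_mutex_lock(&outer_lock);
    job_cancel_sketch();
    pthread_mutex_unlock(&outer_lock);
}
```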

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v2 03/14] job.h: define locked functions
  2021-12-16 17:11     ` Vladimir Sementsov-Ogievskiy
@ 2021-12-20 10:15       ` Emanuele Giuseppe Esposito
  0 siblings, 0 replies; 34+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-12-20 10:15 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Wen Congyang, Xie Changlong,
	Markus Armbruster, qemu-devel, Hanna Reitz, Paolo Bonzini,
	John Snow



On 16/12/2021 18:11, Vladimir Sementsov-Ogievskiy wrote:
> 16.12.2021 19:48, Stefan Hajnoczi wrote:
>> On Thu, Nov 04, 2021 at 10:53:23AM -0400, Emanuele Giuseppe Esposito 
>> wrote:
>>>   /** Returns whether the job is ready to be completed. */
>>>   bool job_is_ready(Job *job);
>>> +/** Same as job_is_ready(), but assumes job_lock is held. */
>>> +bool job_is_ready_locked(Job *job);
>>
>> What I see here is that some functions assume job_lock is held but don't
>> have _locked in their name (job_ref()), some assume job_lock is held and
>> have _locked in their name (job_is_ready_locked()), and some assume
>> job_lock is not held (job_is_ready()).
>>
>> That means when _locked is not in the name I don't know whether this
>> function requires job_lock or will deadlock if called under job_lock.
>>
>> Two ways to make it obvious:
>>
>> 1. Always have _locked in the name if the function requires job_lock.
>>     Functions without _locked must not be called under job_lock.
>>
>> 2. Don't change the name but use the type system instead:
>>
>>     /*
>>      * Define a unique type so the compiler warns us. It's just a pointer
>>      * so it can be efficiently passed by value.
>>      */
>>     typedef struct { Job *job; } LockedJob;
>>
>>     LockedJob job_lock(Job *job);
>>     Job *job_unlock(LockedJob job);
>>
>>     Now the compiler catches mistakes:
>>
>>     bool job_is_completed(LockedJob job);
>>     bool job_is_ready(Job *job);
>>
>>     Job *j;
>>     LockedJob l;
>>     job_is_completed(j) -> compiler error
>>     job_is_completed(l) -> ok
>>     job_is_ready(j) -> ok
>>     job_is_ready(l) -> compiler error
>>
>>     This approach assumes per-Job locks but a similar API is possible
>>     with a global job_mutex too. There just needs to be a function to
>>     turn Job * into LockedJob and LockedJob back into Job*.
>>
>>     This is slightly exotic. It's not an approach I've seen used in C, so
>>     it's not idiomatic and people might find it unfamiliar.
> 
> Oh yes. If we need something, I'd prefer function renaming.

Ok, I will go with option 1.
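
For reference, a minimal sketch of what option 1 looks like for one function pair, assuming a pthread mutex in place of job_mutex and a simplified Job with a single hypothetical field:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Simplified stand-in for struct Job (hypothetical field). */
typedef struct Job {
    bool ready;
} Job;

static pthread_mutex_t job_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Called with job_mutex held. */
static bool job_is_ready_locked(Job *job)
{
    return job->ready;
}

/* Called without job_mutex held; takes it internally. */
static bool job_is_ready(Job *job)
{
    pthread_mutex_lock(&job_mutex);
    bool ready = job_is_ready_locked(job);
    pthread_mutex_unlock(&job_mutex);
    return ready;
}
```

With the rule applied everywhere, the name alone tells the caller which variant is legal in the current context.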

> 
>>
>> These are just ideas. If you want to keep it the way it is, that's okay
>> too (although a little confusing).
>>
>>> diff --git a/job.c b/job.c
>>> index 0e4dacf028..e393c1222f 100644
>>> --- a/job.c
>>> +++ b/job.c
>>> @@ -242,7 +242,8 @@ bool job_cancel_requested(Job *job)
>>>       return job->cancelled;
>>>   }
>>> -bool job_is_ready(Job *job)
>>> +/* Called with job_mutex held. */
>>
>> This information should go with the doc comments (and it's already there
>> in job.h!). There is no rule on where to put doc comments but in this
>> case you already added them to job.h, so they are not needed here in
>> job.c. Leaving them could confuse other people into adding doc comments
>> into job.c instead of job.h.
>>

Yes, in general I will leave the comments for static functions in job.c 
and make sure the public ones are documented only in job.h.

Thank you,
Emanuele



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v2 05/14] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU
  2021-12-18 11:53   ` Vladimir Sementsov-Ogievskiy
@ 2021-12-20 10:34     ` Emanuele Giuseppe Esposito
  2021-12-20 10:47       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 34+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-12-20 10:34 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Wen Congyang, Xie Changlong,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow



On 18/12/2021 12:53, Vladimir Sementsov-Ogievskiy wrote:
> 04.11.2021 17:53, Emanuele Giuseppe Esposito wrote:
>> Once job lock is used and aiocontext is removed, mirror has
>> to perform job operations under the same critical section,
>> using the helpers prepared in previous commit.
>>
>> Note: at this stage, job_{lock/unlock} and job lock guard macros
>> are *nop*.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> ---
>>   block/mirror.c | 8 +++-----
>>   1 file changed, 3 insertions(+), 5 deletions(-)
>>
>> diff --git a/block/mirror.c b/block/mirror.c
>> index 00089e519b..f22fa7da6e 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -653,7 +653,7 @@ static int mirror_exit_common(Job *job)
>>       BlockDriverState *target_bs;
>>       BlockDriverState *mirror_top_bs;
>>       Error *local_err = NULL;
>> -    bool abort = job->ret < 0;
>> +    bool abort = job_has_failed(job);
>>       int ret = 0;
>>       if (s->prepared) {
>> @@ -1161,9 +1161,7 @@ static void mirror_complete(Job *job, Error **errp)
>>       s->should_complete = true;
>>       /* If the job is paused, it will be re-entered when it is 
>> resumed */
>> -    if (!job->paused) {
>> -        job_enter(job);
>> -    }
>> +    job_enter_not_paused(job);
>>   }
>>   static void coroutine_fn mirror_pause(Job *job)
>> @@ -1182,7 +1180,7 @@ static bool mirror_drained_poll(BlockJob *job)
>>        * from one of our own drain sections, to avoid a deadlock 
>> waiting for
>>        * ourselves.
>>        */
>> -    if (!s->common.job.paused && !job_is_cancelled(&job->job) && 
>> !s->in_drain) {
>> +    if (job_not_paused_nor_cancelled(&s->common.job) && !s->in_drain) {
>>           return true;
>>       }
>>
> 
> Why introduce a separate API function for every use case?
> 
> Could we instead just use WITH_JOB_LOCK_GUARD() ?
> 

This implies making the struct job_mutex public. Is that ok for you?

Thank you,
Emanuele



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH v2 05/14] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU
  2021-12-20 10:34     ` Emanuele Giuseppe Esposito
@ 2021-12-20 10:47       ` Vladimir Sementsov-Ogievskiy
  2021-12-23 11:37         ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 34+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-12-20 10:47 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, Hanna Reitz, John Snow, Wen Congyang, Xie Changlong,
	Paolo Bonzini, Markus Armbruster, Stefan Hajnoczi, Fam Zheng,
	qemu-devel

20.12.2021 13:34, Emanuele Giuseppe Esposito wrote:
> 
> 
> On 18/12/2021 12:53, Vladimir Sementsov-Ogievskiy wrote:
>> 04.11.2021 17:53, Emanuele Giuseppe Esposito wrote:
>>> Once job lock is used and aiocontext is removed, mirror has
>>> to perform job operations under the same critical section,
>>> using the helpers prepared in previous commit.
>>>
>>> Note: at this stage, job_{lock/unlock} and job lock guard macros
>>> are *nop*.
>>>
>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>> ---
>>>   block/mirror.c | 8 +++-----
>>>   1 file changed, 3 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/block/mirror.c b/block/mirror.c
>>> index 00089e519b..f22fa7da6e 100644
>>> --- a/block/mirror.c
>>> +++ b/block/mirror.c
>>> @@ -653,7 +653,7 @@ static int mirror_exit_common(Job *job)
>>>       BlockDriverState *target_bs;
>>>       BlockDriverState *mirror_top_bs;
>>>       Error *local_err = NULL;
>>> -    bool abort = job->ret < 0;
>>> +    bool abort = job_has_failed(job);
>>>       int ret = 0;
>>>       if (s->prepared) {
>>> @@ -1161,9 +1161,7 @@ static void mirror_complete(Job *job, Error **errp)
>>>       s->should_complete = true;
>>>       /* If the job is paused, it will be re-entered when it is resumed */
>>> -    if (!job->paused) {
>>> -        job_enter(job);
>>> -    }
>>> +    job_enter_not_paused(job);
>>>   }
>>>   static void coroutine_fn mirror_pause(Job *job)
>>> @@ -1182,7 +1180,7 @@ static bool mirror_drained_poll(BlockJob *job)
>>>        * from one of our own drain sections, to avoid a deadlock waiting for
>>>        * ourselves.
>>>        */
>>> -    if (!s->common.job.paused && !job_is_cancelled(&job->job) && !s->in_drain) {
>>> +    if (job_not_paused_nor_cancelled(&s->common.job) && !s->in_drain) {
>>>           return true;
>>>       }
>>>
>>
>> Why introduce a separate API function for every use case?
>>
>> Could we instead just use WITH_JOB_LOCK_GUARD() ?
>>
> 
> This implies making the struct job_mutex public. Is that ok for you?
> 

Yes, I think it's OK.

Alternatively, you can use job_lock() / job_unlock(), or even rewrite the WITH_JOB_LOCK_GUARD() macro using job_lock/job_unlock, to keep the mutex private. But I don't think it is really worth it now.

Note that struct Job is already public, so if we use a per-job mutex in the future, that is still not a problem. Only when we decide to make struct Job private will we have to decide something about JOB_LOCK_GUARD(), and at that point we'll just rewrite it to work through some helper function instead of directly touching the mutex.
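
A minimal sketch of the alternative mentioned above, a guard macro built only on job_lock()/job_unlock() so that the mutex stays private. This is not code from the series: it uses pthreads in place of QEMU's mutex type, and WITH_JOB_LOCK_GUARD_SKETCH, JobSketch, job_can_poll_sketch and job_lock_depth are hypothetical names invented for illustration. In particular, the cancelled state is modelled as a plain field here, while the patch reads it through job_is_cancelled().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Private to the job.c-like translation unit: callers never see it. */
static pthread_mutex_t job_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Demo-only counter so the example can observe whether the lock is held. */
static int job_lock_depth;

void job_lock(void)
{
    pthread_mutex_lock(&job_mutex);
    job_lock_depth++;
}

void job_unlock(void)
{
    job_lock_depth--;
    pthread_mutex_unlock(&job_mutex);
}

/*
 * Hypothetical guard built only on the two public functions. The loop body
 * runs exactly once and job_unlock() runs when the body falls through.
 * Caveat: a break, return or goto out of the body skips the unlock.
 */
#define WITH_JOB_LOCK_GUARD_SKETCH() \
    for (int once_ = (job_lock(), 1); once_; once_ = (job_unlock(), 0))

/* Stand-in for the two pieces of state the mirror.c hunks look at. */
typedef struct JobSketch {
    bool paused;
    bool cancelled;
} JobSketch;

/* Both fields are read in one critical section, avoiding TOC/TOU. */
bool job_can_poll_sketch(JobSketch *job)
{
    bool ok = false;

    WITH_JOB_LOCK_GUARD_SKETCH() {
        assert(job_lock_depth == 1);
        ok = !job->paused && !job->cancelled;
    }
    return ok;
}
```

With a guard like this a driver can open its own critical section without a dedicated helper per use case, at the cost of the early-exit caveat noted in the comment.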


-- 
Best regards,
Vladimir



* Re: [RFC PATCH v2 02/14] job.h: categorize fields in struct Job
  2021-12-16 16:21   ` Stefan Hajnoczi
@ 2021-12-21 14:23     ` Emanuele Giuseppe Esposito
  0 siblings, 0 replies; 34+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-12-21 14:23 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-block,
	Wen Congyang, Xie Changlong, Markus Armbruster, qemu-devel,
	Hanna Reitz, Paolo Bonzini, John Snow



On 16/12/2021 17:21, Stefan Hajnoczi wrote:
> On Thu, Nov 04, 2021 at 10:53:22AM -0400, Emanuele Giuseppe Esposito wrote:
>> Categorize the fields in struct Job to understand which ones
>> need to be protected by the job mutex and which don't.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> ---
>>   include/qemu/job.h | 57 +++++++++++++++++++++++++++-------------------
>>   1 file changed, 34 insertions(+), 23 deletions(-)
>>
>> diff --git a/include/qemu/job.h b/include/qemu/job.h
>> index ccf7826426..f7036ac6b3 100644
>> --- a/include/qemu/job.h
>> +++ b/include/qemu/job.h
>> @@ -40,27 +40,52 @@ typedef struct JobTxn JobTxn;
>>    * Long-running operation.
>>    */
>>   typedef struct Job {
>> +
>> +    /* Fields set at initialization (job_create), and never modified */
>> +
>>       /** The ID of the job. May be NULL for internal jobs. */
>>       char *id;
>>   
>> -    /** The type of this job. */
>> +    /**
>> +     * The type of this job.
>> +     * All callbacks are called with job_mutex *not* held.
>> +     */
>>       const JobDriver *driver;
>>   
>> -    /** Reference count of the block job */
>> -    int refcnt;
>> -
>> -    /** Current state; See @JobStatus for details. */
>> -    JobStatus status;
>> -
>>       /** AioContext to run the job coroutine in */
>>       AioContext *aio_context;
> 
> "Fields set at initialization (job_create), and never modified" does not
> apply here. blockjob.c:child_job_set_aio_ctx() changes it at runtime.
> 

Right. In theory aio_context does not need the job_mutex either, provided
all klass->set_aio_ctx() calls run under the BQL (they do) and inside
drained sections (work in progress). For now I will protect it with job_lock().

Thank you,
Emanuele




* Re: [RFC PATCH v2 05/14] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU
  2021-12-20 10:47       ` Vladimir Sementsov-Ogievskiy
@ 2021-12-23 11:37         ` Emanuele Giuseppe Esposito
  0 siblings, 0 replies; 34+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-12-23 11:37 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Wen Congyang, Xie Changlong,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow



On 20/12/2021 11:47, Vladimir Sementsov-Ogievskiy wrote:
> 20.12.2021 13:34, Emanuele Giuseppe Esposito wrote:
>>
>>
>> On 18/12/2021 12:53, Vladimir Sementsov-Ogievskiy wrote:
>>> 04.11.2021 17:53, Emanuele Giuseppe Esposito wrote:
>>>> Once job lock is used and aiocontext is removed, mirror has
>>>> to perform job operations under the same critical section,
>>>> using the helpers prepared in previous commit.
>>>>
>>>> Note: at this stage, job_{lock/unlock} and job lock guard macros
>>>> are *nop*.
>>>>
>>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>>> ---
>>>>   block/mirror.c | 8 +++-----
>>>>   1 file changed, 3 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/block/mirror.c b/block/mirror.c
>>>> index 00089e519b..f22fa7da6e 100644
>>>> --- a/block/mirror.c
>>>> +++ b/block/mirror.c
>>>> @@ -653,7 +653,7 @@ static int mirror_exit_common(Job *job)
>>>>       BlockDriverState *target_bs;
>>>>       BlockDriverState *mirror_top_bs;
>>>>       Error *local_err = NULL;
>>>> -    bool abort = job->ret < 0;
>>>> +    bool abort = job_has_failed(job);
>>>>       int ret = 0;
>>>>       if (s->prepared) {
>>>> @@ -1161,9 +1161,7 @@ static void mirror_complete(Job *job, Error 
>>>> **errp)
>>>>       s->should_complete = true;
>>>>       /* If the job is paused, it will be re-entered when it is 
>>>> resumed */
>>>> -    if (!job->paused) {
>>>> -        job_enter(job);
>>>> -    }
>>>> +    job_enter_not_paused(job);
>>>>   }
>>>>   static void coroutine_fn mirror_pause(Job *job)
>>>> @@ -1182,7 +1180,7 @@ static bool mirror_drained_poll(BlockJob *job)
>>>>        * from one of our own drain sections, to avoid a deadlock 
>>>> waiting for
>>>>        * ourselves.
>>>>        */
>>>> -    if (!s->common.job.paused && !job_is_cancelled(&job->job) && 
>>>> !s->in_drain) {
>>>> +    if (job_not_paused_nor_cancelled(&s->common.job) && 
>>>> !s->in_drain) {
>>>>           return true;
>>>>       }
>>>>
>>>
>>> Why introduce a separate API function for every use case?
>>>
>>> Could we instead just use WITH_JOB_LOCK_GUARD()?
>>>
>>
>> This implies making the struct job_mutex public. Is that ok for you?
>>
> 
> Yes, I think it's OK.
> 
> Alternatively, you can use job_lock() / job_unlock(), or even rewrite the
> WITH_JOB_LOCK_GUARD() macro using job_lock/job_unlock, to keep the mutex
> private. But I don't think it is really worth it now.
> 
> Note that struct Job is already public, so if we use a per-job mutex in
> the future, that is still not a problem. Only when we decide to make struct
> Job private will we have to decide something about JOB_LOCK_GUARD(), and at
> that point we'll just rewrite it to work through some helper function
> instead of directly touching the mutex.
> 
> 

OK, I will do that. Just FYI, the initial idea was that drivers like
monitor would not need to know about the job_mutex lock, which is why I
made the helpers in mirror.c.

Thank you,
Emanuele




* Re: [RFC PATCH v2 11/14] block_job_query: remove atomic read
  2021-12-18 12:07   ` Vladimir Sementsov-Ogievskiy
@ 2021-12-23 11:37     ` Emanuele Giuseppe Esposito
  0 siblings, 0 replies; 34+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-12-23 11:37 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Wen Congyang, Xie Changlong,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow



On 18/12/2021 13:07, Vladimir Sementsov-Ogievskiy wrote:
> 04.11.2021 17:53, Emanuele Giuseppe Esposito wrote:
>> Not sure what the atomic here was supposed to do, since job.busy
>> is protected by the job lock.
> 
> In block_job_query() it has been protected only since the previous commit.
> So, before that commit, the atomic read makes sense.

To me it doesn't really, because the field is written under job_lock/unlock
in job.c but read here with an atomic. But maybe I am missing something.

> Hmm. but job_lock() is still a no-op at this point. So, actually, it 
> would be more correct to drop this qatomic_read after patch 14.
> 

Will do.
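
For illustration only, a sketch of the end state after patch 14, where the reader takes the same lock as the writer and a plain load is enough. It uses pthreads rather than QEMU's primitives, and JobSketch, job_mutex_sketch, job_set_busy_sketch and job_query_busy_sketch are hypothetical names, not functions from the series.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t job_mutex_sketch = PTHREAD_MUTEX_INITIALIZER;

typedef struct JobSketch {
    bool busy; /* protected by job_mutex_sketch */
} JobSketch;

/* Writer side: the field only changes while the lock is held. */
void job_set_busy_sketch(JobSketch *job, bool busy)
{
    assert(job != NULL);
    pthread_mutex_lock(&job_mutex_sketch);
    job->busy = busy;
    pthread_mutex_unlock(&job_mutex_sketch);
}

/*
 * Reader side: taking the same lock orders this load after any earlier
 * locked store, so no atomic read is needed once the lock is real.
 */
bool job_query_busy_sketch(JobSketch *job)
{
    bool busy;

    assert(job != NULL);
    pthread_mutex_lock(&job_mutex_sketch);
    busy = job->busy;
    pthread_mutex_unlock(&job_mutex_sketch);
    return busy;
}
```

While the lock functions are still no-ops, a reader written this way has no protection at all, which is the reason for dropping the atomic read only after the lock is enabled.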

Thank you,
Emanuele




* Re: [RFC PATCH v2 14/14] job.c: enable job lock/unlock and remove Aiocontext locks
  2021-12-18 12:24   ` Vladimir Sementsov-Ogievskiy
@ 2021-12-23 14:59     ` Emanuele Giuseppe Esposito
  0 siblings, 0 replies; 34+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-12-23 14:59 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, Fam Zheng, Wen Congyang, Xie Changlong,
	Markus Armbruster, qemu-devel, Hanna Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow



On 18/12/2021 13:24, Vladimir Sementsov-Ogievskiy wrote:
> 04.11.2021 17:53, Emanuele Giuseppe Esposito wrote:
>> --- a/block/replication.c
>> +++ b/block/replication.c
>> @@ -728,9 +728,11 @@ static void replication_stop(ReplicationState 
>> *rs, bool failover, Error **errp)
>>            * disk, secondary disk in backup_job_completed().
>>            */
>>           if (s->backup_job) {
>> +            aio_context_release(aio_context);
>>               job_lock();
>>               job_cancel_sync(&s->backup_job->job, true);
>>               job_unlock();
>> +            aio_context_acquire(aio_context);
> 
> 
> Why do we need it? If we never acquire the aio context under job_lock, it
> should be safe to nest a job-mutex critical section inside an aio-context
> critical section.
> 

Right, it also works with the aio context held.
I will remove this hunk.
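
A small sketch of the ordering rule discussed above: the job mutex may nest inside the aio context lock as long as no path takes them in the opposite order. It is a pthreads model, not QEMU code. All *_sketch names are hypothetical, the real AioContext lock is recursive while this one is a plain mutex, and the cancelled flag merely stands in for job_cancel_sync().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t aio_ctx_lock_sketch = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t job_mutex_sketch = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread flag used only to check the ordering rule at run time. */
static _Thread_local bool job_lock_held_sketch;

void aio_context_acquire_sketch(void)
{
    /* Rule: never take the aio context lock while holding the job lock. */
    assert(!job_lock_held_sketch);
    pthread_mutex_lock(&aio_ctx_lock_sketch);
}

void aio_context_release_sketch(void)
{
    pthread_mutex_unlock(&aio_ctx_lock_sketch);
}

void job_lock_sketch(void)
{
    pthread_mutex_lock(&job_mutex_sketch);
    job_lock_held_sketch = true;
}

void job_unlock_sketch(void)
{
    job_lock_held_sketch = false;
    pthread_mutex_unlock(&job_mutex_sketch);
}

/* Outer lock first, inner lock second, released in reverse order. */
bool replication_stop_sketch(void)
{
    bool cancelled;

    aio_context_acquire_sketch();
    job_lock_sketch();
    cancelled = true; /* stands in for job_cancel_sync() */
    job_unlock_sketch();
    aio_context_release_sketch();
    return cancelled;
}
```

Because every path acquires the two locks in the same order, two threads cannot each hold one lock while waiting for the other, so the release/acquire pair around job_lock() in the hunk is not needed for deadlock avoidance in this model.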

Thank you,
Emanuele




end of thread, newest message: ~2021-12-23 15:01 UTC

Thread overview: 34+ messages
2021-11-04 14:53 [RFC PATCH v2 00/14] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
2021-11-04 14:53 ` [RFC PATCH v2 01/14] job.c: make job_lock/unlock public Emanuele Giuseppe Esposito
2021-12-16 16:18   ` Stefan Hajnoczi
2021-11-04 14:53 ` [RFC PATCH v2 02/14] job.h: categorize fields in struct Job Emanuele Giuseppe Esposito
2021-12-16 16:21   ` Stefan Hajnoczi
2021-12-21 14:23     ` Emanuele Giuseppe Esposito
2021-11-04 14:53 ` [RFC PATCH v2 03/14] job.h: define locked functions Emanuele Giuseppe Esposito
2021-12-16 16:48   ` Stefan Hajnoczi
2021-12-16 17:11     ` Vladimir Sementsov-Ogievskiy
2021-12-20 10:15       ` Emanuele Giuseppe Esposito
2021-11-04 14:53 ` [RFC PATCH v2 04/14] job.h: define unlocked functions Emanuele Giuseppe Esposito
2021-12-16 16:51   ` Stefan Hajnoczi
2021-11-04 14:53 ` [RFC PATCH v2 05/14] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU Emanuele Giuseppe Esposito
2021-12-18 11:53   ` Vladimir Sementsov-Ogievskiy
2021-12-20 10:34     ` Emanuele Giuseppe Esposito
2021-12-20 10:47       ` Vladimir Sementsov-Ogievskiy
2021-12-23 11:37         ` Emanuele Giuseppe Esposito
2021-11-04 14:53 ` [RFC PATCH v2 06/14] job.c: make job_event_* functions static Emanuele Giuseppe Esposito
2021-12-16 16:54   ` Stefan Hajnoczi
2021-11-04 14:53 ` [RFC PATCH v2 07/14] job.c: move inner aiocontext lock in callbacks Emanuele Giuseppe Esposito
2021-11-04 14:53 ` [RFC PATCH v2 08/14] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED Emanuele Giuseppe Esposito
2021-11-04 14:53 ` [RFC PATCH v2 09/14] jobs: remove aiocontext locks since the functions are under BQL Emanuele Giuseppe Esposito
2021-11-04 14:53 ` [RFC PATCH v2 10/14] jobs: protect jobs with job_lock/unlock Emanuele Giuseppe Esposito
2021-12-18 11:57   ` Vladimir Sementsov-Ogievskiy
2021-11-04 14:53 ` [RFC PATCH v2 11/14] block_job_query: remove atomic read Emanuele Giuseppe Esposito
2021-12-18 12:07   ` Vladimir Sementsov-Ogievskiy
2021-12-23 11:37     ` Emanuele Giuseppe Esposito
2021-11-04 14:53 ` [RFC PATCH v2 12/14] jobs: use job locks and helpers also in the unit tests Emanuele Giuseppe Esposito
2021-11-04 14:53 ` [RFC PATCH v2 13/14] jobs: add job lock in find_* functions Emanuele Giuseppe Esposito
2021-12-18 12:11   ` Vladimir Sementsov-Ogievskiy
2021-12-18 12:22     ` Vladimir Sementsov-Ogievskiy
2021-11-04 14:53 ` [RFC PATCH v2 14/14] job.c: enable job lock/unlock and remove Aiocontext locks Emanuele Giuseppe Esposito
2021-12-18 12:24   ` Vladimir Sementsov-Ogievskiy
2021-12-23 14:59     ` Emanuele Giuseppe Esposito
