All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v5 0/8] blockjob: AioContext change support for mirror and backup
@ 2016-06-16 16:56 Stefan Hajnoczi
  2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 1/8] blockjob: move iostatus reset out of block_job_enter() Stefan Hajnoczi
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2016-06-16 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, jjherne, Fam Zheng, Paolo Bonzini, Jeff Cody, mreitz,
	Stefan Hajnoczi

v5:
 * Mark .pause()/.resume() as coroutine_fn [Paolo]
 * Switch to coroutine-friendly mirror_drain() in mirror_pause() [Paolo]
 * Rename block_job_is_paused() to block_job_should_pause() [Fam]
 * New patch to add block_job_get_aio_context() helper and use it in the
   aio_poll() loop [Fam]
 * New patch to make bdrv_remove_aio_notifier() safe during attached/detach
   callbacks

v4:
 * Add .pause()/.resume() callbacks to really quiesce during
   block_job_pause_point() [Paolo]
 * Add AioContext change support for backup block job
 * Tested drive_mirror + migration and drive_backup + reboot

v3:
 * Push infrastructure down into blockjob.c so other jobs can reuse it [Stefan]
 * Tested with drive_mirror + migration [Stefan]

v2:
 * Fam introduced the concept of a synchronous aio_poll() loop to quiesce the
   block job during detach

When dataplane is enabled or disabled the drive switches to a new AioContext.
The mirror and backup block jobs must also move to the new AioContext so that
drive accesses are always made within its AioContext.

This series extends the block job pause functionality so that detaching from an
AioContext pauses the job and attaching to the new AioContext resumes the job.

Pause points are added to the mirror job so that long I/O loops can yield for
an AioContext switch.  Other block jobs need pause points too but this can be
done as a follow-up series.

Stefan Hajnoczi (8):
  blockjob: move iostatus reset out of block_job_enter()
  blockjob: rename block_job_is_paused()
  blockjob: add pause points
  blockjob: add block_job_get_aio_context()
  block: use safe iteration over AioContext notifiers
  blockjob: add AioContext attached callback
  mirror: follow AioContext change gracefully
  backup: follow AioContext change gracefully

 block.c                   |  46 ++++++++++++++++-----
 block/backup.c            |  22 ++++++----
 block/mirror.c            |  43 +++++++++++++++----
 blockdev.c                |   1 +
 blockjob.c                | 103 ++++++++++++++++++++++++++++++++++++++++------
 include/block/block_int.h |   2 +
 include/block/blockjob.h  |  51 +++++++++++++++++------
 7 files changed, 217 insertions(+), 51 deletions(-)

-- 
2.5.5

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH v5 1/8] blockjob: move iostatus reset out of block_job_enter()
  2016-06-16 16:56 [Qemu-devel] [PATCH v5 0/8] blockjob: AioContext change support for mirror and backup Stefan Hajnoczi
@ 2016-06-16 16:56 ` Stefan Hajnoczi
  2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 2/8] blockjob: rename block_job_is_paused() Stefan Hajnoczi
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2016-06-16 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, jjherne, Fam Zheng, Paolo Bonzini, Jeff Cody, mreitz,
	Stefan Hajnoczi

The QMP block-job-resume command and cancellation may want to reset the
job's iostatus.  The next patches add a user who does not want to reset
iostatus so move it up to block_job_enter() callers.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockdev.c | 1 +
 blockjob.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 7fd515a..19b963c 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3799,6 +3799,7 @@ void qmp_block_job_resume(const char *device, Error **errp)
 
     job->user_paused = false;
     trace_qmp_block_job_resume(job);
+    block_job_iostatus_reset(job);
     block_job_resume(job);
     aio_context_release(aio_context);
 }
diff --git a/blockjob.c b/blockjob.c
index c095cc5..463bccf 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -269,7 +269,6 @@ void block_job_resume(BlockJob *job)
 
 void block_job_enter(BlockJob *job)
 {
-    block_job_iostatus_reset(job);
     if (job->co && !job->busy) {
         qemu_coroutine_enter(job->co, NULL);
     }
@@ -278,6 +277,7 @@ void block_job_enter(BlockJob *job)
 void block_job_cancel(BlockJob *job)
 {
     job->cancelled = true;
+    block_job_iostatus_reset(job);
     block_job_enter(job);
 }
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH v5 2/8] blockjob: rename block_job_is_paused()
  2016-06-16 16:56 [Qemu-devel] [PATCH v5 0/8] blockjob: AioContext change support for mirror and backup Stefan Hajnoczi
  2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 1/8] blockjob: move iostatus reset out of block_job_enter() Stefan Hajnoczi
@ 2016-06-16 16:56 ` Stefan Hajnoczi
  2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 3/8] blockjob: add pause points Stefan Hajnoczi
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2016-06-16 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, jjherne, Fam Zheng, Paolo Bonzini, Jeff Cody, mreitz,
	Stefan Hajnoczi

The block_job_is_paused() function name is not great because callers
only use it to determine whether pausing has been requested.  Rename it
to highlight those semantics and remove it from the public header file
as there are no external callers.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 blockjob.c               | 4 ++--
 include/block/blockjob.h | 9 ---------
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 463bccf..08ac430 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -252,7 +252,7 @@ void block_job_pause(BlockJob *job)
     job->pause_count++;
 }
 
-bool block_job_is_paused(BlockJob *job)
+static bool block_job_should_pause(BlockJob *job)
 {
     return job->pause_count > 0;
 }
@@ -361,7 +361,7 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
     }
 
     job->busy = false;
-    if (block_job_is_paused(job)) {
+    if (block_job_should_pause(job)) {
         qemu_coroutine_yield();
     } else {
         co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 00ac418..8fcecf9 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -348,15 +348,6 @@ void block_job_event_completed(BlockJob *job, const char *msg);
 void block_job_event_ready(BlockJob *job);
 
 /**
- * block_job_is_paused:
- * @job: The job being queried.
- *
- * Returns whether the job is currently paused, or will pause
- * as soon as it reaches a sleeping point.
- */
-bool block_job_is_paused(BlockJob *job);
-
-/**
  * block_job_cancel_sync:
  * @job: The job to be canceled.
  *
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH v5 3/8] blockjob: add pause points
  2016-06-16 16:56 [Qemu-devel] [PATCH v5 0/8] blockjob: AioContext change support for mirror and backup Stefan Hajnoczi
  2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 1/8] blockjob: move iostatus reset out of block_job_enter() Stefan Hajnoczi
  2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 2/8] blockjob: rename block_job_is_paused() Stefan Hajnoczi
@ 2016-06-16 16:56 ` Stefan Hajnoczi
  2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 4/8] blockjob: add block_job_get_aio_context() Stefan Hajnoczi
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2016-06-16 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, jjherne, Fam Zheng, Paolo Bonzini, Jeff Cody, mreitz,
	Stefan Hajnoczi

Block jobs are coroutines that usually perform I/O but sometimes also
sleep or yield.  Currently only sleeping or yielded block jobs can be
paused.  This means jobs that do not sleep or yield (using
block_job_yield()) are unaffected by block_job_pause().

Add block_job_pause_point() so that block jobs can mark quiescent points
that are suitable for pausing.  This solves the problem that it can take
a block job a long time to pause if it is performing a long series of
I/O operations.

Transitioning to paused state involves a .pause()/.resume() callback.
These callbacks are used to ensure that I/O and event loop activity has
ceased while the job is at a pause point.

Note that this patch introduces a stricter pause state than previously.
The job->busy flag was incorrectly documented as a quiescent state
without I/O pending.  This is violated by any job that has I/O pending
across sleep or block_job_yield(), like the mirror block job.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 blockjob.c               | 46 ++++++++++++++++++++++++++++++++++++++--------
 include/block/blockjob.h | 35 ++++++++++++++++++++++++++++++++---
 2 files changed, 70 insertions(+), 11 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 08ac430..9357456 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -257,6 +257,32 @@ static bool block_job_should_pause(BlockJob *job)
     return job->pause_count > 0;
 }
 
+void coroutine_fn block_job_pause_point(BlockJob *job)
+{
+    if (!block_job_should_pause(job)) {
+        return;
+    }
+    if (block_job_is_cancelled(job)) {
+        return;
+    }
+
+    if (job->driver->pause) {
+        job->driver->pause(job);
+    }
+
+    if (!block_job_is_cancelled(job)) {
+        job->paused = true;
+        job->busy = false;
+        qemu_coroutine_yield(); /* wait for block_job_resume() */
+        job->busy = true;
+        job->paused = false;
+    }
+
+    if (job->driver->resume) {
+        job->driver->resume(job);
+    }
+}
+
 void block_job_resume(BlockJob *job)
 {
     assert(job->pause_count > 0);
@@ -360,13 +386,13 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
         return;
     }
 
-    job->busy = false;
-    if (block_job_should_pause(job)) {
-        qemu_coroutine_yield();
-    } else {
+    if (!block_job_should_pause(job)) {
+        job->busy = false;
         co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
+        job->busy = true;
     }
-    job->busy = true;
+
+    block_job_pause_point(job);
 }
 
 void block_job_yield(BlockJob *job)
@@ -378,9 +404,13 @@ void block_job_yield(BlockJob *job)
         return;
     }
 
-    job->busy = false;
-    qemu_coroutine_yield();
-    job->busy = true;
+    if (!block_job_should_pause(job)) {
+        job->busy = false;
+        qemu_coroutine_yield();
+        job->busy = true;
+    }
+
+    block_job_pause_point(job);
 }
 
 BlockJobInfo *block_job_query(BlockJob *job)
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 8fcecf9..7739f37 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -70,6 +70,20 @@ typedef struct BlockJobDriver {
      * never both.
      */
     void (*abort)(BlockJob *job);
+
+    /**
+     * If the callback is not NULL, it will be invoked when the job transitions
+     * into the paused state.  Paused jobs must not perform any asynchronous
+     * I/O or event loop activity.  This callback is used to quiesce jobs.
+     */
+    void coroutine_fn (*pause)(BlockJob *job);
+
+    /**
+     * If the callback is not NULL, it will be invoked when the job transitions
+     * out of the paused state.  Any asynchronous I/O or event loop activity
+     * should be restarted from this callback.
+     */
+    void coroutine_fn (*resume)(BlockJob *job);
 } BlockJobDriver;
 
 /**
@@ -119,13 +133,19 @@ struct BlockJob {
     bool user_paused;
 
     /**
-     * Set to false by the job while it is in a quiescent state, where
-     * no I/O is pending and the job has yielded on any condition
-     * that is not detected by #aio_poll, such as a timer.
+     * Set to false by the job while the coroutine has yielded and may be
+     * re-entered by block_job_enter().  There may still be I/O or event loop
+     * activity pending.
      */
     bool busy;
 
     /**
+     * Set to true by the job while it is in a quiescent state, where
+     * no I/O or event loop activity is pending.
+     */
+    bool paused;
+
+    /**
      * Set to true when the job is ready to be completed.
      */
     bool ready;
@@ -299,6 +319,15 @@ bool block_job_is_cancelled(BlockJob *job);
 BlockJobInfo *block_job_query(BlockJob *job);
 
 /**
+ * block_job_pause_point:
+ * @job: The job that is ready to pause.
+ *
+ * Pause now if block_job_pause() has been called.  Block jobs that perform
+ * lots of I/O must call this between requests so that the job can be paused.
+ */
+void coroutine_fn block_job_pause_point(BlockJob *job);
+
+/**
  * block_job_pause:
  * @job: The job to be paused.
  *
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH v5 4/8] blockjob: add block_job_get_aio_context()
  2016-06-16 16:56 [Qemu-devel] [PATCH v5 0/8] blockjob: AioContext change support for mirror and backup Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 3/8] blockjob: add pause points Stefan Hajnoczi
@ 2016-06-16 16:56 ` Stefan Hajnoczi
  2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 5/8] block: use safe iteration over AioContext notifiers Stefan Hajnoczi
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2016-06-16 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, jjherne, Fam Zheng, Paolo Bonzini, Jeff Cody, mreitz,
	Stefan Hajnoczi

Add a helper function to document why block jobs sometimes run in the
QEMU main loop and to avoid code duplication in a following patch.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 blockjob.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 9357456..18722b0 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -60,6 +60,17 @@ BlockJob *block_job_next(BlockJob *job)
     return QLIST_NEXT(job, job_list);
 }
 
+/* Normally the job runs in its BlockBackend's AioContext.  The exception is
+ * block_job_defer_to_main_loop() where it runs in the QEMU main loop.  Code
+ * that supports both cases uses this helper function.
+ */
+static AioContext *block_job_get_aio_context(BlockJob *job)
+{
+    return job->deferred_to_main_loop ?
+           qemu_get_aio_context() :
+           blk_get_aio_context(job->blk);
+}
+
 void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
                        int64_t speed, BlockCompletionFunc *cb,
                        void *opaque, Error **errp)
@@ -337,9 +348,7 @@ static int block_job_finish_sync(BlockJob *job,
         return -EBUSY;
     }
     while (!job->completed) {
-        aio_poll(job->deferred_to_main_loop ? qemu_get_aio_context() :
-                                              blk_get_aio_context(job->blk),
-                 true);
+        aio_poll(block_job_get_aio_context(job), true);
     }
     ret = (job->cancelled && job->ret == 0) ? -ECANCELED : job->ret;
     block_job_unref(job);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH v5 5/8] block: use safe iteration over AioContext notifiers
  2016-06-16 16:56 [Qemu-devel] [PATCH v5 0/8] blockjob: AioContext change support for mirror and backup Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 4/8] blockjob: add block_job_get_aio_context() Stefan Hajnoczi
@ 2016-06-16 16:56 ` Stefan Hajnoczi
  2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 6/8] blockjob: add AioContext attached callback Stefan Hajnoczi
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2016-06-16 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, jjherne, Fam Zheng, Paolo Bonzini, Jeff Cody, mreitz,
	Stefan Hajnoczi

It's possible that an AioContext notifier user was close to finishing
when .detach_aio_context() or .attached_aio_context() is called.  In
that case they may call bdrv_remove_aio_context_notifier() during the
callback.

Use safe iteration to avoid crashing when the notifier list is modified
during iteration.  We must not only handle the case where the current
aio notifier is removed during a callback but also the one where any
other aio notifier is removed.

The next patch adds an AioContext notifier for block jobs and they
really could be terminating just as .detach_aio_context() is invoked.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c                   | 46 ++++++++++++++++++++++++++++++++++++----------
 include/block/block_int.h |  2 ++
 2 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/block.c b/block.c
index f54bc25..691b201 100644
--- a/block.c
+++ b/block.c
@@ -3601,18 +3601,34 @@ AioContext *bdrv_get_aio_context(BlockDriverState *bs)
     return bs->aio_context;
 }
 
+static void bdrv_do_remove_aio_context_notifier(BdrvAioNotifier *ban)
+{
+    QLIST_REMOVE(ban, list);
+    g_free(ban);
+}
+
 void bdrv_detach_aio_context(BlockDriverState *bs)
 {
-    BdrvAioNotifier *baf;
+    BdrvAioNotifier *baf, *baf_tmp;
     BdrvChild *child;
 
     if (!bs->drv) {
         return;
     }
 
-    QLIST_FOREACH(baf, &bs->aio_notifiers, list) {
-        baf->detach_aio_context(baf->opaque);
+    assert(!bs->walking_aio_notifiers);
+    bs->walking_aio_notifiers = true;
+    QLIST_FOREACH_SAFE(baf, &bs->aio_notifiers, list, baf_tmp) {
+        if (baf->deleted) {
+            bdrv_do_remove_aio_context_notifier(baf);
+        } else {
+            baf->detach_aio_context(baf->opaque);
+        }
     }
+    /* Never mind iterating again to check for ->deleted.  bdrv_close() will
+     * remove remaining aio notifiers if we aren't called again.
+     */
+    bs->walking_aio_notifiers = false;
 
     if (bs->drv->bdrv_detach_aio_context) {
         bs->drv->bdrv_detach_aio_context(bs);
@@ -3627,7 +3643,7 @@ void bdrv_detach_aio_context(BlockDriverState *bs)
 void bdrv_attach_aio_context(BlockDriverState *bs,
                              AioContext *new_context)
 {
-    BdrvAioNotifier *ban;
+    BdrvAioNotifier *ban, *ban_tmp;
     BdrvChild *child;
 
     if (!bs->drv) {
@@ -3643,9 +3659,16 @@ void bdrv_attach_aio_context(BlockDriverState *bs,
         bs->drv->bdrv_attach_aio_context(bs, new_context);
     }
 
-    QLIST_FOREACH(ban, &bs->aio_notifiers, list) {
-        ban->attached_aio_context(new_context, ban->opaque);
+    assert(!bs->walking_aio_notifiers);
+    bs->walking_aio_notifiers = true;
+    QLIST_FOREACH_SAFE(ban, &bs->aio_notifiers, list, ban_tmp) {
+        if (ban->deleted) {
+            bdrv_do_remove_aio_context_notifier(ban);
+        } else {
+            ban->attached_aio_context(new_context, ban->opaque);
+        }
     }
+    bs->walking_aio_notifiers = false;
 }
 
 void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
@@ -3687,11 +3710,14 @@ void bdrv_remove_aio_context_notifier(BlockDriverState *bs,
     QLIST_FOREACH_SAFE(ban, &bs->aio_notifiers, list, ban_next) {
         if (ban->attached_aio_context == attached_aio_context &&
             ban->detach_aio_context   == detach_aio_context   &&
-            ban->opaque               == opaque)
+            ban->opaque               == opaque               &&
+            ban->deleted              == false)
         {
-            QLIST_REMOVE(ban, list);
-            g_free(ban);
-
+            if (bs->walking_aio_notifiers) {
+                ban->deleted = true;
+            } else {
+                bdrv_do_remove_aio_context_notifier(ban);
+            }
             return;
         }
     }
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 8a4963c..647e97b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -359,6 +359,7 @@ typedef struct BdrvAioNotifier {
     void (*detach_aio_context)(void *opaque);
 
     void *opaque;
+    bool deleted;
 
     QLIST_ENTRY(BdrvAioNotifier) list;
 } BdrvAioNotifier;
@@ -425,6 +426,7 @@ struct BlockDriverState {
      * BDS may register themselves in this list to be notified of changes
      * regarding this BDS's context */
     QLIST_HEAD(, BdrvAioNotifier) aio_notifiers;
+    bool walking_aio_notifiers; /* to make removal during iteration safe */
 
     char filename[PATH_MAX];
     char backing_file[PATH_MAX]; /* if non zero, the image is a diff of
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH v5 6/8] blockjob: add AioContext attached callback
  2016-06-16 16:56 [Qemu-devel] [PATCH v5 0/8] blockjob: AioContext change support for mirror and backup Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 5/8] block: use safe iteration over AioContext notifiers Stefan Hajnoczi
@ 2016-06-16 16:56 ` Stefan Hajnoczi
  2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 7/8] mirror: follow AioContext change gracefully Stefan Hajnoczi
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2016-06-16 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, jjherne, Fam Zheng, Paolo Bonzini, Jeff Cody, mreitz,
	Stefan Hajnoczi

Block jobs that use additional BDSes or event loop resources need a
callback to get their affairs in order when the AioContext is switched.

Simple block jobs don't need an attach callback, they automatically work
thanks to the generic attach/detach notifiers that this patch adds.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 blockjob.c               | 38 ++++++++++++++++++++++++++++++++++++++
 include/block/blockjob.h |  7 +++++++
 2 files changed, 45 insertions(+)

diff --git a/blockjob.c b/blockjob.c
index 18722b0..ef47b62 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -71,6 +71,38 @@ static AioContext *block_job_get_aio_context(BlockJob *job)
            blk_get_aio_context(job->blk);
 }
 
+static void block_job_attached_aio_context(AioContext *new_context,
+                                           void *opaque)
+{
+    BlockJob *job = opaque;
+
+    if (job->driver->attached_aio_context) {
+        job->driver->attached_aio_context(job, new_context);
+    }
+
+    block_job_resume(job);
+}
+
+static void block_job_detach_aio_context(void *opaque)
+{
+    BlockJob *job = opaque;
+
+    /* In case the job terminates during aio_poll()... */
+    block_job_ref(job);
+
+    block_job_pause(job);
+
+    if (!job->paused) {
+        /* If job is !job->busy this kicks it into the next pause point. */
+        block_job_enter(job);
+    }
+    while (!job->paused && !job->completed) {
+        aio_poll(block_job_get_aio_context(job), true);
+    }
+
+    block_job_unref(job);
+}
+
 void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
                        int64_t speed, BlockCompletionFunc *cb,
                        void *opaque, Error **errp)
@@ -103,6 +135,9 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
 
     QLIST_INSERT_HEAD(&block_jobs, job, job_list);
 
+    blk_add_aio_context_notifier(blk, block_job_attached_aio_context,
+                                 block_job_detach_aio_context, job);
+
     /* Only set speed when necessary to avoid NotSupported error */
     if (speed != 0) {
         Error *local_err = NULL;
@@ -128,6 +163,9 @@ void block_job_unref(BlockJob *job)
         BlockDriverState *bs = blk_bs(job->blk);
         bs->job = NULL;
         bdrv_op_unblock_all(bs, job->blocker);
+        blk_remove_aio_context_notifier(job->blk,
+                                        block_job_attached_aio_context,
+                                        block_job_detach_aio_context, job);
         blk_unref(job->blk);
         error_free(job->blocker);
         g_free(job->id);
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 7739f37..7dc720c 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -84,6 +84,13 @@ typedef struct BlockJobDriver {
      * should be restarted from this callback.
      */
     void coroutine_fn (*resume)(BlockJob *job);
+
+    /*
+     * If the callback is not NULL, it will be invoked before the job is
+     * resumed in a new AioContext.  This is the place to move any resources
+     * besides job->blk to the new AioContext.
+     */
+    void (*attached_aio_context)(BlockJob *job, AioContext *new_context);
 } BlockJobDriver;
 
 /**
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH v5 7/8] mirror: follow AioContext change gracefully
  2016-06-16 16:56 [Qemu-devel] [PATCH v5 0/8] blockjob: AioContext change support for mirror and backup Stefan Hajnoczi
                   ` (5 preceding siblings ...)
  2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 6/8] blockjob: add AioContext attached callback Stefan Hajnoczi
@ 2016-06-16 16:56 ` Stefan Hajnoczi
  2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 8/8] backup: " Stefan Hajnoczi
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2016-06-16 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, jjherne, Fam Zheng, Paolo Bonzini, Jeff Cody, mreitz,
	Stefan Hajnoczi

Add block_job_pause_point() calls to mark quiescent points and make sure
to complete in-flight requests when switching AioContexts.

This patch solves undefined behavior in the mirror block job when the
BDS AioContext is changed by dataplane.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/mirror.c | 43 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 35 insertions(+), 8 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 80fd3c7..86e0284 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -331,6 +331,8 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         mirror_wait_for_io(s);
     }
 
+    block_job_pause_point(&s->common);
+
     /* Find the number of consective dirty chunks following the first dirty
      * one, and wait for in flight requests in them. */
     while (nb_chunks * sectors_per_chunk < (s->buf_size >> BDRV_SECTOR_BITS)) {
@@ -581,6 +583,8 @@ static void coroutine_fn mirror_run(void *opaque)
             if (now - last_pause_ns > SLICE_TIME) {
                 last_pause_ns = now;
                 block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, 0);
+            } else {
+                block_job_pause_point(&s->common);
             }
 
             if (block_job_is_cancelled(&s->common)) {
@@ -612,6 +616,8 @@ static void coroutine_fn mirror_run(void *opaque)
             goto immediate_exit;
         }
 
+        block_job_pause_point(&s->common);
+
         cnt = bdrv_get_dirty_count(s->dirty_bitmap);
         /* s->common.offset contains the number of bytes already processed so
          * far, cnt is the number of dirty sectors remaining and
@@ -781,18 +787,39 @@ static void mirror_complete(BlockJob *job, Error **errp)
     block_job_enter(&s->common);
 }
 
+/* There is no matching mirror_resume() because mirror_run() will begin
+ * iterating again when the job is resumed.
+ */
+static void coroutine_fn mirror_pause(BlockJob *job)
+{
+    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+
+    mirror_drain(s);
+}
+
+static void mirror_attached_aio_context(BlockJob *job, AioContext *new_context)
+{
+    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+
+    blk_set_aio_context(s->target, new_context);
+}
+
 static const BlockJobDriver mirror_job_driver = {
-    .instance_size = sizeof(MirrorBlockJob),
-    .job_type      = BLOCK_JOB_TYPE_MIRROR,
-    .set_speed     = mirror_set_speed,
-    .complete      = mirror_complete,
+    .instance_size          = sizeof(MirrorBlockJob),
+    .job_type               = BLOCK_JOB_TYPE_MIRROR,
+    .set_speed              = mirror_set_speed,
+    .complete               = mirror_complete,
+    .pause                  = mirror_pause,
+    .attached_aio_context   = mirror_attached_aio_context,
 };
 
 static const BlockJobDriver commit_active_job_driver = {
-    .instance_size = sizeof(MirrorBlockJob),
-    .job_type      = BLOCK_JOB_TYPE_COMMIT,
-    .set_speed     = mirror_set_speed,
-    .complete      = mirror_complete,
+    .instance_size          = sizeof(MirrorBlockJob),
+    .job_type               = BLOCK_JOB_TYPE_COMMIT,
+    .set_speed              = mirror_set_speed,
+    .complete               = mirror_complete,
+    .pause                  = mirror_pause,
+    .attached_aio_context   = mirror_attached_aio_context,
 };
 
 static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH v5 8/8] backup: follow AioContext change gracefully
  2016-06-16 16:56 [Qemu-devel] [PATCH v5 0/8] blockjob: AioContext change support for mirror and backup Stefan Hajnoczi
                   ` (6 preceding siblings ...)
  2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 7/8] mirror: follow AioContext change gracefully Stefan Hajnoczi
@ 2016-06-16 16:56 ` Stefan Hajnoczi
  2016-06-16 17:06 ` [Qemu-devel] [PATCH v5 0/8] blockjob: AioContext change support for mirror and backup Paolo Bonzini
  2016-06-17 12:11 ` Stefan Hajnoczi
  9 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2016-06-16 16:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, jjherne, Fam Zheng, Paolo Bonzini, Jeff Cody, mreitz,
	Stefan Hajnoczi

Move s->target to the new AioContext when there is an AioContext change.

The backup_run() coroutine does not use asynchronous I/O so there is no
need to wait for in-flight requests in a BlockJobDriver->pause()
callback.

Guest writes are intercepted by the backup job.  Treat them as guest
activity and do it even while the job is paused.  This is necessary
since the only alternative would be to fail a job that experienced guest
writes during pause once the job is resumed.  In practice the guest
writes don't interfere with AioContext switching since bdrv_drain() is
used by bdrv_set_aio_context().

Loops already contain pause points because of block_job_sleep_ns() calls
in the yield_and_check() helper function.  It is necessary to convert a
raw qemu_coroutine_yield() to block_job_yield() so the
MIRROR_SYNC_MODE_NONE case can pause.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/backup.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index feeb9f8..581269b 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -246,12 +246,20 @@ static void backup_abort(BlockJob *job)
     }
 }
 
+static void backup_attached_aio_context(BlockJob *job, AioContext *aio_context)
+{
+    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
+
+    blk_set_aio_context(s->target, aio_context);
+}
+
 static const BlockJobDriver backup_job_driver = {
-    .instance_size  = sizeof(BackupBlockJob),
-    .job_type       = BLOCK_JOB_TYPE_BACKUP,
-    .set_speed      = backup_set_speed,
-    .commit         = backup_commit,
-    .abort          = backup_abort,
+    .instance_size          = sizeof(BackupBlockJob),
+    .job_type               = BLOCK_JOB_TYPE_BACKUP,
+    .set_speed              = backup_set_speed,
+    .commit                 = backup_commit,
+    .abort                  = backup_abort,
+    .attached_aio_context   = backup_attached_aio_context,
 };
 
 static BlockErrorAction backup_error_action(BackupBlockJob *job,
@@ -392,9 +400,7 @@ static void coroutine_fn backup_run(void *opaque)
         while (!block_job_is_cancelled(&job->common)) {
             /* Yield until the job is cancelled.  We just let our before_write
              * notify callback service CoW requests. */
-            job->common.busy = false;
-            qemu_coroutine_yield();
-            job->common.busy = true;
+            block_job_yield(&job->common);
         }
     } else if (job->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
         ret = backup_run_incremental(job);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/8] blockjob: AioContext change support for mirror and backup
  2016-06-16 16:56 [Qemu-devel] [PATCH v5 0/8] blockjob: AioContext change support for mirror and backup Stefan Hajnoczi
                   ` (7 preceding siblings ...)
  2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 8/8] backup: " Stefan Hajnoczi
@ 2016-06-16 17:06 ` Paolo Bonzini
  2016-06-17 12:11 ` Stefan Hajnoczi
  9 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2016-06-16 17:06 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Kevin Wolf, jjherne, Fam Zheng, Jeff Cody, mreitz

On 16/06/2016 18:56, Stefan Hajnoczi wrote:
> v5:
>  * Mark .pause()/.resume() as coroutine_fn [Paolo]
>  * Switch to coroutine-friendly mirror_drain() in mirror_pause() [Paolo]
>  * Rename block_job_is_paused() to block_job_should_pause() [Fam]
>  * New patch to add block_job_get_aio_context() helper and use it in the
>    aio_poll() loop [Fam]
>  * New patch to make bdrv_remove_aio_notifier() safe during attached/detach
>    callbacks

Good catch too.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

> v4:
>  * Add .pause()/.resume() callbacks to really quiesce during
>    block_job_pause_point() [Paolo]
>  * Add AioContext change support for backup block job
>  * Tested drive_mirror + migration and drive_backup + reboot
> 
> v3:
>  * Push infrastructure down into blockjob.c so other jobs can reuse it [Stefan]
>  * Tested with drive_mirror + migration [Stefan]
> 
> v2:
>  * Fam introduced the concept of a synchronous aio_poll() loop to quiesce the
>    block job during detach
> 
> When dataplane is enabled or disabled the drive switches to a new AioContext.
> The mirror and backup block jobs must also move to the new AioContext so that
> drive accesses are always made within its AioContext.
> 
> This series extends the block job pause functionality so that detaching from an
> AioContext pauses the job and attaching to the new AioContext resumes the job.
> 
> Pause points are added to the mirror job so that long I/O loops can yield for
> an AioContext switch.  Other block jobs need pause points too but this can be
> done as a follow-up series.
> 
> Stefan Hajnoczi (8):
>   blockjob: move iostatus reset out of block_job_enter()
>   blockjob: rename block_job_is_paused()
>   blockjob: add pause points
>   blockjob: add block_job_get_aio_context()
>   block: use safe iteration over AioContext notifiers
>   blockjob: add AioContext attached callback
>   mirror: follow AioContext change gracefully
>   backup: follow AioContext change gracefully
> 
>  block.c                   |  46 ++++++++++++++++-----
>  block/backup.c            |  22 ++++++----
>  block/mirror.c            |  43 +++++++++++++++----
>  blockdev.c                |   1 +
>  blockjob.c                | 103 ++++++++++++++++++++++++++++++++++++++++------
>  include/block/block_int.h |   2 +
>  include/block/blockjob.h  |  51 +++++++++++++++++------
>  7 files changed, 217 insertions(+), 51 deletions(-)
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/8] blockjob: AioContext change support for mirror and backup
  2016-06-16 16:56 [Qemu-devel] [PATCH v5 0/8] blockjob: AioContext change support for mirror and backup Stefan Hajnoczi
                   ` (8 preceding siblings ...)
  2016-06-16 17:06 ` [Qemu-devel] [PATCH v5 0/8] blockjob: AioContext change support for mirror and backup Paolo Bonzini
@ 2016-06-17 12:11 ` Stefan Hajnoczi
  9 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2016-06-17 12:11 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Kevin Wolf, Fam Zheng, Jeff Cody, mreitz, jjherne,
	Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 2511 bytes --]

On Thu, Jun 16, 2016 at 05:56:21PM +0100, Stefan Hajnoczi wrote:
> v5:
>  * Mark .pause()/.resume() as coroutine_fn [Paolo]
>  * Switch to coroutine-friendly mirror_drain() in mirror_pause() [Paolo]
>  * Rename block_job_is_paused() to block_job_should_pause() [Fam]
>  * New patch to add block_job_get_aio_context() helper and use it in the
>    aio_poll() loop [Fam]
>  * New patch to make bdrv_remove_aio_notifier() safe during attached/detach
>    callbacks
> 
> v4:
>  * Add .pause()/.resume() callbacks to really quiesce during
>    block_job_pause_point() [Paolo]
>  * Add AioContext change support for backup block job
>  * Tested drive_mirror + migration and drive_backup + reboot
> 
> v3:
>  * Push infrastructure down into blockjob.c so other jobs can reuse it [Stefan]
>  * Tested with drive_mirror + migration [Stefan]
> 
> v2:
>  * Fam introduced the concept of a synchronous aio_poll() loop to quiesce the
>    block job during detach
> 
> When dataplane is enabled or disabled the drive switches to a new AioContext.
> The mirror and backup block jobs must also move to the new AioContext so that
> drive accesses are always made within its AioContext.
> 
> This series extends the block job pause functionality so that detaching from an
> AioContext pauses the job and attaching to the new AioContext resumes the job.
> 
> Pause points are added to the mirror job so that long I/O loops can yield for
> an AioContext switch.  Other block jobs need pause points too but this can be
> done as a follow-up series.
> 
> Stefan Hajnoczi (8):
>   blockjob: move iostatus reset out of block_job_enter()
>   blockjob: rename block_job_is_paused()
>   blockjob: add pause points
>   blockjob: add block_job_get_aio_context()
>   block: use safe iteration over AioContext notifiers
>   blockjob: add AioContext attached callback
>   mirror: follow AioContext change gracefully
>   backup: follow AioContext change gracefully
> 
>  block.c                   |  46 ++++++++++++++++-----
>  block/backup.c            |  22 ++++++----
>  block/mirror.c            |  43 +++++++++++++++----
>  blockdev.c                |   1 +
>  blockjob.c                | 103 ++++++++++++++++++++++++++++++++++++++++------
>  include/block/block_int.h |   2 +
>  include/block/blockjob.h  |  51 +++++++++++++++++------
>  7 files changed, 217 insertions(+), 51 deletions(-)

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-06-17 12:11 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-16 16:56 [Qemu-devel] [PATCH v5 0/8] blockjob: AioContext change support for mirror and backup Stefan Hajnoczi
2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 1/8] blockjob: move iostatus reset out of block_job_enter() Stefan Hajnoczi
2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 2/8] blockjob: rename block_job_is_paused() Stefan Hajnoczi
2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 3/8] blockjob: add pause points Stefan Hajnoczi
2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 4/8] blockjob: add block_job_get_aio_context() Stefan Hajnoczi
2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 5/8] block: use safe iteration over AioContext notifiers Stefan Hajnoczi
2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 6/8] blockjob: add AioContext attached callback Stefan Hajnoczi
2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 7/8] mirror: follow AioContext change gracefully Stefan Hajnoczi
2016-06-16 16:56 ` [Qemu-devel] [PATCH v5 8/8] backup: " Stefan Hajnoczi
2016-06-16 17:06 ` [Qemu-devel] [PATCH v5 0/8] blockjob: AioContext change support for mirror and backup Paolo Bonzini
2016-06-17 12:11 ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.