* [Qemu-devel] [PATCH 0/4] blockjob: reinstate busy=false/busy=true in block_job_sleep_ns
@ 2017-11-29 10:25 Paolo Bonzini
  2017-11-29 10:25 ` [Qemu-devel] [PATCH 1/4] block: Expect graph changes in bdrv_parent_drained_begin/end Paolo Bonzini
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Paolo Bonzini @ 2017-11-29 10:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, qemu-block, kwolf, jcody

In order to avoid reentering a scheduled coroutine, this series uses
a QEMUTimer outside co_aio_sleep_ns.  The timer callback just calls
block_job_enter, and can be easily cancelled with timer_del.
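
In outline, the new pattern looks like this (a simplified sketch pieced
together from patch 4; the real code also takes a mutex around the
busy/timer updates):

    static void block_job_sleep_timer_cb(void *opaque)
    {
        BlockJob *job = opaque;

        block_job_enter(job);    /* wake the sleeping coroutine */
    }

    /* sleeping side: arm the timer, then yield */
    timer_mod(&job->sleep_timer,
              qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + ns);
    job->busy = false;
    qemu_coroutine_yield();

    /* waking side, in block_job_enter: cancel any pending wakeup first */
    timer_del(&job->sleep_timer);
    job->busy = true;
    aio_co_wake(job->co);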

This fixes the outstanding issues with block jobs, without reverting
the new checks that Jeff added.

I have not run the full qemu-iotests suite, but the main affected
tests (030, 041, 055, 097, 176, 200) work fine.

Paolo

Kevin Wolf (1):
  block: Expect graph changes in bdrv_parent_drained_begin/end

Paolo Bonzini (3):
  blockjob: remove clock argument from block_job_sleep_ns
  blockjob: introduce block_job_do_yield
  blockjob: reimplement block_job_sleep_ns to allow cancellation

 block/backup.c               |  4 +--
 block/commit.c               |  2 +-
 block/io.c                   |  8 ++---
 block/mirror.c               |  6 ++--
 block/stream.c               |  2 +-
 blockjob.c                   | 76 ++++++++++++++++++++++++++++++++++----------
 include/block/blockjob.h     |  5 ++-
 include/block/blockjob_int.h |  7 ++--
 tests/test-blockjob-txn.c    |  2 +-
 9 files changed, 79 insertions(+), 33 deletions(-)

-- 
2.14.3


* [Qemu-devel] [PATCH 1/4] block: Expect graph changes in bdrv_parent_drained_begin/end
  2017-11-29 10:25 [Qemu-devel] [PATCH 0/4] blockjob: reinstate busy=false/busy=true in block_job_sleep_ns Paolo Bonzini
@ 2017-11-29 10:25 ` Paolo Bonzini
  2017-11-29 12:27   ` [Qemu-devel] [Qemu-block] " Alberto Garcia
  2017-11-29 13:44   ` Stefan Hajnoczi
  2017-11-29 10:25 ` [Qemu-devel] [PATCH 2/4] blockjob: remove clock argument from block_job_sleep_ns Paolo Bonzini
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 23+ messages in thread
From: Paolo Bonzini @ 2017-11-29 10:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, qemu-block, kwolf, jcody

From: Kevin Wolf <kwolf@redhat.com>

The .drained_begin/end callbacks can (directly or indirectly via
aio_poll()) cause block nodes to be removed or the current BdrvChild to
point to a different child node.

Use QLIST_FOREACH_SAFE() to make sure we don't access invalid
BlockDriverStates or accidentally continue iterating the parents of the
new child node instead of the node we actually came from.
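
For reference, the _SAFE variant caches the next pointer before the loop
body runs, so the body may detach or reparent c without derailing the
walk.  Roughly (a sketch of the queue.h macro expansion):

    /* QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) is about: */
    for (c = bs->parents.lh_first;
         c && (next = c->next_parent.le_next, 1);
         c = next) {
        /* the body may now remove c from bs->parents safely */
    }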

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
---
 block/io.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/io.c b/block/io.c
index 4fdf93a014..6773926fc1 100644
--- a/block/io.c
+++ b/block/io.c
@@ -42,9 +42,9 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
 
 void bdrv_parent_drained_begin(BlockDriverState *bs)
 {
-    BdrvChild *c;
+    BdrvChild *c, *next;
 
-    QLIST_FOREACH(c, &bs->parents, next_parent) {
+    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
         if (c->role->drained_begin) {
             c->role->drained_begin(c);
         }
@@ -53,9 +53,9 @@ void bdrv_parent_drained_begin(BlockDriverState *bs)
 
 void bdrv_parent_drained_end(BlockDriverState *bs)
 {
-    BdrvChild *c;
+    BdrvChild *c, *next;
 
-    QLIST_FOREACH(c, &bs->parents, next_parent) {
+    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
         if (c->role->drained_end) {
             c->role->drained_end(c);
         }
-- 
2.14.3


* [Qemu-devel] [PATCH 2/4] blockjob: remove clock argument from block_job_sleep_ns
  2017-11-29 10:25 [Qemu-devel] [PATCH 0/4] blockjob: reinstate busy=false/busy=true in block_job_sleep_ns Paolo Bonzini
  2017-11-29 10:25 ` [Qemu-devel] [PATCH 1/4] block: Expect graph changes in bdrv_parent_drained_begin/end Paolo Bonzini
@ 2017-11-29 10:25 ` Paolo Bonzini
  2017-11-29 12:45   ` [Qemu-devel] [Qemu-block] " Alberto Garcia
                     ` (2 more replies)
  2017-11-29 10:25 ` [Qemu-devel] [PATCH 3/4] blockjob: introduce block_job_do_yield Paolo Bonzini
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 23+ messages in thread
From: Paolo Bonzini @ 2017-11-29 10:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, qemu-block, kwolf, jcody

All callers are using QEMU_CLOCK_REALTIME, and it will not be possible to
support more than one clock when block_job_sleep_ns switches to a single
timer stored in the BlockJob struct.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/backup.c               | 4 ++--
 block/commit.c               | 2 +-
 block/mirror.c               | 6 +++---
 block/stream.c               | 2 +-
 blockjob.c                   | 5 +++--
 include/block/blockjob_int.h | 7 +++----
 tests/test-blockjob-txn.c    | 2 +-
 7 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 06ddbfd03d..99e6bcc748 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -346,9 +346,9 @@ static bool coroutine_fn yield_and_check(BackupBlockJob *job)
         uint64_t delay_ns = ratelimit_calculate_delay(&job->limit,
                                                       job->bytes_read);
         job->bytes_read = 0;
-        block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, delay_ns);
+        block_job_sleep_ns(&job->common, delay_ns);
     } else {
-        block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, 0);
+        block_job_sleep_ns(&job->common, 0);
     }
 
     if (block_job_is_cancelled(&job->common)) {
diff --git a/block/commit.c b/block/commit.c
index 5036eec434..c5327551ce 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -174,7 +174,7 @@ static void coroutine_fn commit_run(void *opaque)
         /* Note that even when no rate limit is applied we need to yield
          * with no pending I/O here so that bdrv_drain_all() returns.
          */
-        block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
+        block_job_sleep_ns(&s->common, delay_ns);
         if (block_job_is_cancelled(&s->common)) {
             break;
         }
diff --git a/block/mirror.c b/block/mirror.c
index 307b6391a8..c9badc1203 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -598,7 +598,7 @@ static void mirror_throttle(MirrorBlockJob *s)
 
     if (now - s->last_pause_ns > SLICE_TIME) {
         s->last_pause_ns = now;
-        block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, 0);
+        block_job_sleep_ns(&s->common, 0);
     } else {
         block_job_pause_point(&s->common);
     }
@@ -870,13 +870,13 @@ static void coroutine_fn mirror_run(void *opaque)
         ret = 0;
         trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
         if (!s->synced) {
-            block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
+            block_job_sleep_ns(&s->common, delay_ns);
             if (block_job_is_cancelled(&s->common)) {
                 break;
             }
         } else if (!should_complete) {
             delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0);
-            block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
+            block_job_sleep_ns(&s->common, delay_ns);
         }
         s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
     }
diff --git a/block/stream.c b/block/stream.c
index e6f72346e5..499cdacdb0 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -141,7 +141,7 @@ static void coroutine_fn stream_run(void *opaque)
         /* Note that even when no rate limit is applied we need to yield
          * with no pending I/O here so that bdrv_drain_all() returns.
          */
-        block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
+        block_job_sleep_ns(&s->common, delay_ns);
         if (block_job_is_cancelled(&s->common)) {
             break;
         }
diff --git a/blockjob.c b/blockjob.c
index 2f0cc1528b..db9e4fc89a 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -788,7 +788,7 @@ bool block_job_is_cancelled(BlockJob *job)
     return job->cancelled;
 }
 
-void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
+void block_job_sleep_ns(BlockJob *job, int64_t ns)
 {
     assert(job->busy);
 
@@ -803,7 +803,8 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
      * it wakes and runs, otherwise we risk double-entry or entry after
      * completion. */
     if (!block_job_should_pause(job)) {
-        co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
+        co_aio_sleep_ns(blk_get_aio_context(job->blk),
+                        QEMU_CLOCK_REALTIME, ns);
     }
 
     block_job_pause_point(job);
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index 43f3be2965..f7ab183a39 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -139,14 +139,13 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
 /**
  * block_job_sleep_ns:
  * @job: The job that calls the function.
- * @clock: The clock to sleep on.
  * @ns: How many nanoseconds to stop for.
  *
  * Put the job to sleep (assuming that it wasn't canceled) for @ns
- * nanoseconds.  Canceling the job will not interrupt the wait, so the
- * cancel will not process until the coroutine wakes up.
+ * %QEMU_CLOCK_REALTIME nanoseconds.  Canceling the job will not interrupt
+ * the wait, so the cancel will not process until the coroutine wakes up.
  */
-void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);
+void block_job_sleep_ns(BlockJob *job, int64_t ns);
 
 /**
  * block_job_yield:
diff --git a/tests/test-blockjob-txn.c b/tests/test-blockjob-txn.c
index c77343fc04..3591c9617f 100644
--- a/tests/test-blockjob-txn.c
+++ b/tests/test-blockjob-txn.c
@@ -44,7 +44,7 @@ static void coroutine_fn test_block_job_run(void *opaque)
 
     while (s->iterations--) {
         if (s->use_timer) {
-            block_job_sleep_ns(job, QEMU_CLOCK_REALTIME, 0);
+            block_job_sleep_ns(job, 0);
         } else {
             block_job_yield(job);
         }
-- 
2.14.3


* [Qemu-devel] [PATCH 3/4] blockjob: introduce block_job_do_yield
  2017-11-29 10:25 [Qemu-devel] [PATCH 0/4] blockjob: reinstate busy=false/busy=true in block_job_sleep_ns Paolo Bonzini
  2017-11-29 10:25 ` [Qemu-devel] [PATCH 1/4] block: Expect graph changes in bdrv_parent_drained_begin/end Paolo Bonzini
  2017-11-29 10:25 ` [Qemu-devel] [PATCH 2/4] blockjob: remove clock argument from block_job_sleep_ns Paolo Bonzini
@ 2017-11-29 10:25 ` Paolo Bonzini
  2017-11-29 13:30   ` Jeff Cody
  2017-11-29 13:55   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2017-11-29 10:25 ` [Qemu-devel] [PATCH 4/4] blockjob: reimplement block_job_sleep_ns to allow cancellation Paolo Bonzini
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 23+ messages in thread
From: Paolo Bonzini @ 2017-11-29 10:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, qemu-block, kwolf, jcody

Hide the clearing of job->busy in a single function, and set it
in block_job_enter.  This lets block_job_do_yield verify that
qemu_coroutine_enter is not used while job->busy = false.
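
As an illustration, re-entering a yielded job directly:

    qemu_coroutine_enter(job->co);    /* job->busy is still false */

now trips the assert in block_job_do_yield as soon as the coroutine
resumes, because only the supported wakeup path:

    block_job_enter(job);             /* sets job->busy = true, then wakes */

marks the job busy again before re-entering it.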

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockjob.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index db9e4fc89a..4d22b7d2fb 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -729,6 +729,15 @@ static bool block_job_should_pause(BlockJob *job)
     return job->pause_count > 0;
 }
 
+static void block_job_do_yield(BlockJob *job)
+{
+    job->busy = false;
+    qemu_coroutine_yield();
+
+    /* Set by block_job_enter before re-entering the coroutine.  */
+    assert(job->busy);
+}
+
 void coroutine_fn block_job_pause_point(BlockJob *job)
 {
     assert(job && block_job_started(job));
@@ -746,9 +755,7 @@ void coroutine_fn block_job_pause_point(BlockJob *job)
 
     if (block_job_should_pause(job) && !block_job_is_cancelled(job)) {
         job->paused = true;
-        job->busy = false;
-        qemu_coroutine_yield(); /* wait for block_job_resume() */
-        job->busy = true;
+        block_job_do_yield(job);
         job->paused = false;
     }
 
@@ -778,9 +785,12 @@ void block_job_enter(BlockJob *job)
         return;
     }
 
-    if (!job->busy) {
-        bdrv_coroutine_enter(blk_bs(job->blk), job->co);
+    if (job->busy) {
+        return;
     }
+
+    job->busy = true;
+    aio_co_wake(job->co);
 }
 
 bool block_job_is_cancelled(BlockJob *job)
@@ -819,11 +829,9 @@ void block_job_yield(BlockJob *job)
         return;
     }
 
-    job->busy = false;
     if (!block_job_should_pause(job)) {
-        qemu_coroutine_yield();
+        block_job_do_yield(job);
     }
-    job->busy = true;
 
     block_job_pause_point(job);
 }
-- 
2.14.3


* [Qemu-devel] [PATCH 4/4] blockjob: reimplement block_job_sleep_ns to allow cancellation
  2017-11-29 10:25 [Qemu-devel] [PATCH 0/4] blockjob: reinstate busy=false/busy=true in block_job_sleep_ns Paolo Bonzini
                   ` (2 preceding siblings ...)
  2017-11-29 10:25 ` [Qemu-devel] [PATCH 3/4] blockjob: introduce block_job_do_yield Paolo Bonzini
@ 2017-11-29 10:25 ` Paolo Bonzini
  2017-11-29 12:54   ` Kevin Wolf
                     ` (3 more replies)
  2017-11-29 13:13 ` [Qemu-devel] [PATCH 0/4] blockjob: reinstate busy=false/busy=true in block_job_sleep_ns Jeff Cody
  2017-11-29 13:14 ` Fam Zheng
  5 siblings, 4 replies; 23+ messages in thread
From: Paolo Bonzini @ 2017-11-29 10:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, qemu-block, kwolf, jcody

This reverts the effects of commit 4afeffc857 ("blockjob: do not allow
coroutine double entry or entry-after-completion", 2017-11-21).

That commit fixed the symptom of a bug rather than the root cause. Canceling
the wait on a sleeping blockjob coroutine is generally fine; we just need to
make it work correctly across AioContexts.  To do so, use a QEMUTimer
that calls block_job_enter.  Use a mutex to ensure that block_job_enter
synchronizes correctly with block_job_sleep_ns.
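
As a concrete example of what the mutex prevents, consider two concurrent
block_job_enter calls, e.g. the sleep timer firing while another thread
cancels the job (hypothetical interleaving):

    enter A: if (job->busy) ...                        /* reads false */
    enter B: if (job->busy) ...                        /* reads false */
    enter A: job->busy = true; aio_co_wake(job->co);
    enter B: job->busy = true; aio_co_wake(job->co);   /* double entry */

With block_job_mutex held across the test and the update, whichever caller
runs second sees busy == true and returns without touching the coroutine.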

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockjob.c                   | 57 +++++++++++++++++++++++++++++++++++---------
 include/block/blockjob.h     |  5 +++-
 include/block/blockjob_int.h |  4 ++--
 3 files changed, 52 insertions(+), 14 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 4d22b7d2fb..3fdaabbc1f 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -37,6 +37,26 @@
 #include "qemu/timer.h"
 #include "qapi-event.h"
 
+/* Right now, this mutex is only needed to synchronize accesses to job->busy,
+ * especially concurrent calls to block_job_enter.
+ */
+static QemuMutex block_job_mutex;
+
+static void block_job_lock(void)
+{
+    qemu_mutex_lock(&block_job_mutex);
+}
+
+static void block_job_unlock(void)
+{
+    qemu_mutex_unlock(&block_job_mutex);
+}
+
+static void __attribute__((__constructor__)) block_job_init(void)
+{
+    qemu_mutex_init(&block_job_mutex);
+}
+
 static void block_job_event_cancelled(BlockJob *job);
 static void block_job_event_completed(BlockJob *job, const char *msg);
 
@@ -161,6 +181,7 @@ void block_job_unref(BlockJob *job)
         blk_unref(job->blk);
         error_free(job->blocker);
         g_free(job->id);
+        assert(!timer_pending(&job->sleep_timer));
         g_free(job);
     }
 }
@@ -287,6 +308,13 @@ static void coroutine_fn block_job_co_entry(void *opaque)
     job->driver->start(job);
 }
 
+static void block_job_sleep_timer_cb(void *opaque)
+{
+    BlockJob *job = opaque;
+
+    block_job_enter(job);
+}
+
 void block_job_start(BlockJob *job)
 {
     assert(job && !block_job_started(job) && job->paused &&
@@ -556,7 +584,7 @@ BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
     info->type      = g_strdup(BlockJobType_str(job->driver->job_type));
     info->device    = g_strdup(job->id);
     info->len       = job->len;
-    info->busy      = job->busy;
+    info->busy      = atomic_read(&job->busy);
     info->paused    = job->pause_count > 0;
     info->offset    = job->offset;
     info->speed     = job->speed;
@@ -664,6 +692,9 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     job->paused        = true;
     job->pause_count   = 1;
     job->refcnt        = 1;
+    aio_timer_init(qemu_get_aio_context(), &job->sleep_timer,
+                   QEMU_CLOCK_REALTIME, SCALE_NS,
+                   block_job_sleep_timer_cb, job);
 
     error_setg(&job->blocker, "block device is in use by block job: %s",
                BlockJobType_str(driver->job_type));
@@ -729,9 +760,14 @@ static bool block_job_should_pause(BlockJob *job)
     return job->pause_count > 0;
 }
 
-static void block_job_do_yield(BlockJob *job)
+static void block_job_do_yield(BlockJob *job, uint64_t ns)
 {
+    block_job_lock();
+    if (ns != -1) {
+        timer_mod(&job->sleep_timer, ns);
+    }
     job->busy = false;
+    block_job_unlock();
     qemu_coroutine_yield();
 
     /* Set by block_job_enter before re-entering the coroutine.  */
@@ -755,7 +791,7 @@ void coroutine_fn block_job_pause_point(BlockJob *job)
 
     if (block_job_should_pause(job) && !block_job_is_cancelled(job)) {
         job->paused = true;
-        block_job_do_yield(job);
+        block_job_do_yield(job, -1);
         job->paused = false;
     }
 
@@ -785,11 +821,16 @@ void block_job_enter(BlockJob *job)
         return;
     }
 
+    block_job_lock();
     if (job->busy) {
+        block_job_unlock();
         return;
     }
 
+    assert(!job->deferred_to_main_loop);
+    timer_del(&job->sleep_timer);
     job->busy = true;
+    block_job_unlock();
     aio_co_wake(job->co);
 }
 
@@ -807,14 +848,8 @@ void block_job_sleep_ns(BlockJob *job, int64_t ns)
         return;
     }
 
-    /* We need to leave job->busy set here, because when we have
-     * put a coroutine to 'sleep', we have scheduled it to run in
-     * the future.  We cannot enter that same coroutine again before
-     * it wakes and runs, otherwise we risk double-entry or entry after
-     * completion. */
     if (!block_job_should_pause(job)) {
-        co_aio_sleep_ns(blk_get_aio_context(job->blk),
-                        QEMU_CLOCK_REALTIME, ns);
+        block_job_do_yield(job, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + ns);
     }
 
     block_job_pause_point(job);
@@ -830,7 +865,7 @@ void block_job_yield(BlockJob *job)
     }
 
     if (!block_job_should_pause(job)) {
-        block_job_do_yield(job);
+        block_job_do_yield(job, -1);
     }
 
     block_job_pause_point(job);
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 67c0968fa5..956f0d6819 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -77,7 +77,7 @@ typedef struct BlockJob {
     /**
      * Set to false by the job while the coroutine has yielded and may be
      * re-entered by block_job_enter().  There may still be I/O or event loop
-     * activity pending.
+     * activity pending.  Accessed under block_job_mutex (in blockjob.c).
      */
     bool busy;
 
@@ -135,6 +135,9 @@ typedef struct BlockJob {
      */
     int ret;
 
+    /** Timer that is used by @block_job_sleep_ns.  */
+    QEMUTimer sleep_timer;
+
     /** Non-NULL if this job is part of a transaction */
     BlockJobTxn *txn;
     QLIST_ENTRY(BlockJob) txn_list;
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index f7ab183a39..c9b23b0cc9 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -142,8 +142,8 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
  * @ns: How many nanoseconds to stop for.
  *
  * Put the job to sleep (assuming that it wasn't canceled) for @ns
- * %QEMU_CLOCK_REALTIME nanoseconds.  Canceling the job will not interrupt
- * the wait, so the cancel will not process until the coroutine wakes up.
+ * %QEMU_CLOCK_REALTIME nanoseconds.  Canceling the job will immediately
+ * interrupt the wait.
  */
 void block_job_sleep_ns(BlockJob *job, int64_t ns);
 
-- 
2.14.3


* Re: [Qemu-devel] [Qemu-block] [PATCH 1/4] block: Expect graph changes in bdrv_parent_drained_begin/end
  2017-11-29 10:25 ` [Qemu-devel] [PATCH 1/4] block: Expect graph changes in bdrv_parent_drained_begin/end Paolo Bonzini
@ 2017-11-29 12:27   ` Alberto Garcia
  2017-11-29 13:44   ` Stefan Hajnoczi
  1 sibling, 0 replies; 23+ messages in thread
From: Alberto Garcia @ 2017-11-29 12:27 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, famz, qemu-block

On Wed 29 Nov 2017 11:25:10 AM CET, Paolo Bonzini wrote:
> From: Kevin Wolf <kwolf@redhat.com>
>
> The .drained_begin/end callbacks can (directly or indirectly via
> aio_poll()) cause block nodes to be removed or the current BdrvChild to
> point to a different child node.
>
> Use QLIST_FOREACH_SAFE() to make sure we don't access invalid
> BlockDriverStates or accidentally continue iterating the parents of the
> new child node instead of the node we actually came from.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Tested-by: Jeff Cody <jcody@redhat.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Reviewed-by: Jeff Cody <jcody@redhat.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto


* Re: [Qemu-devel] [Qemu-block] [PATCH 2/4] blockjob: remove clock argument from block_job_sleep_ns
  2017-11-29 10:25 ` [Qemu-devel] [PATCH 2/4] blockjob: remove clock argument from block_job_sleep_ns Paolo Bonzini
@ 2017-11-29 12:45   ` Alberto Garcia
  2017-11-29 13:14   ` [Qemu-devel] " Jeff Cody
  2017-11-29 13:55   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2 siblings, 0 replies; 23+ messages in thread
From: Alberto Garcia @ 2017-11-29 12:45 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, famz, qemu-block

On Wed 29 Nov 2017 11:25:11 AM CET, Paolo Bonzini wrote:
> All callers are using QEMU_CLOCK_REALTIME, and it will not be possible to
> support more than one clock when block_job_sleep_ns switches to a single
> timer stored in the BlockJob struct.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto


* Re: [Qemu-devel] [PATCH 4/4] blockjob: reimplement block_job_sleep_ns to allow cancellation
  2017-11-29 10:25 ` [Qemu-devel] [PATCH 4/4] blockjob: reimplement block_job_sleep_ns to allow cancellation Paolo Bonzini
@ 2017-11-29 12:54   ` Kevin Wolf
  2017-11-29 13:52     ` Jeff Cody
  2017-11-29 13:56   ` Jeff Cody
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Kevin Wolf @ 2017-11-29 12:54 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, famz, qemu-block, jcody

On 29.11.2017 at 11:25, Paolo Bonzini wrote:
> This reverts the effects of commit 4afeffc857 ("blockjob: do not allow
> coroutine double entry or entry-after-completion", 2017-11-21)
> 
> This fixed the symptom of a bug rather than the root cause. Canceling the
> wait on a sleeping blockjob coroutine is generally fine, we just need to
> make it work correctly across AioContexts.  To do so, use a QEMUTimer
> that calls block_job_enter.  Use a mutex to ensure that block_job_enter
> synchronizes correctly with block_job_sleep_ns.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockjob.c                   | 57 +++++++++++++++++++++++++++++++++++---------
>  include/block/blockjob.h     |  5 +++-
>  include/block/blockjob_int.h |  4 ++--
>  3 files changed, 52 insertions(+), 14 deletions(-)
> 
> diff --git a/blockjob.c b/blockjob.c
> index 4d22b7d2fb..3fdaabbc1f 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -37,6 +37,26 @@
>  #include "qemu/timer.h"
>  #include "qapi-event.h"
>  
> +/* Right now, this mutex is only needed to synchronize accesses to job->busy,
> + * especially concurrent calls to block_job_enter.
> + */

As discussed with Paolo on IRC, I'll replace the second line of this
comment, which he had in a different place originally and became prone
to misunderstanding now. The new version is:

/* Right now, this mutex is only needed to synchronize accesses to job->busy,
 * such as concurrent calls to block_job_do_yield and block_job_enter.
 */

Kevin


* Re: [Qemu-devel] [PATCH 0/4] blockjob: reinstate busy=false/busy=true in block_job_sleep_ns
  2017-11-29 10:25 [Qemu-devel] [PATCH 0/4] blockjob: reinstate busy=false/busy=true in block_job_sleep_ns Paolo Bonzini
                   ` (3 preceding siblings ...)
  2017-11-29 10:25 ` [Qemu-devel] [PATCH 4/4] blockjob: reimplement block_job_sleep_ns to allow cancellation Paolo Bonzini
@ 2017-11-29 13:13 ` Jeff Cody
  2017-11-29 13:14 ` Fam Zheng
  5 siblings, 0 replies; 23+ messages in thread
From: Jeff Cody @ 2017-11-29 13:13 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, famz, qemu-block, kwolf

On Wed, Nov 29, 2017 at 11:25:09AM +0100, Paolo Bonzini wrote:
> In order to avoid reentering a scheduled coroutine this series uses
> a QEMUTimer outside co_aio_sleep_ns.  The timer callback just calls
> block_job_enter, and can be easily cancelled with timer_del.
> 
> This fixes the outstanding issues with block jobs, without reverting
> the new checks that Jeff added.
> 
> I have not run the full qemu-iotests suite, but the main affected
> tests (030, 041, 055, 097, 176, 200) work fine.
> 
> Paolo
> 
> Kevin Wolf (1):
>   block: Expect graph changes in bdrv_parent_drained_begin/end
> 
> Paolo Bonzini (3):
>   blockjob: remove clock argument from block_job_sleep_ns
>   blockjob: introduce block_job_do_yield
>   blockjob: reimplement block_job_sleep_ns to allow cancellation
> 
>  block/backup.c               |  4 +--
>  block/commit.c               |  2 +-
>  block/io.c                   |  8 ++---
>  block/mirror.c               |  6 ++--
>  block/stream.c               |  2 +-
>  blockjob.c                   | 76 ++++++++++++++++++++++++++++++++++----------
>  include/block/blockjob.h     |  5 ++-
>  include/block/blockjob_int.h |  7 ++--
>  tests/test-blockjob-txn.c    |  2 +-
>  9 files changed, 79 insertions(+), 33 deletions(-)
> 
> -- 
> 2.14.3
>


For the series:

Tested-By: Jeff Cody <jcody@redhat.com>


* Re: [Qemu-devel] [PATCH 0/4] blockjob: reinstate busy=false/busy=true in block_job_sleep_ns
  2017-11-29 10:25 [Qemu-devel] [PATCH 0/4] blockjob: reinstate busy=false/busy=true in block_job_sleep_ns Paolo Bonzini
                   ` (4 preceding siblings ...)
  2017-11-29 13:13 ` [Qemu-devel] [PATCH 0/4] blockjob: reinstate busy=false/busy=true in block_job_sleep_ns Jeff Cody
@ 2017-11-29 13:14 ` Fam Zheng
  5 siblings, 0 replies; 23+ messages in thread
From: Fam Zheng @ 2017-11-29 13:14 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, qemu-block, kwolf, jcody

On Wed, 11/29 11:25, Paolo Bonzini wrote:
> In order to avoid reentering a scheduled coroutine this series uses
> a QEMUTimer outside co_aio_sleep_ns.  The timer callback just calls
> block_job_enter, and can be easily cancelled with timer_del.
> 
> This fixes the outstanding issues with block jobs, without reverting
> the new checks that Jeff added.

Reviewed-by: Fam Zheng <famz@redhat.com>


* Re: [Qemu-devel] [PATCH 2/4] blockjob: remove clock argument from block_job_sleep_ns
  2017-11-29 10:25 ` [Qemu-devel] [PATCH 2/4] blockjob: remove clock argument from block_job_sleep_ns Paolo Bonzini
  2017-11-29 12:45   ` [Qemu-devel] [Qemu-block] " Alberto Garcia
@ 2017-11-29 13:14   ` Jeff Cody
  2017-11-29 13:55   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2 siblings, 0 replies; 23+ messages in thread
From: Jeff Cody @ 2017-11-29 13:14 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, famz, qemu-block, kwolf


On Wed, Nov 29, 2017 at 11:25:11AM +0100, Paolo Bonzini wrote:
> All callers are using QEMU_CLOCK_REALTIME, and it will not be possible to
> support more than one clock when block_job_sleep_ns switches to a single
> timer stored in the BlockJob struct.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Jeff Cody <jcody@redhat.com>


> ---
>  block/backup.c               | 4 ++--
>  block/commit.c               | 2 +-
>  block/mirror.c               | 6 +++---
>  block/stream.c               | 2 +-
>  blockjob.c                   | 5 +++--
>  include/block/blockjob_int.h | 7 +++----
>  tests/test-blockjob-txn.c    | 2 +-
>  7 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index 06ddbfd03d..99e6bcc748 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -346,9 +346,9 @@ static bool coroutine_fn yield_and_check(BackupBlockJob *job)
>          uint64_t delay_ns = ratelimit_calculate_delay(&job->limit,
>                                                        job->bytes_read);
>          job->bytes_read = 0;
> -        block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, delay_ns);
> +        block_job_sleep_ns(&job->common, delay_ns);
>      } else {
> -        block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, 0);
> +        block_job_sleep_ns(&job->common, 0);
>      }
>  
>      if (block_job_is_cancelled(&job->common)) {
> diff --git a/block/commit.c b/block/commit.c
> index 5036eec434..c5327551ce 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -174,7 +174,7 @@ static void coroutine_fn commit_run(void *opaque)
>          /* Note that even when no rate limit is applied we need to yield
>           * with no pending I/O here so that bdrv_drain_all() returns.
>           */
> -        block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
> +        block_job_sleep_ns(&s->common, delay_ns);
>          if (block_job_is_cancelled(&s->common)) {
>              break;
>          }
> diff --git a/block/mirror.c b/block/mirror.c
> index 307b6391a8..c9badc1203 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -598,7 +598,7 @@ static void mirror_throttle(MirrorBlockJob *s)
>  
>      if (now - s->last_pause_ns > SLICE_TIME) {
>          s->last_pause_ns = now;
> -        block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, 0);
> +        block_job_sleep_ns(&s->common, 0);
>      } else {
>          block_job_pause_point(&s->common);
>      }
> @@ -870,13 +870,13 @@ static void coroutine_fn mirror_run(void *opaque)
>          ret = 0;
>          trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
>          if (!s->synced) {
> -            block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
> +            block_job_sleep_ns(&s->common, delay_ns);
>              if (block_job_is_cancelled(&s->common)) {
>                  break;
>              }
>          } else if (!should_complete) {
>              delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0);
> -            block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
> +            block_job_sleep_ns(&s->common, delay_ns);
>          }
>          s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>      }
> diff --git a/block/stream.c b/block/stream.c
> index e6f72346e5..499cdacdb0 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -141,7 +141,7 @@ static void coroutine_fn stream_run(void *opaque)
>          /* Note that even when no rate limit is applied we need to yield
>           * with no pending I/O here so that bdrv_drain_all() returns.
>           */
> -        block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
> +        block_job_sleep_ns(&s->common, delay_ns);
>          if (block_job_is_cancelled(&s->common)) {
>              break;
>          }
> diff --git a/blockjob.c b/blockjob.c
> index 2f0cc1528b..db9e4fc89a 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -788,7 +788,7 @@ bool block_job_is_cancelled(BlockJob *job)
>      return job->cancelled;
>  }
>  
> -void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
> +void block_job_sleep_ns(BlockJob *job, int64_t ns)
>  {
>      assert(job->busy);
>  
> @@ -803,7 +803,8 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
>       * it wakes and runs, otherwise we risk double-entry or entry after
>       * completion. */
>      if (!block_job_should_pause(job)) {
> -        co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
> +        co_aio_sleep_ns(blk_get_aio_context(job->blk),
> +                        QEMU_CLOCK_REALTIME, ns);
>      }
>  
>      block_job_pause_point(job);
> diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
> index 43f3be2965..f7ab183a39 100644
> --- a/include/block/blockjob_int.h
> +++ b/include/block/blockjob_int.h
> @@ -139,14 +139,13 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
>  /**
>   * block_job_sleep_ns:
>   * @job: The job that calls the function.
> - * @clock: The clock to sleep on.
>   * @ns: How many nanoseconds to stop for.
>   *
>   * Put the job to sleep (assuming that it wasn't canceled) for @ns
> - * nanoseconds.  Canceling the job will not interrupt the wait, so the
> - * cancel will not process until the coroutine wakes up.
> + * %QEMU_CLOCK_REALTIME nanoseconds.  Canceling the job will not interrupt
> + * the wait, so the cancel will not process until the coroutine wakes up.
>   */
> -void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);
> +void block_job_sleep_ns(BlockJob *job, int64_t ns);
>  
>  /**
>   * block_job_yield:
> diff --git a/tests/test-blockjob-txn.c b/tests/test-blockjob-txn.c
> index c77343fc04..3591c9617f 100644
> --- a/tests/test-blockjob-txn.c
> +++ b/tests/test-blockjob-txn.c
> @@ -44,7 +44,7 @@ static void coroutine_fn test_block_job_run(void *opaque)
>  
>      while (s->iterations--) {
>          if (s->use_timer) {
> -            block_job_sleep_ns(job, QEMU_CLOCK_REALTIME, 0);
> +            block_job_sleep_ns(job, 0);
>          } else {
>              block_job_yield(job);
>          }
> -- 
> 2.14.3
> 
> 


* Re: [Qemu-devel] [PATCH 3/4] blockjob: introduce block_job_do_yield
  2017-11-29 10:25 ` [Qemu-devel] [PATCH 3/4] blockjob: introduce block_job_do_yield Paolo Bonzini
@ 2017-11-29 13:30   ` Jeff Cody
  2017-11-29 13:55   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  1 sibling, 0 replies; 23+ messages in thread
From: Jeff Cody @ 2017-11-29 13:30 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, famz, qemu-block, kwolf

On Wed, Nov 29, 2017 at 11:25:12AM +0100, Paolo Bonzini wrote:
> Hide the clearing of job->busy in a single function, and set it
> in block_job_enter.  This lets block_job_do_yield verify that
> qemu_coroutine_enter is not used while job->busy = false.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Jeff Cody <jcody@redhat.com>

> ---
>  blockjob.c | 24 ++++++++++++++++--------
>  1 file changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/blockjob.c b/blockjob.c
> index db9e4fc89a..4d22b7d2fb 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -729,6 +729,15 @@ static bool block_job_should_pause(BlockJob *job)
>      return job->pause_count > 0;
>  }
>  
> +static void block_job_do_yield(BlockJob *job)
> +{
> +    job->busy = false;
> +    qemu_coroutine_yield();
> +
> +    /* Set by block_job_enter before re-entering the coroutine.  */
> +    assert(job->busy);
> +}
> +
>  void coroutine_fn block_job_pause_point(BlockJob *job)
>  {
>      assert(job && block_job_started(job));
> @@ -746,9 +755,7 @@ void coroutine_fn block_job_pause_point(BlockJob *job)
>  
>      if (block_job_should_pause(job) && !block_job_is_cancelled(job)) {
>          job->paused = true;
> -        job->busy = false;
> -        qemu_coroutine_yield(); /* wait for block_job_resume() */
> -        job->busy = true;
> +        block_job_do_yield(job);
>          job->paused = false;
>      }
>  
> @@ -778,9 +785,12 @@ void block_job_enter(BlockJob *job)
>          return;
>      }
>  
> -    if (!job->busy) {
> -        bdrv_coroutine_enter(blk_bs(job->blk), job->co);
> +    if (job->busy) {
> +        return;
>      }
> +
> +    job->busy = true;
> +    aio_co_wake(job->co);
>  }
>  
>  bool block_job_is_cancelled(BlockJob *job)
> @@ -819,11 +829,9 @@ void block_job_yield(BlockJob *job)
>          return;
>      }
>  
> -    job->busy = false;
>      if (!block_job_should_pause(job)) {
> -        qemu_coroutine_yield();
> +        block_job_do_yield(job);
>      }
> -    job->busy = true;
>  
>      block_job_pause_point(job);
>  }
> -- 
> 2.14.3
> 
> 


* Re: [Qemu-devel] [Qemu-block] [PATCH 1/4] block: Expect graph changes in bdrv_parent_drained_begin/end
  2017-11-29 10:25 ` [Qemu-devel] [PATCH 1/4] block: Expect graph changes in bdrv_parent_drained_begin/end Paolo Bonzini
  2017-11-29 12:27   ` [Qemu-devel] [Qemu-block] " Alberto Garcia
@ 2017-11-29 13:44   ` Stefan Hajnoczi
  2017-11-29 13:55     ` Kevin Wolf
  1 sibling, 1 reply; 23+ messages in thread
From: Stefan Hajnoczi @ 2017-11-29 13:44 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, kwolf, famz, qemu-block

On Wed, Nov 29, 2017 at 11:25:10AM +0100, Paolo Bonzini wrote:
> From: Kevin Wolf <kwolf@redhat.com>
> 
> The .drained_begin/end callbacks can (directly or indirectly via
> aio_poll()) cause block nodes to be removed or the current BdrvChild to
> point to a different child node.
> 
> Use QLIST_FOREACH_SAFE() to make sure we don't access invalid
> BlockDriverStates or accidentally continue iterating the parents of the
> new child node instead of the node we actually came from.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Tested-by: Jeff Cody <jcody@redhat.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Reviewed-by: Jeff Cody <jcody@redhat.com>
> ---
>  block/io.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index 4fdf93a014..6773926fc1 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -42,9 +42,9 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>  
>  void bdrv_parent_drained_begin(BlockDriverState *bs)
>  {
> -    BdrvChild *c;
> +    BdrvChild *c, *next;
>  
> -    QLIST_FOREACH(c, &bs->parents, next_parent) {
> +    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
>          if (c->role->drained_begin) {
>              c->role->drained_begin(c);
>          }
> @@ -53,9 +53,9 @@ void bdrv_parent_drained_begin(BlockDriverState *bs)
>  
>  void bdrv_parent_drained_end(BlockDriverState *bs)
>  {
> -    BdrvChild *c;
> +    BdrvChild *c, *next;
>  
> -    QLIST_FOREACH(c, &bs->parents, next_parent) {
> +    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
>          if (c->role->drained_end) {
>              c->role->drained_end(c);
>          }

Hmm...not sure if this patch is enough.

Thinking about this more, if this sub-graph changes then
.drained_begin() callbacks are not necessarily paired with
.drained_end() callbacks.

In other words, all of the following are possible:

1. Only .drained_begin() is called
2. .drained_begin() is called, then .drained_end()
3. Only .drained_end() is called

It makes me wonder if we need to either:

Take references and ensure .drained_end() gets called if and only if
.drained_begin() was also called (one possible shape is sketched below).

OR

Document that .drained_begin/end() calls may not be paired and audit
existing code to check that this is safe.
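
For the first option, one possible shape (hypothetical sketch;
quiesce_count is a made-up per-BdrvChild field, not something in the
tree today) would be to count begins per child and only forward the
edge transitions:

    static void bdrv_child_drained_begin(BdrvChild *c)
    {
        if (c->quiesce_count++ == 0 && c->role->drained_begin) {
            c->role->drained_begin(c);
        }
    }

    static void bdrv_child_drained_end(BdrvChild *c)
    {
        assert(c->quiesce_count > 0);    /* catches an unpaired end */
        if (--c->quiesce_count == 0 && c->role->drained_end) {
            c->role->drained_end(c);
        }
    }

That way an unpaired .drained_end() asserts instead of silently waking a
parent that never stopped sending requests.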

Stefan


* Re: [Qemu-devel] [PATCH 4/4] blockjob: reimplement block_job_sleep_ns to allow cancellation
  2017-11-29 12:54   ` Kevin Wolf
@ 2017-11-29 13:52     ` Jeff Cody
  0 siblings, 0 replies; 23+ messages in thread
From: Jeff Cody @ 2017-11-29 13:52 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Paolo Bonzini, qemu-devel, famz, qemu-block

On Wed, Nov 29, 2017 at 01:54:39PM +0100, Kevin Wolf wrote:
> On 29.11.2017 at 11:25, Paolo Bonzini wrote:
> > This reverts the effects of commit 4afeffc857 ("blockjob: do not allow
> > coroutine double entry or entry-after-completion", 2017-11-21)
> > 
> > This fixed the symptom of a bug rather than the root cause. Canceling the
> > wait on a sleeping blockjob coroutine is generally fine, we just need to
> > make it work correctly across AioContexts.  To do so, use a QEMUTimer
> > that calls block_job_enter.  Use a mutex to ensure that block_job_enter
> > synchronizes correctly with block_job_sleep_ns.
> > 
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >  blockjob.c                   | 57 +++++++++++++++++++++++++++++++++++---------
> >  include/block/blockjob.h     |  5 +++-
> >  include/block/blockjob_int.h |  4 ++--
> >  3 files changed, 52 insertions(+), 14 deletions(-)
> > 
> > diff --git a/blockjob.c b/blockjob.c
> > index 4d22b7d2fb..3fdaabbc1f 100644
> > --- a/blockjob.c
> > +++ b/blockjob.c
> > @@ -37,6 +37,26 @@
> >  #include "qemu/timer.h"
> >  #include "qapi-event.h"
> >  
> > +/* Right now, this mutex is only needed to synchronize accesses to job->busy,
> > + * especially concurrent calls to block_job_enter.
> > + */
> 
> As discussed with Paolo on IRC, I'll replace the second line of this
> comment, which he had in a different place originally and became prone
> to misunderstanding now. The new version is:
> 
> /* Right now, this mutex is only needed to synchronize accesses to job->busy,
>  * such as concurrent calls to block_job_do_yield and block_job_enter.
>  */
> 

By synchronizing job->busy, job->sleep_timer is also protected.  But
job->sleep_timer needs synchronization in its own right; it just happens
to get it because job->busy needs it too.  Maybe worth saying:


 /* Right now, this mutex is only needed to synchronize accesses to
  * job->busy and job->sleep_timer, such as concurrent calls to
  * block_job_do_yield and block_job_enter.
  */


* Re: [Qemu-devel] [Qemu-block] [PATCH 1/4] block: Expect graph changes in bdrv_parent_drained_begin/end
  2017-11-29 13:44   ` Stefan Hajnoczi
@ 2017-11-29 13:55     ` Kevin Wolf
  2017-12-01 10:13       ` Stefan Hajnoczi
  0 siblings, 1 reply; 23+ messages in thread
From: Kevin Wolf @ 2017-11-29 13:55 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Paolo Bonzini, qemu-devel, famz, qemu-block

On 29.11.2017 at 14:44, Stefan Hajnoczi wrote:
> On Wed, Nov 29, 2017 at 11:25:10AM +0100, Paolo Bonzini wrote:
> > From: Kevin Wolf <kwolf@redhat.com>
> > 
> > The .drained_begin/end callbacks can (directly or indirectly via
> > aio_poll()) cause block nodes to be removed or the current BdrvChild to
> > point to a different child node.
> > 
> > Use QLIST_FOREACH_SAFE() to make sure we don't access invalid
> > BlockDriverStates or accidentally continue iterating the parents of the
> > new child node instead of the node we actually came from.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > Tested-by: Jeff Cody <jcody@redhat.com>
> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > Reviewed-by: Jeff Cody <jcody@redhat.com>
> > ---
> >  block/io.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/block/io.c b/block/io.c
> > index 4fdf93a014..6773926fc1 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -42,9 +42,9 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
> >  
> >  void bdrv_parent_drained_begin(BlockDriverState *bs)
> >  {
> > -    BdrvChild *c;
> > +    BdrvChild *c, *next;
> >  
> > -    QLIST_FOREACH(c, &bs->parents, next_parent) {
> > +    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
> >          if (c->role->drained_begin) {
> >              c->role->drained_begin(c);
> >          }
> > @@ -53,9 +53,9 @@ void bdrv_parent_drained_begin(BlockDriverState *bs)
> >  
> >  void bdrv_parent_drained_end(BlockDriverState *bs)
> >  {
> > -    BdrvChild *c;
> > +    BdrvChild *c, *next;
> >  
> > -    QLIST_FOREACH(c, &bs->parents, next_parent) {
> > +    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
> >          if (c->role->drained_end) {
> >              c->role->drained_end(c);
> >          }
> 
> Hmm...not sure if this patch is enough.

Depends on what you have in mind, but generally, no, it's certainly not
enough to fix all bad things that can happen during drain...

> Thinking about this more, if this sub-graph changes then
> .drained_begin() callbacks are not necessarily paired with
> .drained_end() callbacks.
> 
> In other words, all of the following are possible:
> 
> 1. Only .drained_begin() is called
> 2. .drained_begin() is called, then .drained_end()
> 3. Only .drained_end() is called

bdrv_replace_child_noperm() makes sure to call .drained_end() and
.drained_begin() depending on whether the old and new child node are
currently drained. So I think this might actually work.

> It makes me wonder if we need to either:
> 
> Take references and ensure .drained_end() gets called if and only if
> .drained_begin() was also called.
> 
> OR
> 
> Document that .drained_begin/end() calls may not be paired and audit
> existing code to check that this is safe.

How would the latter even possibly work? If the .drained_end is missing,
the parent would never know that it can send new requests again. A
missing .drained_begin could be ignored in the .drained_end callback,
but it wouldn't be safe because the parent wouldn't actually stop to
send new requests.

Kevin


* Re: [Qemu-devel] [Qemu-block] [PATCH 2/4] blockjob: remove clock argument from block_job_sleep_ns
  2017-11-29 10:25 ` [Qemu-devel] [PATCH 2/4] blockjob: remove clock argument from block_job_sleep_ns Paolo Bonzini
  2017-11-29 12:45   ` [Qemu-devel] [Qemu-block] " Alberto Garcia
  2017-11-29 13:14   ` [Qemu-devel] " Jeff Cody
@ 2017-11-29 13:55   ` Stefan Hajnoczi
  2 siblings, 0 replies; 23+ messages in thread
From: Stefan Hajnoczi @ 2017-11-29 13:55 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, kwolf, famz, qemu-block

On Wed, Nov 29, 2017 at 11:25:11AM +0100, Paolo Bonzini wrote:
> All callers are using QEMU_CLOCK_REALTIME, and it will not be possible to
> support more than one clock when block_job_sleep_ns switches to a single
> timer stored in the BlockJob struct.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/backup.c               | 4 ++--
>  block/commit.c               | 2 +-
>  block/mirror.c               | 6 +++---
>  block/stream.c               | 2 +-
>  blockjob.c                   | 5 +++--
>  include/block/blockjob_int.h | 7 +++----
>  tests/test-blockjob-txn.c    | 2 +-
>  7 files changed, 14 insertions(+), 14 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [Qemu-devel] [Qemu-block] [PATCH 3/4] blockjob: introduce block_job_do_yield
  2017-11-29 10:25 ` [Qemu-devel] [PATCH 3/4] blockjob: introduce block_job_do_yield Paolo Bonzini
  2017-11-29 13:30   ` Jeff Cody
@ 2017-11-29 13:55   ` Stefan Hajnoczi
  1 sibling, 0 replies; 23+ messages in thread
From: Stefan Hajnoczi @ 2017-11-29 13:55 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, kwolf, famz, qemu-block

On Wed, Nov 29, 2017 at 11:25:12AM +0100, Paolo Bonzini wrote:
> Hide the clearing of job->busy in a single function, and set it
> in block_job_enter.  This lets block_job_do_yield verify that
> qemu_coroutine_enter is not used while job->busy = false.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockjob.c | 24 ++++++++++++++++--------
>  1 file changed, 16 insertions(+), 8 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [Qemu-devel] [PATCH 4/4] blockjob: reimplement block_job_sleep_ns to allow cancellation
  2017-11-29 10:25 ` [Qemu-devel] [PATCH 4/4] blockjob: reimplement block_job_sleep_ns to allow cancellation Paolo Bonzini
  2017-11-29 12:54   ` Kevin Wolf
@ 2017-11-29 13:56   ` Jeff Cody
  2017-11-29 14:21     ` Kevin Wolf
  2017-11-29 14:07   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2017-11-29 21:12   ` [Qemu-devel] " Eric Blake
  3 siblings, 1 reply; 23+ messages in thread
From: Jeff Cody @ 2017-11-29 13:56 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, famz, qemu-block, kwolf

On Wed, Nov 29, 2017 at 11:25:13AM +0100, Paolo Bonzini wrote:
> This reverts the effects of commit 4afeffc857 ("blockjob: do not allow
> coroutine double entry or entry-after-completion", 2017-11-21)
> 
> This fixed the symptom of a bug rather than the root cause. Canceling the
> wait on a sleeping blockjob coroutine is generally fine, we just need to
> make it work correctly across AioContexts.  To do so, use a QEMUTimer
> that calls block_job_enter.  Use a mutex to ensure that block_job_enter
> synchronizes correctly with block_job_sleep_ns.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockjob.c                   | 57 +++++++++++++++++++++++++++++++++++---------
>  include/block/blockjob.h     |  5 +++-
>  include/block/blockjob_int.h |  4 ++--
>  3 files changed, 52 insertions(+), 14 deletions(-)
> 
> diff --git a/blockjob.c b/blockjob.c
> index 4d22b7d2fb..3fdaabbc1f 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -37,6 +37,26 @@
>  #include "qemu/timer.h"
>  #include "qapi-event.h"
>  
> +/* Right now, this mutex is only needed to synchronize accesses to job->busy,
> + * especially concurrent calls to block_job_enter.
> + */

See my comment on Kevin's comment


> +static QemuMutex block_job_mutex;
> +
> +static void block_job_lock(void)
> +{
> +    qemu_mutex_lock(&block_job_mutex);
> +}
> +
> +static void block_job_unlock(void)
> +{
> +    qemu_mutex_unlock(&block_job_mutex);
> +}
> +
> +static void __attribute__((__constructor__)) block_job_init(void)
> +{
> +    qemu_mutex_init(&block_job_mutex);
> +}
> +
>  static void block_job_event_cancelled(BlockJob *job);
>  static void block_job_event_completed(BlockJob *job, const char *msg);
>  
> @@ -161,6 +181,7 @@ void block_job_unref(BlockJob *job)
>          blk_unref(job->blk);
>          error_free(job->blocker);
>          g_free(job->id);
> +        assert(!timer_pending(&job->sleep_timer));
>          g_free(job);
>      }
>  }
> @@ -287,6 +308,13 @@ static void coroutine_fn block_job_co_entry(void *opaque)
>      job->driver->start(job);
>  }
>  
> +static void block_job_sleep_timer_cb(void *opaque)
> +{
> +    BlockJob *job = opaque;
> +
> +    block_job_enter(job);
> +}
> +
>  void block_job_start(BlockJob *job)
>  {
>      assert(job && !block_job_started(job) && job->paused &&
> @@ -556,7 +584,7 @@ BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
>      info->type      = g_strdup(BlockJobType_str(job->driver->job_type));
>      info->device    = g_strdup(job->id);
>      info->len       = job->len;
> -    info->busy      = job->busy;
> +    info->busy      = atomic_read(&job->busy);
>      info->paused    = job->pause_count > 0;
>      info->offset    = job->offset;
>      info->speed     = job->speed;
> @@ -664,6 +692,9 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
>      job->paused        = true;
>      job->pause_count   = 1;
>      job->refcnt        = 1;
> +    aio_timer_init(qemu_get_aio_context(), &job->sleep_timer,
> +                   QEMU_CLOCK_REALTIME, SCALE_NS,
> +                   block_job_sleep_timer_cb, job);
>  
>      error_setg(&job->blocker, "block device is in use by block job: %s",
>                 BlockJobType_str(driver->job_type));
> @@ -729,9 +760,14 @@ static bool block_job_should_pause(BlockJob *job)
>      return job->pause_count > 0;
>  }
>  
> -static void block_job_do_yield(BlockJob *job)
> +static void block_job_do_yield(BlockJob *job, uint64_t ns)

Maybe worth a comment that using '-1' on a uint64_t is by design, and not a
bug, so this doesn't get 'fixed' in the future? 

>  {
> +    block_job_lock();
> +    if (ns != -1) {
> +        timer_mod(&job->sleep_timer, ns);
> +    }
>      job->busy = false;
> +    block_job_unlock();
>      qemu_coroutine_yield();
>  
>      /* Set by block_job_enter before re-entering the coroutine.  */
> @@ -755,7 +791,7 @@ void coroutine_fn block_job_pause_point(BlockJob *job)
>  
>      if (block_job_should_pause(job) && !block_job_is_cancelled(job)) {
>          job->paused = true;
> -        block_job_do_yield(job);
> +        block_job_do_yield(job, -1);
>          job->paused = false;
>      }
>  
> @@ -785,11 +821,16 @@ void block_job_enter(BlockJob *job)
>          return;
>      }
>  
> +    block_job_lock();
>      if (job->busy) {
> +        block_job_unlock();
>          return;
>      }
>  
> +    assert(!job->deferred_to_main_loop);
> +    timer_del(&job->sleep_timer);
>      job->busy = true;
> +    block_job_unlock();
>      aio_co_wake(job->co);
>  }
>  
> @@ -807,14 +848,8 @@ void block_job_sleep_ns(BlockJob *job, int64_t ns)
>          return;
>      }
>  
> -    /* We need to leave job->busy set here, because when we have
> -     * put a coroutine to 'sleep', we have scheduled it to run in
> -     * the future.  We cannot enter that same coroutine again before
> -     * it wakes and runs, otherwise we risk double-entry or entry after
> -     * completion. */
>      if (!block_job_should_pause(job)) {
> -        co_aio_sleep_ns(blk_get_aio_context(job->blk),
> -                        QEMU_CLOCK_REALTIME, ns);
> +        block_job_do_yield(job, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + ns);
>      }
>  
>      block_job_pause_point(job);
> @@ -830,7 +865,7 @@ void block_job_yield(BlockJob *job)
>      }
>  
>      if (!block_job_should_pause(job)) {
> -        block_job_do_yield(job);
> +        block_job_do_yield(job, -1);
>      }
>  
>      block_job_pause_point(job);
> diff --git a/include/block/blockjob.h b/include/block/blockjob.h
> index 67c0968fa5..956f0d6819 100644
> --- a/include/block/blockjob.h
> +++ b/include/block/blockjob.h
> @@ -77,7 +77,7 @@ typedef struct BlockJob {
>      /**
>       * Set to false by the job while the coroutine has yielded and may be
>       * re-entered by block_job_enter().  There may still be I/O or event loop
> -     * activity pending.
> +     * activity pending.  Accessed under block_job_mutex (in blockjob.c).
>       */
>      bool busy;
>  
> @@ -135,6 +135,9 @@ typedef struct BlockJob {
>       */
>      int ret;
>  
> +    /** Timer that is used by @block_job_sleep_ns.  */

Maybe also add similar statement as above for busy:

       /* Accessed under block_job_mutex (in blockjob.c) */

> +    QEMUTimer sleep_timer;
> +
>      /** Non-NULL if this job is part of a transaction */
>      BlockJobTxn *txn;
>      QLIST_ENTRY(BlockJob) txn_list;
> diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
> index f7ab183a39..c9b23b0cc9 100644
> --- a/include/block/blockjob_int.h
> +++ b/include/block/blockjob_int.h
> @@ -142,8 +142,8 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
>   * @ns: How many nanoseconds to stop for.
>   *
>   * Put the job to sleep (assuming that it wasn't canceled) for @ns
> - * %QEMU_CLOCK_REALTIME nanoseconds.  Canceling the job will not interrupt
> - * the wait, so the cancel will not process until the coroutine wakes up.
> + * %QEMU_CLOCK_REALTIME nanoseconds.  Canceling the job will immediately
> + * interrupt the wait.
>   */
>  void block_job_sleep_ns(BlockJob *job, int64_t ns);
>  
> -- 
> 2.14.3
>

My only concerns were regarding comments, and Kevin can fix them when
applying if he wants.  So:


Reviewed-by: Jeff Cody <jcody@redhat.com>
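
For readers following the hunks above: they use job->sleep_timer but do
not show how the timer is created or what fires it.  A minimal sketch of
that plumbing, assuming the callback and initialization follow the rest
of the series (the applied code may differ in detail):

    /* Fires once the sleep deadline passes and simply re-enters the
     * job.  block_job_enter() takes block_job_mutex, so the callback
     * serializes correctly with block_job_do_yield(). */
    static void block_job_sleep_timer_cb(void *opaque)
    {
        BlockJob *job = opaque;

        block_job_enter(job);
    }

    /* In block_job_create(): a nanosecond-resolution REALTIME timer,
     * matching the deadline computed in block_job_sleep_ns(). */
    timer_init_ns(&job->sleep_timer, QEMU_CLOCK_REALTIME,
                  block_job_sleep_timer_cb, job);

Since block_job_enter() deletes the timer under the lock, a cancellation
racing with timer expiry enters the coroutine exactly once.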

* Re: [Qemu-devel] [Qemu-block] [PATCH 4/4] blockjob: reimplement block_job_sleep_ns to allow cancellation
  2017-11-29 10:25 ` [Qemu-devel] [PATCH 4/4] blockjob: reimplement block_job_sleep_ns to allow cancellation Paolo Bonzini
  2017-11-29 12:54   ` Kevin Wolf
  2017-11-29 13:56   ` Jeff Cody
@ 2017-11-29 14:07   ` Stefan Hajnoczi
  2017-11-29 21:12   ` [Qemu-devel] " Eric Blake
  3 siblings, 0 replies; 23+ messages in thread
From: Stefan Hajnoczi @ 2017-11-29 14:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, kwolf, famz, qemu-block

On Wed, Nov 29, 2017 at 11:25:13AM +0100, Paolo Bonzini wrote:
> This reverts the effects of commit 4afeffc857 ("blockjob: do not allow
> coroutine double entry or entry-after-completion", 2017-11-21)
> 
> This fixed the symptom of a bug rather than the root cause. Canceling the
> wait on a sleeping blockjob coroutine is generally fine, we just need to
> make it work correctly across AioContexts.  To do so, use a QEMUTimer
> that calls block_job_enter.  Use a mutex to ensure that block_job_enter
> synchronizes correctly with block_job_sleep_ns.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockjob.c                   | 57 +++++++++++++++++++++++++++++++++++---------
>  include/block/blockjob.h     |  5 +++-
>  include/block/blockjob_int.h |  4 ++--
>  3 files changed, 52 insertions(+), 14 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

* Re: [Qemu-devel] [PATCH 4/4] blockjob: reimplement block_job_sleep_ns to allow cancellation
  2017-11-29 13:56   ` Jeff Cody
@ 2017-11-29 14:21     ` Kevin Wolf
  2017-11-29 14:25       ` Jeff Cody
  0 siblings, 1 reply; 23+ messages in thread
From: Kevin Wolf @ 2017-11-29 14:21 UTC (permalink / raw)
  To: Jeff Cody; +Cc: Paolo Bonzini, qemu-devel, famz, qemu-block

On 29.11.2017 at 14:56, Jeff Cody wrote:
> On Wed, Nov 29, 2017 at 11:25:13AM +0100, Paolo Bonzini wrote:
> > This reverts the effects of commit 4afeffc857 ("blockjob: do not allow
> > coroutine double entry or entry-after-completion", 2017-11-21)
> > 
> > This fixed the symptom of a bug rather than the root cause. Canceling the
> > wait on a sleeping blockjob coroutine is generally fine, we just need to
> > make it work correctly across AioContexts.  To do so, use a QEMUTimer
> > that calls block_job_enter.  Use a mutex to ensure that block_job_enter
> > synchronizes correctly with block_job_sleep_ns.
> > 
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

> My only concerns were regarding comments, and Kevin can fix them when
> applying if he wants.

I'm squashing in the following changes.

Kevin


diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 956f0d6819..00403d9482 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -135,7 +135,10 @@ typedef struct BlockJob {
      */
     int ret;
 
-    /** Timer that is used by @block_job_sleep_ns.  */
+    /**
+     * Timer that is used by @block_job_sleep_ns. Accessed under
+     * block_job_mutex (in blockjob.c).
+     */
     QEMUTimer sleep_timer;
 
     /** Non-NULL if this job is part of a transaction */
diff --git a/blockjob.c b/blockjob.c
index 83f5c891be..dd8c685d8a 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -37,9 +37,9 @@
 #include "qemu/timer.h"
 #include "qapi-event.h"
 
-/* Right now, this mutex is only needed to synchronize accesses to job->busy,
- * such as concurrent calls to block_job_do_yield and block_job_enter.
- */
+/* Right now, this mutex is only needed to synchronize accesses to job->busy
+ * and and job->sleep_timer, such as concurrent calls to block_job_do_yield and
+ * block_job_enter. */
 static QemuMutex block_job_mutex;
 
 static void block_job_lock(void)
@@ -760,6 +760,12 @@ static bool block_job_should_pause(BlockJob *job)
     return job->pause_count > 0;
 }
 
+/* Yield, and schedule a timer to reenter the coroutine after @ns nanoseconds.
+ * Reentering the job coroutine with block_job_enter() before the timer has
+ * expired is allowed and cancels the timer.
+ *
+ * If @ns is (uint64_t) -1, no timer is scheduled and block_job_enter() must be
+ * called explicitly. */
 static void block_job_do_yield(BlockJob *job, uint64_t ns)
 {
     block_job_lock();
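
Putting this squash together with the hunks quoted earlier in the thread,
the helper ends up looking roughly like this (assembled from the posted
fragments, so it may not match the applied patch byte for byte):

    static void block_job_do_yield(BlockJob *job, uint64_t ns)
    {
        block_job_lock();
        if (ns != -1) {
            timer_mod(&job->sleep_timer, ns);
        }
        job->busy = false;
        block_job_unlock();
        qemu_coroutine_yield();

        /* Set by block_job_enter before re-entering the coroutine.  */
        assert(job->busy);
    }

The two calling modes from the series: block_job_yield() and the pause
path pass -1 and wait for an explicit block_job_enter(), while
block_job_sleep_ns() passes qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + ns
as an absolute deadline for the timer.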

* Re: [Qemu-devel] [PATCH 4/4] blockjob: reimplement block_job_sleep_ns to allow cancellation
  2017-11-29 14:21     ` Kevin Wolf
@ 2017-11-29 14:25       ` Jeff Cody
  0 siblings, 0 replies; 23+ messages in thread
From: Jeff Cody @ 2017-11-29 14:25 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Paolo Bonzini, qemu-devel, famz, qemu-block

On Wed, Nov 29, 2017 at 03:21:37PM +0100, Kevin Wolf wrote:
> On 29.11.2017 at 14:56, Jeff Cody wrote:
> > On Wed, Nov 29, 2017 at 11:25:13AM +0100, Paolo Bonzini wrote:
> > > This reverts the effects of commit 4afeffc857 ("blockjob: do not allow
> > > coroutine double entry or entry-after-completion", 2017-11-21)
> > > 
> > > This fixed the symptom of a bug rather than the root cause. Canceling the
> > > wait on a sleeping blockjob coroutine is generally fine, we just need to
> > > make it work correctly across AioContexts.  To do so, use a QEMUTimer
> > > that calls block_job_enter.  Use a mutex to ensure that block_job_enter
> > > synchronizes correctly with block_job_sleep_ns.
> > > 
> > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> > My only concerns were regarding comments, and Kevin can fix them when
> > applying if he wants.
> 
> I'm squashing in the following changes.
> 
> Kevin
> 
> 
> diff --git a/include/block/blockjob.h b/include/block/blockjob.h
> index 956f0d6819..00403d9482 100644
> --- a/include/block/blockjob.h
> +++ b/include/block/blockjob.h
> @@ -135,7 +135,10 @@ typedef struct BlockJob {
>       */
>      int ret;
>  
> -    /** Timer that is used by @block_job_sleep_ns.  */
> +    /**
> +     * Timer that is used by @block_job_sleep_ns. Accessed under
> +     * block_job_mutex (in blockjob.c).
> +     */
>      QEMUTimer sleep_timer;
>  
>      /** Non-NULL if this job is part of a transaction */
> diff --git a/blockjob.c b/blockjob.c
> index 83f5c891be..dd8c685d8a 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -37,9 +37,9 @@
>  #include "qemu/timer.h"
>  #include "qapi-event.h"
>  
> -/* Right now, this mutex is only needed to synchronize accesses to job->busy,
> - * such as concurrent calls to block_job_do_yield and block_job_enter.
> - */
> +/* Right now, this mutex is only needed to synchronize accesses to job->busy
> + * and and job->sleep_timer, such as concurrent calls to block_job_do_yield and

s/and and/and

> + * block_job_enter. */
>  static QemuMutex block_job_mutex;
>  
>  static void block_job_lock(void)
> @@ -760,6 +760,12 @@ static bool block_job_should_pause(BlockJob *job)
>      return job->pause_count > 0;
>  }
>  
> +/* Yield, and schedule a timer to reenter the coroutine after @ns nanoseconds.
> + * Reentering the job coroutine with block_job_enter() before the timer has
> + * expired is allowed and cancels the timer.
> + *
> + * If @ns is (uint64_t) -1, no timer is scheduled and block_job_enter() must be
> + * called explicitly. */
>  static void block_job_do_yield(BlockJob *job, uint64_t ns)
>  {
>      block_job_lock();

* Re: [Qemu-devel] [PATCH 4/4] blockjob: reimplement block_job_sleep_ns to allow cancellation
  2017-11-29 10:25 ` [Qemu-devel] [PATCH 4/4] blockjob: reimplement block_job_sleep_ns to allow cancellation Paolo Bonzini
                     ` (2 preceding siblings ...)
  2017-11-29 14:07   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2017-11-29 21:12   ` Eric Blake
  3 siblings, 0 replies; 23+ messages in thread
From: Eric Blake @ 2017-11-29 21:12 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, jcody, famz, qemu-block

On 11/29/2017 04:25 AM, Paolo Bonzini wrote:
> This reverts the effects of commit 4afeffc857 ("blockjob: do not allow
> coroutine double entry or entry-after-completion", 2017-11-21)
> 
> This fixed the symptom of a bug rather than the root cause. Canceling the
> wait on a sleeping blockjob coroutine is generally fine, we just need to
> make it work correctly across AioContexts.  To do so, use a QEMUTimer
> that calls block_job_enter.  Use a mutex to ensure that block_job_enter
> synchronizes correctly with block_job_sleep_ns.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

I'm a bit late in reviewing; just pointing out that:

> @@ -807,14 +848,8 @@ void block_job_sleep_ns(BlockJob *job, int64_t ns)
>           return;
>       }
>   
> -    /* We need to leave job->busy set here, because when we have
> -     * put a coroutine to 'sleep', we have scheduled it to run in
> -     * the future.  We cannot enter that same coroutine again before
> -     * it wakes and runs, otherwise we risk double-entry or entry after
> -     * completion. */
>       if (!block_job_should_pause(job)) {
> -        co_aio_sleep_ns(blk_get_aio_context(job->blk),
> -                        QEMU_CLOCK_REALTIME, ns);
> +        block_job_do_yield(job, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + ns);

This conflicts slightly with 
https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01681.html that 
simplifies co_aio_sleep_ns.  Are we left with any callers of 
co_aio_sleep_ns() that can use anything other than QEMU_CLOCK_REALTIME, 
or can that also be simplified?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [Qemu-block] [PATCH 1/4] block: Expect graph changes in bdrv_parent_drained_begin/end
  2017-11-29 13:55     ` Kevin Wolf
@ 2017-12-01 10:13       ` Stefan Hajnoczi
  0 siblings, 0 replies; 23+ messages in thread
From: Stefan Hajnoczi @ 2017-12-01 10:13 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Paolo Bonzini, qemu-devel, famz, qemu-block

On Wed, Nov 29, 2017 at 02:55:28PM +0100, Kevin Wolf wrote:
> On 29.11.2017 at 14:44, Stefan Hajnoczi wrote:
> > On Wed, Nov 29, 2017 at 11:25:10AM +0100, Paolo Bonzini wrote:
> > > From: Kevin Wolf <kwolf@redhat.com>
> > > 
> > > The .drained_begin/end callbacks can (directly or indirectly via
> > > aio_poll()) cause block nodes to be removed or the current BdrvChild to
> > > point to a different child node.
> > > 
> > > Use QLIST_FOREACH_SAFE() to make sure we don't access invalid
> > > BlockDriverStates or accidentally continue iterating the parents of the
> > > new child node instead of the node we actually came from.
> > > 
> > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > Tested-by: Jeff Cody <jcody@redhat.com>
> > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > Reviewed-by: Jeff Cody <jcody@redhat.com>
> > > ---
> > >  block/io.c | 8 ++++----
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/block/io.c b/block/io.c
> > > index 4fdf93a014..6773926fc1 100644
> > > --- a/block/io.c
> > > +++ b/block/io.c
> > > @@ -42,9 +42,9 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
> > >  
> > >  void bdrv_parent_drained_begin(BlockDriverState *bs)
> > >  {
> > > -    BdrvChild *c;
> > > +    BdrvChild *c, *next;
> > >  
> > > -    QLIST_FOREACH(c, &bs->parents, next_parent) {
> > > +    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
> > >          if (c->role->drained_begin) {
> > >              c->role->drained_begin(c);
> > >          }
> > > @@ -53,9 +53,9 @@ void bdrv_parent_drained_begin(BlockDriverState *bs)
> > >  
> > >  void bdrv_parent_drained_end(BlockDriverState *bs)
> > >  {
> > > -    BdrvChild *c;
> > > +    BdrvChild *c, *next;
> > >  
> > > -    QLIST_FOREACH(c, &bs->parents, next_parent) {
> > > +    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
> > >          if (c->role->drained_end) {
> > >              c->role->drained_end(c);
> > >          }
> > 
> > Hmm...not sure if this patch is enough.
> 
> Depends on what you have in mind, but generally, no, it's certainly not
> enough to fix all bad things that can happen during drain...
> 
> > Thinking about this more, if this sub-graph changes then
> > .drained_begin() callbacks are not necessarily paired with
> > .drained_end() callbacks.
> > 
> > In other words, all of the following are possible:
> > 
> > 1. Only .drained_begin() is called
> > 2. .drained_begin() is called, then .drained_end()
> > 3. Only .drained_end() is called
> 
> bdrv_replace_child_noperm() makes sure to call .drained_end() and
> .drained_begin() depending on whether the old and new child node are
> currently drained. So I think this might actually work.
> 
> > It makes me wonder if we need to either:
> > 
> > Take references and ensure .drained_end() gets called if and only if
> > .drained_begin() was also called.
> > 
> > OR
> > 
> > Document that .drained_begin/end() calls may not be paired and audit
> > existing code to check that this is safe.
> 
> How would the latter even possibly work? If the .drained_end is missing,
> the parent would never know that it can send new requests again. A
> missing .drained_begin could be ignored in the .drained_end callback,
> but it wouldn't be safe because the parent wouldn't actually stop
> sending new requests.

Yesterday I looked at a similar issue in Fam's new RFC that moves
.drained_begin()/.drained_end() to AioContext.  I came to the conclusion
that while it's easy to implement unmatched begin/end calls (as we do
today), it's not possible to get useful semantics out of them :).  So we
have to ensure that the begin/end calls match.
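
To make Kevin's point above concrete, the pairing logic in
bdrv_replace_child_noperm() looks roughly like this (a simplified sketch
of the function as of this period; the attach/detach callbacks, the
AioContext assertion and other details are omitted, and quiesce_counter
is the node's drain nesting count):

    static void bdrv_replace_child_noperm(BdrvChild *child,
                                          BlockDriverState *new_bs)
    {
        BlockDriverState *old_bs = child->bs;

        if (old_bs) {
            /* Detaching from a drained node: deliver the matching
             * .drained_end() so the parent is not left quiesced. */
            if (old_bs->quiesce_counter && child->role->drained_end) {
                child->role->drained_end(child);
            }
            QLIST_REMOVE(child, next_parent);
        }

        child->bs = new_bs;

        if (new_bs) {
            QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
            /* Attaching to a drained node: deliver the .drained_begin()
             * the parent would have seen had it been attached earlier. */
            if (new_bs->quiesce_counter && child->role->drained_begin) {
                child->role->drained_begin(child);
            }
        }
    }

Under this scheme a parent sees begin and end in matched pairs across a
graph change, as long as the change goes through this helper.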
