* [PATCH v3 0/5] block-copy: protect block-copy internal structures
@ 2021-06-08  7:33 Emanuele Giuseppe Esposito
  2021-06-08  7:33 ` [PATCH v3 1/5] block-copy: streamline choice of copy_range vs. read/write Emanuele Giuseppe Esposito
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-06-08  7:33 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Emanuele Giuseppe Esposito,
	Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

This series of patches aims to reduce the usage of the
AioContext lock in block-copy by introducing finer-grained
locks, as a step towards making the block layer thread safe.

This series depends on my previous series that brings thread safety
to the smaller APIs used by block-copy: ratelimit, progressmeter
and co-shared-resource.

What's missing for block-copy to be fully thread-safe is fixing
the CoSleep API to allow cross-thread sleep and wakeup.
Paolo is working on it.

Patch 1 introduces the .method field in BlockCopyState, to be used
instead of .use_copy_range, .copy_size and .zeroes.
Patches 2-3 provide comments and refactoring in preparation for
the lock added in patch 4 on BlockCopyTask, BlockCopyCallState and
BlockCopyState. Patch 5 uses load_acquire/store_release to make sure
that the BlockCopyCallState OUT fields are updated before .finished is
set to true.
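
In code, the direction of the series is roughly the following (a
schematic sketch only, not actual code from the patches):

    /* before: callers rely on the coarse AioContext lock */
    aio_context_acquire(ctx);
    s->in_flight_bytes += bytes;
    aio_context_release(ctx);

    /* after: a CoMutex embedded in BlockCopyState itself */
    qemu_co_mutex_lock(&s->lock);
    s->in_flight_bytes += bytes;
    qemu_co_mutex_unlock(&s->lock);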

Based-on: <20210518094058.25952-1-eesposit@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
v3:
* Use a single lock instead of two [Paolo, Vladimir]
* Extend the lock to also protect the BdrvDirtyBitmap API [Vladimir]
* Drop patch 6 (set .method as atomic) since with the current refactoring
  it can simply be included in the nearby critical sections protected by
  the lock

Emanuele Giuseppe Esposito (4):
  block-copy: improve comments of BlockCopyTask and BlockCopyState types
    and functions
  block-copy: move progress_set_remaining in block_copy_task_end
  block-copy: add a CoMutex
  block-copy: atomic .cancelled and .finished fields in
    BlockCopyCallState

Paolo Bonzini (1):
  block-copy: streamline choice of copy_range vs. read/write

 block/block-copy.c | 319 +++++++++++++++++++++++++--------------------
 1 file changed, 181 insertions(+), 138 deletions(-)

-- 
2.30.2




* [PATCH v3 1/5] block-copy: streamline choice of copy_range vs. read/write
  2021-06-08  7:33 [PATCH v3 0/5] block-copy: protect block-copy internal structures Emanuele Giuseppe Esposito
@ 2021-06-08  7:33 ` Emanuele Giuseppe Esposito
  2021-06-09  8:51   ` Vladimir Sementsov-Ogievskiy
  2021-06-08  7:33 ` [PATCH v3 2/5] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions Emanuele Giuseppe Esposito
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-06-08  7:33 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Emanuele Giuseppe Esposito,
	Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

From: Paolo Bonzini <pbonzini@redhat.com>

Put the logic to determine the copy size in a separate function, so
that there is a simple state machine for the possible methods of
copying data from one BlockDriverState to the other.

Use .method instead of .copy_range as the in-out argument, and
also include .zeroes as an additional copy method.

While at it, store the common computation of block_copy_max_transfer
into a new field of BlockCopyState, and make sure that we always
obey max_transfer; that's more efficient even for the
COPY_RANGE_FULL case.
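
The resulting state machine can be sketched as follows; the helper
below is illustrative only, the real transitions happen inside
block_copy_do_copy():

    static BlockCopyMethod next_method(BlockCopyMethod cur, bool range_ok)
    {
        switch (cur) {
        case COPY_RANGE_SMALL:
        case COPY_RANGE_FULL:
            /* Success widens the chunk size, failure falls back for good. */
            return range_ok ? COPY_RANGE_FULL : COPY_READ_WRITE;
        default:
            /* The read/write and write-zeroes methods never change. */
            return cur;
        }
    }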

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/block-copy.c | 171 ++++++++++++++++++++++-----------------------
 1 file changed, 85 insertions(+), 86 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 943e30b7e6..d58051288b 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -28,6 +28,14 @@
 #define BLOCK_COPY_MAX_WORKERS 64
 #define BLOCK_COPY_SLICE_TIME 100000000ULL /* ns */
 
+typedef enum {
+    COPY_READ_WRITE_CLUSTER,
+    COPY_READ_WRITE,
+    COPY_WRITE_ZEROES,
+    COPY_RANGE_SMALL,
+    COPY_RANGE_FULL
+} BlockCopyMethod;
+
 static coroutine_fn int block_copy_task_entry(AioTask *task);
 
 typedef struct BlockCopyCallState {
@@ -64,8 +72,7 @@ typedef struct BlockCopyTask {
     BlockCopyCallState *call_state;
     int64_t offset;
     int64_t bytes;
-    bool zeroes;
-    bool copy_range;
+    BlockCopyMethod method;
     QLIST_ENTRY(BlockCopyTask) list;
     CoQueue wait_queue; /* coroutines blocked on this task */
 } BlockCopyTask;
@@ -86,8 +93,8 @@ typedef struct BlockCopyState {
     BdrvDirtyBitmap *copy_bitmap;
     int64_t in_flight_bytes;
     int64_t cluster_size;
-    bool use_copy_range;
-    int64_t copy_size;
+    BlockCopyMethod method;
+    int64_t max_transfer;
     uint64_t len;
     QLIST_HEAD(, BlockCopyTask) tasks; /* All tasks from all block-copy calls */
     QLIST_HEAD(, BlockCopyCallState) calls;
@@ -149,6 +156,24 @@ static bool coroutine_fn block_copy_wait_one(BlockCopyState *s, int64_t offset,
     return true;
 }
 
+static int64_t block_copy_chunk_size(BlockCopyState *s)
+{
+    switch (s->method) {
+    case COPY_READ_WRITE_CLUSTER:
+        return s->cluster_size;
+    case COPY_READ_WRITE:
+    case COPY_RANGE_SMALL:
+        return MIN(MAX(s->cluster_size, BLOCK_COPY_MAX_BUFFER),
+                   s->max_transfer);
+    case COPY_RANGE_FULL:
+        return MIN(MAX(s->cluster_size, BLOCK_COPY_MAX_COPY_RANGE),
+                   s->max_transfer);
+    default:
+        /* Cannot have COPY_WRITE_ZEROES here.  */
+        abort();
+    }
+}
+
 /*
  * Search for the first dirty area in offset/bytes range and create task at
  * the beginning of it.
@@ -158,8 +183,9 @@ static BlockCopyTask *block_copy_task_create(BlockCopyState *s,
                                              int64_t offset, int64_t bytes)
 {
     BlockCopyTask *task;
-    int64_t max_chunk = MIN_NON_ZERO(s->copy_size, call_state->max_chunk);
+    int64_t max_chunk = block_copy_chunk_size(s);
 
+    max_chunk = MIN_NON_ZERO(max_chunk, call_state->max_chunk);
     if (!bdrv_dirty_bitmap_next_dirty_area(s->copy_bitmap,
                                            offset, offset + bytes,
                                            max_chunk, &offset, &bytes))
@@ -183,7 +209,7 @@ static BlockCopyTask *block_copy_task_create(BlockCopyState *s,
         .call_state = call_state,
         .offset = offset,
         .bytes = bytes,
-        .copy_range = s->use_copy_range,
+        .method = s->method,
     };
     qemu_co_queue_init(&task->wait_queue);
     QLIST_INSERT_HEAD(&s->tasks, task, list);
@@ -267,28 +293,27 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
         .len = bdrv_dirty_bitmap_size(copy_bitmap),
         .write_flags = write_flags,
         .mem = shres_create(BLOCK_COPY_MAX_MEM),
+        .max_transfer = QEMU_ALIGN_DOWN(block_copy_max_transfer(source, target)
+                                        , cluster_size),
     };
 
-    if (block_copy_max_transfer(source, target) < cluster_size) {
+    if (s->max_transfer < cluster_size) {
         /*
          * copy_range does not respect max_transfer. We don't want to bother
          * with requests smaller than block-copy cluster size, so fallback to
          * buffered copying (read and write respect max_transfer on their
          * behalf).
          */
-        s->use_copy_range = false;
-        s->copy_size = cluster_size;
+        s->method = COPY_READ_WRITE_CLUSTER;
     } else if (write_flags & BDRV_REQ_WRITE_COMPRESSED) {
         /* Compression supports only cluster-size writes and no copy-range. */
-        s->use_copy_range = false;
-        s->copy_size = cluster_size;
+        s->method = COPY_READ_WRITE_CLUSTER;
     } else {
         /*
          * We enable copy-range, but keep small copy_size, until first
          * successful copy_range (look at block_copy_do_copy).
          */
-        s->use_copy_range = use_copy_range;
-        s->copy_size = MAX(s->cluster_size, BLOCK_COPY_MAX_BUFFER);
+        s->method = use_copy_range ? COPY_RANGE_SMALL : COPY_READ_WRITE;
     }
 
     ratelimit_init(&s->rate_limit);
@@ -343,17 +368,14 @@ static coroutine_fn int block_copy_task_run(AioTaskPool *pool,
  *
  * No sync here: nor bitmap neighter intersecting requests handling, only copy.
  *
- * @copy_range is an in-out argument: if *copy_range is false, copy_range is not
- * done. If *copy_range is true, copy_range is attempted. If the copy_range
- * attempt fails, the function falls back to the usual read+write and
- * *copy_range is set to false. *copy_range and zeroes must not be true
- * simultaneously.
- *
+ * @method is an in-out argument, so that copy_range can be either extended to
+ * a full-size buffer or disabled if the copy_range attempt fails.  The output
+ * value of @method should be used for subsequent tasks.
  * Returns 0 on success.
  */
 static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
                                            int64_t offset, int64_t bytes,
-                                           bool zeroes, bool *copy_range,
+                                           BlockCopyMethod *method,
                                            bool *error_is_read)
 {
     int ret;
@@ -367,9 +389,9 @@ static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
     assert(offset + bytes <= s->len ||
            offset + bytes == QEMU_ALIGN_UP(s->len, s->cluster_size));
     assert(nbytes < INT_MAX);
-    assert(!(*copy_range && zeroes));
 
-    if (zeroes) {
+    switch (*method) {
+    case COPY_WRITE_ZEROES:
         ret = bdrv_co_pwrite_zeroes(s->target, offset, nbytes, s->write_flags &
                                     ~BDRV_REQ_WRITE_COMPRESSED);
         if (ret < 0) {
@@ -377,89 +399,67 @@ static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
             *error_is_read = false;
         }
         return ret;
-    }
 
-    if (*copy_range) {
+    case COPY_RANGE_SMALL:
+    case COPY_RANGE_FULL:
         ret = bdrv_co_copy_range(s->source, offset, s->target, offset, nbytes,
                                  0, s->write_flags);
-        if (ret < 0) {
-            trace_block_copy_copy_range_fail(s, offset, ret);
-            *copy_range = false;
-            /* Fallback to read+write with allocated buffer */
-        } else {
+        if (ret >= 0) {
+            /* Successful copy-range, increase copy_size.  */
+            *method = COPY_RANGE_FULL;
             return 0;
         }
-    }
-
-    /*
-     * In case of failed copy_range request above, we may proceed with buffered
-     * request larger than BLOCK_COPY_MAX_BUFFER. Still, further requests will
-     * be properly limited, so don't care too much. Moreover the most likely
-     * case (copy_range is unsupported for the configuration, so the very first
-     * copy_range request fails) is handled by setting large copy_size only
-     * after first successful copy_range.
-     */
 
-    bounce_buffer = qemu_blockalign(s->source->bs, nbytes);
+        trace_block_copy_copy_range_fail(s, offset, ret);
+        *method = COPY_READ_WRITE;
+        /* Fall through to read+write with allocated buffer */
 
-    ret = bdrv_co_pread(s->source, offset, nbytes, bounce_buffer, 0);
-    if (ret < 0) {
-        trace_block_copy_read_fail(s, offset, ret);
-        *error_is_read = true;
-        goto out;
-    }
+    default:
+        /*
+         * In case of failed copy_range request above, we may proceed with
+         * buffered request larger than BLOCK_COPY_MAX_BUFFER.
+         * Still, further requests will be properly limited, so don't care too
+         * much. Moreover the most likely case (copy_range is unsupported for
+         * the configuration, so the very first copy_range request fails)
+         * is handled by setting large copy_size only after first successful
+         * copy_range.
+         */
 
-    ret = bdrv_co_pwrite(s->target, offset, nbytes, bounce_buffer,
-                         s->write_flags);
-    if (ret < 0) {
-        trace_block_copy_write_fail(s, offset, ret);
-        *error_is_read = false;
-        goto out;
-    }
+        bounce_buffer = qemu_blockalign(s->source->bs, nbytes);
 
-out:
-    qemu_vfree(bounce_buffer);
+        ret = bdrv_co_pread(s->source, offset, nbytes, bounce_buffer, 0);
+        if (ret < 0) {
+            trace_block_copy_read_fail(s, offset, ret);
+            *error_is_read = true;
+            goto out;
+        }
 
-    return ret;
-}
+        ret = bdrv_co_pwrite(s->target, offset, nbytes, bounce_buffer,
+                            s->write_flags);
+        if (ret < 0) {
+            trace_block_copy_write_fail(s, offset, ret);
+            *error_is_read = false;
+            goto out;
+        }
 
-static void block_copy_handle_copy_range_result(BlockCopyState *s,
-                                                bool is_success)
-{
-    if (!s->use_copy_range) {
-        /* already disabled */
-        return;
+out:
+        qemu_vfree(bounce_buffer);
     }
 
-    if (is_success) {
-        /*
-         * Successful copy-range. Now increase copy_size.  copy_range
-         * does not respect max_transfer (it's a TODO), so we factor
-         * that in here.
-         */
-        s->copy_size =
-                MIN(MAX(s->cluster_size, BLOCK_COPY_MAX_COPY_RANGE),
-                    QEMU_ALIGN_DOWN(block_copy_max_transfer(s->source,
-                                                            s->target),
-                                    s->cluster_size));
-    } else {
-        /* Copy-range failed, disable it. */
-        s->use_copy_range = false;
-        s->copy_size = MAX(s->cluster_size, BLOCK_COPY_MAX_BUFFER);
-    }
+    return ret;
 }
 
 static coroutine_fn int block_copy_task_entry(AioTask *task)
 {
     BlockCopyTask *t = container_of(task, BlockCopyTask, task);
+    BlockCopyState *s = t->s;
     bool error_is_read = false;
-    bool copy_range = t->copy_range;
+    BlockCopyMethod method = t->method;
     int ret;
 
-    ret = block_copy_do_copy(t->s, t->offset, t->bytes, t->zeroes,
-                             &copy_range, &error_is_read);
-    if (t->copy_range) {
-        block_copy_handle_copy_range_result(t->s, copy_range);
+    ret = block_copy_do_copy(s, t->offset, t->bytes, &method, &error_is_read);
+    if (s->method == t->method) {
+        s->method = method;
     }
     if (ret < 0) {
         if (!t->call_state->ret) {
@@ -642,8 +642,7 @@ block_copy_dirty_clusters(BlockCopyCallState *call_state)
             continue;
         }
         if (ret & BDRV_BLOCK_ZERO) {
-            task->zeroes = true;
-            task->copy_range = false;
+            task->method = COPY_WRITE_ZEROES;
         }
 
         if (!call_state->ignore_ratelimit) {
-- 
2.30.2




* [PATCH v3 2/5] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions
  2021-06-08  7:33 [PATCH v3 0/5] block-copy: protect block-copy internal structures Emanuele Giuseppe Esposito
  2021-06-08  7:33 ` [PATCH v3 1/5] block-copy: streamline choice of copy_range vs. read/write Emanuele Giuseppe Esposito
@ 2021-06-08  7:33 ` Emanuele Giuseppe Esposito
  2021-06-09  9:12   ` Vladimir Sementsov-Ogievskiy
  2021-06-08  7:33 ` [PATCH v3 3/5] block-copy: move progress_set_remaining in block_copy_task_end Emanuele Giuseppe Esposito
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-06-08  7:33 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Emanuele Giuseppe Esposito,
	Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

As done in BlockCopyCallState, categorize the BlockCopyTask
and BlockCopyState fields into IN, State and OUT.
This makes it clear which fields have to be protected with a lock.

.sleep_state is handled in the series "coroutine: new sleep/wake API"
and is thus left as a TODO here.
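
Schematically, the convention being applied is (a generic example,
not code from this patch):

    typedef struct Example {
        /* IN: set once at creation, never changed; lock-free reads OK */
        int64_t cluster_size;

        /* State: read and written concurrently; needs a lock (patch 4) */
        int64_t in_flight_bytes;

        /* OUT: written by the worker side, read once it has finished */
        int ret;
    } Example;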

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block/block-copy.c | 47 ++++++++++++++++++++++++++++++----------------
 1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index d58051288b..b3533a3003 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -56,25 +56,33 @@ typedef struct BlockCopyCallState {
     QLIST_ENTRY(BlockCopyCallState) list;
 
     /* State */
-    int ret;
     bool finished;
-    QemuCoSleep sleep;
-    bool cancelled;
+    QemuCoSleep sleep; /* TODO: protect API with a lock */
 
     /* OUT parameters */
+    bool cancelled;
     bool error_is_read;
+    int ret;
 } BlockCopyCallState;
 
 typedef struct BlockCopyTask {
     AioTask task;
 
+    /*
+     * IN parameters. Initialized in block_copy_task_create()
+     * and never changed.
+     */
     BlockCopyState *s;
     BlockCopyCallState *call_state;
     int64_t offset;
-    int64_t bytes;
-    BlockCopyMethod method;
-    QLIST_ENTRY(BlockCopyTask) list;
+    int64_t bytes; /* only re-set in task_shrink, before running the task */
+    BlockCopyMethod method; /* initialized in block_copy_dirty_clusters() */
+
+    /* State */
     CoQueue wait_queue; /* coroutines blocked on this task */
+
+    /* To reference all call states from BlockCopyState */
+    QLIST_ENTRY(BlockCopyTask) list;
 } BlockCopyTask;
 
 static int64_t task_end(BlockCopyTask *task)
@@ -90,15 +98,25 @@ typedef struct BlockCopyState {
      */
     BdrvChild *source;
     BdrvChild *target;
-    BdrvDirtyBitmap *copy_bitmap;
+
+    /* State */
     int64_t in_flight_bytes;
-    int64_t cluster_size;
     BlockCopyMethod method;
-    int64_t max_transfer;
-    uint64_t len;
     QLIST_HEAD(, BlockCopyTask) tasks; /* All tasks from all block-copy calls */
     QLIST_HEAD(, BlockCopyCallState) calls;
+    /* State fields that use a thread-safe API */
+    BdrvDirtyBitmap *copy_bitmap;
+    ProgressMeter *progress;
+    SharedResource *mem;
+    RateLimit rate_limit;
 
+    /*
+     * IN parameters. Initialized in block_copy_state_new()
+     * and never changed.
+     */
+    int64_t cluster_size;
+    int64_t max_transfer;
+    uint64_t len;
     BdrvRequestFlags write_flags;
 
     /*
@@ -114,14 +132,11 @@ typedef struct BlockCopyState {
      * In this case, block_copy() will query the source’s allocation status,
      * skip unallocated regions, clear them in the copy_bitmap, and invoke
      * block_copy_reset_unallocated() every time it does.
+     *
+     * This field is set in backup_run() before coroutines are run,
+     * therefore is an IN.
      */
     bool skip_unallocated;
-
-    ProgressMeter *progress;
-
-    SharedResource *mem;
-
-    RateLimit rate_limit;
 } BlockCopyState;
 
 static BlockCopyTask *find_conflicting_task(BlockCopyState *s,
-- 
2.30.2




* [PATCH v3 3/5] block-copy: move progress_set_remaining in block_copy_task_end
  2021-06-08  7:33 [PATCH v3 0/5] block-copy: protect block-copy internal structures Emanuele Giuseppe Esposito
  2021-06-08  7:33 ` [PATCH v3 1/5] block-copy: streamline choice of copy_range vs. read/write Emanuele Giuseppe Esposito
  2021-06-08  7:33 ` [PATCH v3 2/5] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions Emanuele Giuseppe Esposito
@ 2021-06-08  7:33 ` Emanuele Giuseppe Esposito
  2021-06-08  7:33 ` [PATCH v3 4/5] block-copy: add a CoMutex Emanuele Giuseppe Esposito
  2021-06-08  7:33 ` [PATCH v3 5/5] block-copy: atomic .cancelled and .finished fields in BlockCopyCallState Emanuele Giuseppe Esposito
  4 siblings, 0 replies; 19+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-06-08  7:33 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Emanuele Giuseppe Esposito,
	Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

Moving this call into task_end ensures that the progress is
updated even if there is an error.

It also helps with the next patch, allowing task_end to have only
one critical section.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block/block-copy.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index b3533a3003..e2adb5b2ea 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -263,6 +263,9 @@ static void coroutine_fn block_copy_task_end(BlockCopyTask *task, int ret)
         bdrv_set_dirty_bitmap(task->s->copy_bitmap, task->offset, task->bytes);
     }
     QLIST_REMOVE(task, list);
+    progress_set_remaining(task->s->progress,
+                           bdrv_get_dirty_count(task->s->copy_bitmap) +
+                           task->s->in_flight_bytes);
     qemu_co_queue_restart_all(&task->wait_queue);
 }
 
@@ -647,9 +650,6 @@ block_copy_dirty_clusters(BlockCopyCallState *call_state)
         }
         if (s->skip_unallocated && !(ret & BDRV_BLOCK_ALLOCATED)) {
             block_copy_task_end(task, 0);
-            progress_set_remaining(s->progress,
-                                   bdrv_get_dirty_count(s->copy_bitmap) +
-                                   s->in_flight_bytes);
             trace_block_copy_skip_range(s, task->offset, task->bytes);
             offset = task_end(task);
             bytes = end - offset;
-- 
2.30.2




* [PATCH v3 4/5] block-copy: add a CoMutex
  2021-06-08  7:33 [PATCH v3 0/5] block-copy: protect block-copy internal structures Emanuele Giuseppe Esposito
                   ` (2 preceding siblings ...)
  2021-06-08  7:33 ` [PATCH v3 3/5] block-copy: move progress_set_remaining in block_copy_task_end Emanuele Giuseppe Esposito
@ 2021-06-08  7:33 ` Emanuele Giuseppe Esposito
  2021-06-09 12:25   ` Vladimir Sementsov-Ogievskiy
  2021-06-08  7:33 ` [PATCH v3 5/5] block-copy: atomic .cancelled and .finished fields in BlockCopyCallState Emanuele Giuseppe Esposito
  4 siblings, 1 reply; 19+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-06-08  7:33 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Emanuele Giuseppe Esposito,
	Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

Add a CoMutex to protect concurrent access to the block-copy
data structures.

This mutex also protects .copy_bitmap, because its thread-safe
API does not prevent assigning two tasks to the same
bitmap region.

.finished, .cancelled and reads to .ret and .error_is_read will be
protected in the following patch, because they are also used outside
coroutines.

Also set block_copy_task_create as coroutine_fn because:
1) it is static and only invoked by coroutine functions
2) this patch introduces and uses a CoMutex lock there (see the usage
   sketch below)
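
A minimal usage sketch of the guard macros from
include/qemu/lockable.h, with a hypothetical update function
(QEMU_LOCK_GUARD holds the mutex until the end of the enclosing
scope, WITH_QEMU_LOCK_GUARD only for its block):

    /* Illustrative only; the real critical sections are in the diff. */
    static coroutine_fn void example_update(BlockCopyState *s, int64_t n)
    {
        WITH_QEMU_LOCK_GUARD(&s->lock) {
            s->in_flight_bytes += n;   /* protected State field */
        }                              /* lock dropped at end of block */
    }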

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block/block-copy.c | 82 ++++++++++++++++++++++++++++++----------------
 1 file changed, 54 insertions(+), 28 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index e2adb5b2ea..56f62913e4 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -61,6 +61,7 @@ typedef struct BlockCopyCallState {
 
     /* OUT parameters */
     bool cancelled;
+    /* Fields protected by lock in BlockCopyState */
     bool error_is_read;
     int ret;
 } BlockCopyCallState;
@@ -78,7 +79,7 @@ typedef struct BlockCopyTask {
     int64_t bytes; /* only re-set in task_shrink, before running the task */
     BlockCopyMethod method; /* initialized in block_copy_dirty_clusters() */
 
-    /* State */
+    /* State. Protected by lock in BlockCopyState */
     CoQueue wait_queue; /* coroutines blocked on this task */
 
     /* To reference all call states from BlockCopyState */
@@ -99,7 +100,8 @@ typedef struct BlockCopyState {
     BdrvChild *source;
     BdrvChild *target;
 
-    /* State */
+    /* State. Protected by lock */
+    CoMutex lock;
     int64_t in_flight_bytes;
     BlockCopyMethod method;
     QLIST_HEAD(, BlockCopyTask) tasks; /* All tasks from all block-copy calls */
@@ -139,8 +141,10 @@ typedef struct BlockCopyState {
     bool skip_unallocated;
 } BlockCopyState;
 
-static BlockCopyTask *find_conflicting_task(BlockCopyState *s,
-                                            int64_t offset, int64_t bytes)
+/* Called with lock held */
+static BlockCopyTask *find_conflicting_task_locked(BlockCopyState *s,
+                                                   int64_t offset,
+                                                   int64_t bytes)
 {
     BlockCopyTask *t;
 
@@ -160,18 +164,22 @@ static BlockCopyTask *find_conflicting_task(BlockCopyState *s,
 static bool coroutine_fn block_copy_wait_one(BlockCopyState *s, int64_t offset,
                                              int64_t bytes)
 {
-    BlockCopyTask *task = find_conflicting_task(s, offset, bytes);
+    BlockCopyTask *task;
+
+    QEMU_LOCK_GUARD(&s->lock);
+    task = find_conflicting_task_locked(s, offset, bytes);
 
     if (!task) {
         return false;
     }
 
-    qemu_co_queue_wait(&task->wait_queue, NULL);
+    qemu_co_queue_wait(&task->wait_queue, &s->lock);
 
     return true;
 }
 
-static int64_t block_copy_chunk_size(BlockCopyState *s)
+/* Called with lock held */
+static int64_t block_copy_chunk_size_locked(BlockCopyState *s)
 {
     switch (s->method) {
     case COPY_READ_WRITE_CLUSTER:
@@ -193,14 +201,16 @@ static int64_t block_copy_chunk_size(BlockCopyState *s)
  * Search for the first dirty area in offset/bytes range and create task at
  * the beginning of it.
  */
-static BlockCopyTask *block_copy_task_create(BlockCopyState *s,
-                                             BlockCopyCallState *call_state,
-                                             int64_t offset, int64_t bytes)
+static coroutine_fn BlockCopyTask *block_copy_task_create(BlockCopyState *s,
+                                                BlockCopyCallState *call_state,
+                                                int64_t offset, int64_t bytes)
 {
     BlockCopyTask *task;
-    int64_t max_chunk = block_copy_chunk_size(s);
+    int64_t max_chunk;
 
-    max_chunk = MIN_NON_ZERO(max_chunk, call_state->max_chunk);
+    QEMU_LOCK_GUARD(&s->lock);
+    max_chunk = MIN_NON_ZERO(block_copy_chunk_size_locked(s),
+                             call_state->max_chunk);
     if (!bdrv_dirty_bitmap_next_dirty_area(s->copy_bitmap,
                                            offset, offset + bytes,
                                            max_chunk, &offset, &bytes))
@@ -212,7 +222,7 @@ static BlockCopyTask *block_copy_task_create(BlockCopyState *s,
     bytes = QEMU_ALIGN_UP(bytes, s->cluster_size);
 
     /* region is dirty, so no existent tasks possible in it */
-    assert(!find_conflicting_task(s, offset, bytes));
+    assert(!find_conflicting_task_locked(s, offset, bytes));
 
     bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
     s->in_flight_bytes += bytes;
@@ -248,16 +258,19 @@ static void coroutine_fn block_copy_task_shrink(BlockCopyTask *task,
 
     assert(new_bytes > 0 && new_bytes < task->bytes);
 
-    task->s->in_flight_bytes -= task->bytes - new_bytes;
-    bdrv_set_dirty_bitmap(task->s->copy_bitmap,
-                          task->offset + new_bytes, task->bytes - new_bytes);
-
-    task->bytes = new_bytes;
-    qemu_co_queue_restart_all(&task->wait_queue);
+    WITH_QEMU_LOCK_GUARD(&task->s->lock) {
+        task->s->in_flight_bytes -= task->bytes - new_bytes;
+        bdrv_set_dirty_bitmap(task->s->copy_bitmap,
+                              task->offset + new_bytes,
+                              task->bytes - new_bytes);
+        task->bytes = new_bytes;
+        qemu_co_queue_restart_all(&task->wait_queue);
+    }
 }
 
 static void coroutine_fn block_copy_task_end(BlockCopyTask *task, int ret)
 {
+    QEMU_LOCK_GUARD(&task->s->lock);
     task->s->in_flight_bytes -= task->bytes;
     if (ret < 0) {
         bdrv_set_dirty_bitmap(task->s->copy_bitmap, task->offset, task->bytes);
@@ -335,6 +348,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
     }
 
     ratelimit_init(&s->rate_limit);
+    qemu_co_mutex_init(&s->lock);
     QLIST_INIT(&s->tasks);
     QLIST_INIT(&s->calls);
 
@@ -390,6 +404,8 @@ static coroutine_fn int block_copy_task_run(AioTaskPool *pool,
  * a full-size buffer or disabled if the copy_range attempt fails.  The output
  * value of @method should be used for subsequent tasks.
  * Returns 0 on success.
+ *
+ * Called with lock held.
  */
 static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
                                            int64_t offset, int64_t bytes,
@@ -476,16 +492,20 @@ static coroutine_fn int block_copy_task_entry(AioTask *task)
     int ret;
 
     ret = block_copy_do_copy(s, t->offset, t->bytes, &method, &error_is_read);
-    if (s->method == t->method) {
-        s->method = method;
-    }
-    if (ret < 0) {
-        if (!t->call_state->ret) {
-            t->call_state->ret = ret;
-            t->call_state->error_is_read = error_is_read;
+
+    WITH_QEMU_LOCK_GUARD(&t->s->lock) {
+        if (s->method == t->method) {
+            s->method = method;
+        }
+
+        if (ret < 0) {
+            if (!t->call_state->ret) {
+                t->call_state->ret = ret;
+                t->call_state->error_is_read = error_is_read;
+            }
+        } else {
+            progress_work_done(t->s->progress, t->bytes);
         }
-    } else {
-        progress_work_done(t->s->progress, t->bytes);
     }
     co_put_to_shres(t->s->mem, t->bytes);
     block_copy_task_end(t, ret);
@@ -587,10 +607,12 @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
     bytes = clusters * s->cluster_size;
 
     if (!ret) {
+        qemu_co_mutex_lock(&s->lock);
         bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
         progress_set_remaining(s->progress,
                                bdrv_get_dirty_count(s->copy_bitmap) +
                                s->in_flight_bytes);
+        qemu_co_mutex_unlock(&s->lock);
     }
 
     *count = bytes;
@@ -729,7 +751,9 @@ static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
 {
     int ret;
 
+    qemu_co_mutex_lock(&call_state->s->lock);
     QLIST_INSERT_HEAD(&call_state->s->calls, call_state, list);
+    qemu_co_mutex_unlock(&call_state->s->lock);
 
     do {
         ret = block_copy_dirty_clusters(call_state);
@@ -756,7 +780,9 @@ static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
         call_state->cb(call_state->cb_opaque);
     }
 
+    qemu_co_mutex_lock(&call_state->s->lock);
     QLIST_REMOVE(call_state, list);
+    qemu_co_mutex_unlock(&call_state->s->lock);
 
     return ret;
 }
-- 
2.30.2




* [PATCH v3 5/5] block-copy: atomic .cancelled and .finished fields in BlockCopyCallState
  2021-06-08  7:33 [PATCH v3 0/5] block-copy: protect block-copy internal structures Emanuele Giuseppe Esposito
                   ` (3 preceding siblings ...)
  2021-06-08  7:33 ` [PATCH v3 4/5] block-copy: add a CoMutex Emanuele Giuseppe Esposito
@ 2021-06-08  7:33 ` Emanuele Giuseppe Esposito
  4 siblings, 0 replies; 19+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-06-08  7:33 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Emanuele Giuseppe Esposito,
	Vladimir Sementsov-Ogievskiy, qemu-devel, Max Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

By adding acquire/release pairs, we ensure that the .ret and
.error_is_read fields are written by block_copy_dirty_clusters()
before .finished is set to true.

The atomics here are necessary because the fields are also accessed
concurrently outside coroutines.
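
A condensed sketch of the pairing (illustrative; in the actual code
the stores and loads are spread across block_copy_common() and the
block_copy_call_* helpers):

    /* Writer: plain stores first ... */
    call_state->ret = ret;
    call_state->error_is_read = error_is_read;
    /* ... then publish them with a release store. */
    qatomic_store_release(&call_state->finished, true);

    /* Reader: the acquire load guarantees the stores above are visible. */
    if (qatomic_load_acquire(&call_state->finished)) {
        ret = call_state->ret;
    }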

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block/block-copy.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 56f62913e4..55b6ce6a57 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -56,11 +56,11 @@ typedef struct BlockCopyCallState {
     QLIST_ENTRY(BlockCopyCallState) list;
 
     /* State */
-    bool finished;
+    bool finished; /* atomic */
     QemuCoSleep sleep; /* TODO: protect API with a lock */
 
     /* OUT parameters */
-    bool cancelled;
+    bool cancelled; /* atomic */
     /* Fields protected by lock in BlockCopyState */
     bool error_is_read;
     int ret;
@@ -648,7 +648,8 @@ block_copy_dirty_clusters(BlockCopyCallState *call_state)
     assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
     assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));
 
-    while (bytes && aio_task_pool_status(aio) == 0 && !call_state->cancelled) {
+    while (bytes && aio_task_pool_status(aio) == 0 &&
+           !qatomic_read(&call_state->cancelled)) {
         BlockCopyTask *task;
         int64_t status_bytes;
 
@@ -758,7 +759,7 @@ static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
     do {
         ret = block_copy_dirty_clusters(call_state);
 
-        if (ret == 0 && !call_state->cancelled) {
+        if (ret == 0 && !qatomic_read(&call_state->cancelled)) {
             ret = block_copy_wait_one(call_state->s, call_state->offset,
                                       call_state->bytes);
         }
@@ -772,9 +773,9 @@ static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
          * 2. We have waited for some intersecting block-copy request
          *    It may have failed and produced new dirty bits.
          */
-    } while (ret > 0 && !call_state->cancelled);
+    } while (ret > 0 && !qatomic_read(&call_state->cancelled));
 
-    call_state->finished = true;
+    qatomic_store_release(&call_state->finished, true);
 
     if (call_state->cb) {
         call_state->cb(call_state->cb_opaque);
@@ -837,35 +838,37 @@ void block_copy_call_free(BlockCopyCallState *call_state)
         return;
     }
 
-    assert(call_state->finished);
+    assert(qatomic_load_acquire(&call_state->finished));
     g_free(call_state);
 }
 
 bool block_copy_call_finished(BlockCopyCallState *call_state)
 {
-    return call_state->finished;
+    return qatomic_load_acquire(&call_state->finished);
 }
 
 bool block_copy_call_succeeded(BlockCopyCallState *call_state)
 {
-    return call_state->finished && !call_state->cancelled &&
-        call_state->ret == 0;
+    return qatomic_load_acquire(&call_state->finished) &&
+           !qatomic_read(&call_state->cancelled) &&
+           call_state->ret == 0;
 }
 
 bool block_copy_call_failed(BlockCopyCallState *call_state)
 {
-    return call_state->finished && !call_state->cancelled &&
-        call_state->ret < 0;
+    return qatomic_load_acquire(&call_state->finished) &&
+           !qatomic_read(&call_state->cancelled) &&
+           call_state->ret < 0;
 }
 
 bool block_copy_call_cancelled(BlockCopyCallState *call_state)
 {
-    return call_state->cancelled;
+    return qatomic_read(&call_state->cancelled);
 }
 
 int block_copy_call_status(BlockCopyCallState *call_state, bool *error_is_read)
 {
-    assert(call_state->finished);
+    assert(qatomic_load_acquire(&call_state->finished));
     if (error_is_read) {
         *error_is_read = call_state->error_is_read;
     }
@@ -874,7 +877,7 @@ int block_copy_call_status(BlockCopyCallState *call_state, bool *error_is_read)
 
 void block_copy_call_cancel(BlockCopyCallState *call_state)
 {
-    call_state->cancelled = true;
+    qatomic_set(&call_state->cancelled, true);
     block_copy_kick(call_state);
 }
 
-- 
2.30.2




* Re: [PATCH v3 1/5] block-copy: streamline choice of copy_range vs. read/write
  2021-06-08  7:33 ` [PATCH v3 1/5] block-copy: streamline choice of copy_range vs. read/write Emanuele Giuseppe Esposito
@ 2021-06-09  8:51   ` Vladimir Sementsov-Ogievskiy
  2021-06-09  9:33     ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-06-09  8:51 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: John Snow, Kevin Wolf, Max Reitz, Paolo Bonzini, Stefan Hajnoczi,
	qemu-devel

08.06.2021 10:33, Emanuele Giuseppe Esposito wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
> 
> Put the logic to determine the copy size in a separate function, so
> that there is a simple state machine for the possible methods of
> copying data from one BlockDriverState to the other.
> 
> Use .method instead of .copy_range as the in-out argument, and
> also include .zeroes as an additional copy method.
> 
> While at it, store the common computation of block_copy_max_transfer
> into a new field of BlockCopyState, and make sure that we always
> obey max_transfer; that's more efficient even for the
> COPY_RANGE_FULL case.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

In general looks OK to me, I mostly have style remarks, see below.

> ---
>   block/block-copy.c | 171 ++++++++++++++++++++++-----------------------
>   1 file changed, 85 insertions(+), 86 deletions(-)
> 
> diff --git a/block/block-copy.c b/block/block-copy.c
> index 943e30b7e6..d58051288b 100644
> --- a/block/block-copy.c
> +++ b/block/block-copy.c
> @@ -28,6 +28,14 @@
>   #define BLOCK_COPY_MAX_WORKERS 64
>   #define BLOCK_COPY_SLICE_TIME 100000000ULL /* ns */
>   
> +typedef enum {
> +    COPY_READ_WRITE_CLUSTER,
> +    COPY_READ_WRITE,
> +    COPY_WRITE_ZEROES,
> +    COPY_RANGE_SMALL,
> +    COPY_RANGE_FULL
> +} BlockCopyMethod;
> +
>   static coroutine_fn int block_copy_task_entry(AioTask *task);
>   
>   typedef struct BlockCopyCallState {
> @@ -64,8 +72,7 @@ typedef struct BlockCopyTask {
>       BlockCopyCallState *call_state;
>       int64_t offset;
>       int64_t bytes;
> -    bool zeroes;
> -    bool copy_range;
> +    BlockCopyMethod method;
>       QLIST_ENTRY(BlockCopyTask) list;
>       CoQueue wait_queue; /* coroutines blocked on this task */
>   } BlockCopyTask;
> @@ -86,8 +93,8 @@ typedef struct BlockCopyState {
>       BdrvDirtyBitmap *copy_bitmap;
>       int64_t in_flight_bytes;
>       int64_t cluster_size;
> -    bool use_copy_range;
> -    int64_t copy_size;
> +    BlockCopyMethod method;
> +    int64_t max_transfer;
>       uint64_t len;
>       QLIST_HEAD(, BlockCopyTask) tasks; /* All tasks from all block-copy calls */
>       QLIST_HEAD(, BlockCopyCallState) calls;
> @@ -149,6 +156,24 @@ static bool coroutine_fn block_copy_wait_one(BlockCopyState *s, int64_t offset,
>       return true;
>   }
>   
> +static int64_t block_copy_chunk_size(BlockCopyState *s)
> +{
> +    switch (s->method) {
> +    case COPY_READ_WRITE_CLUSTER:
> +        return s->cluster_size;
> +    case COPY_READ_WRITE:
> +    case COPY_RANGE_SMALL:
> +        return MIN(MAX(s->cluster_size, BLOCK_COPY_MAX_BUFFER),
> +                   s->max_transfer);
> +    case COPY_RANGE_FULL:
> +        return MIN(MAX(s->cluster_size, BLOCK_COPY_MAX_COPY_RANGE),
> +                   s->max_transfer);
> +    default:
> +        /* Cannot have COPY_WRITE_ZEROES here.  */
> +        abort();
> +    }
> +}
> +
>   /*
>    * Search for the first dirty area in offset/bytes range and create task at
>    * the beginning of it.
> @@ -158,8 +183,9 @@ static BlockCopyTask *block_copy_task_create(BlockCopyState *s,
>                                                int64_t offset, int64_t bytes)
>   {
>       BlockCopyTask *task;
> -    int64_t max_chunk = MIN_NON_ZERO(s->copy_size, call_state->max_chunk);
> +    int64_t max_chunk = block_copy_chunk_size(s);
>   
> +    max_chunk = MIN_NON_ZERO(max_chunk, call_state->max_chunk);
>       if (!bdrv_dirty_bitmap_next_dirty_area(s->copy_bitmap,
>                                              offset, offset + bytes,
>                                              max_chunk, &offset, &bytes))
> @@ -183,7 +209,7 @@ static BlockCopyTask *block_copy_task_create(BlockCopyState *s,
>           .call_state = call_state,
>           .offset = offset,
>           .bytes = bytes,
> -        .copy_range = s->use_copy_range,
> +        .method = s->method,
>       };
>       qemu_co_queue_init(&task->wait_queue);
>       QLIST_INSERT_HEAD(&s->tasks, task, list);
> @@ -267,28 +293,27 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
>           .len = bdrv_dirty_bitmap_size(copy_bitmap),
>           .write_flags = write_flags,
>           .mem = shres_create(BLOCK_COPY_MAX_MEM),
> +        .max_transfer = QEMU_ALIGN_DOWN(block_copy_max_transfer(source, target)
> +                                        , cluster_size),

It seems unusual not to keep the comma on the previous line (and it actually fits into 80 characters).

I've grepped and found several places matching the '^\s*,' pattern, for example in target/mips/tcg/msa_helper.c. But at least in this file there is no such practice, so let's be consistent.

>       };
>   
> -    if (block_copy_max_transfer(source, target) < cluster_size) {
> +    if (s->max_transfer < cluster_size) {
>           /*
>            * copy_range does not respect max_transfer. We don't want to bother
>            * with requests smaller than block-copy cluster size, so fallback to
>            * buffered copying (read and write respect max_transfer on their
>            * behalf).
>            */
> -        s->use_copy_range = false;
> -        s->copy_size = cluster_size;
> +        s->method = COPY_READ_WRITE_CLUSTER;
>       } else if (write_flags & BDRV_REQ_WRITE_COMPRESSED) {
>           /* Compression supports only cluster-size writes and no copy-range. */
> -        s->use_copy_range = false;
> -        s->copy_size = cluster_size;
> +        s->method = COPY_READ_WRITE_CLUSTER;
>       } else {
>           /*
>            * We enable copy-range, but keep small copy_size, until first
>            * successful copy_range (look at block_copy_do_copy).
>            */
> -        s->use_copy_range = use_copy_range;
> -        s->copy_size = MAX(s->cluster_size, BLOCK_COPY_MAX_BUFFER);
> +        s->method = use_copy_range ? COPY_RANGE_SMALL : COPY_READ_WRITE;
>       }
>   
>       ratelimit_init(&s->rate_limit);
> @@ -343,17 +368,14 @@ static coroutine_fn int block_copy_task_run(AioTaskPool *pool,
>    *
>    * No sync here: nor bitmap neighter intersecting requests handling, only copy.
>    *
> - * @copy_range is an in-out argument: if *copy_range is false, copy_range is not
> - * done. If *copy_range is true, copy_range is attempted. If the copy_range
> - * attempt fails, the function falls back to the usual read+write and
> - * *copy_range is set to false. *copy_range and zeroes must not be true
> - * simultaneously.
> - *
> + * @method is an in-out argument, so that copy_range can be either extended to
> + * a full-size buffer or disabled if the copy_range attempt fails.  The output
> + * value of @method should be used for subsequent tasks.
>    * Returns 0 on success.
>    */
>   static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
>                                              int64_t offset, int64_t bytes,
> -                                           bool zeroes, bool *copy_range,
> +                                           BlockCopyMethod *method,
>                                              bool *error_is_read)
>   {
>       int ret;
> @@ -367,9 +389,9 @@ static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
>       assert(offset + bytes <= s->len ||
>              offset + bytes == QEMU_ALIGN_UP(s->len, s->cluster_size));
>       assert(nbytes < INT_MAX);
> -    assert(!(*copy_range && zeroes));
>   
> -    if (zeroes) {
> +    switch (*method) {
> +    case COPY_WRITE_ZEROES:
>           ret = bdrv_co_pwrite_zeroes(s->target, offset, nbytes, s->write_flags &
>                                       ~BDRV_REQ_WRITE_COMPRESSED);
>           if (ret < 0) {
> @@ -377,89 +399,67 @@ static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
>               *error_is_read = false;
>           }
>           return ret;
> -    }
>   
> -    if (*copy_range) {
> +    case COPY_RANGE_SMALL:
> +    case COPY_RANGE_FULL:
>           ret = bdrv_co_copy_range(s->source, offset, s->target, offset, nbytes,
>                                    0, s->write_flags);
> -        if (ret < 0) {
> -            trace_block_copy_copy_range_fail(s, offset, ret);
> -            *copy_range = false;
> -            /* Fallback to read+write with allocated buffer */
> -        } else {
> +        if (ret >= 0) {
> +            /* Successful copy-range, increase copy_size.  */
> +            *method = COPY_RANGE_FULL;
>               return 0;
>           }
> -    }
> -
> -    /*
> -     * In case of failed copy_range request above, we may proceed with buffered
> -     * request larger than BLOCK_COPY_MAX_BUFFER. Still, further requests will
> -     * be properly limited, so don't care too much. Moreover the most likely
> -     * case (copy_range is unsupported for the configuration, so the very first
> -     * copy_range request fails) is handled by setting large copy_size only
> -     * after first successful copy_range.
> -     */
>   
> -    bounce_buffer = qemu_blockalign(s->source->bs, nbytes);
> +        trace_block_copy_copy_range_fail(s, offset, ret);
> +        *method = COPY_READ_WRITE;
> +        /* Fall through to read+write with allocated buffer */
>   
> -    ret = bdrv_co_pread(s->source, offset, nbytes, bounce_buffer, 0);
> -    if (ret < 0) {
> -        trace_block_copy_read_fail(s, offset, ret);
> -        *error_is_read = true;
> -        goto out;
> -    }
> +    default:

It would be safer to explicitly write the remaining cases here and then abort() in default:.

> +        /*
> +         * In case of failed copy_range request above, we may proceed with
> +         * buffered request larger than BLOCK_COPY_MAX_BUFFER.
> +         * Still, further requests will be properly limited, so don't care too
> +         * much. Moreover the most likely case (copy_range is unsupported for
> +         * the configuration, so the very first copy_range request fails)
> +         * is handled by setting large copy_size only after first successful
> +         * copy_range.
> +         */
>   
> -    ret = bdrv_co_pwrite(s->target, offset, nbytes, bounce_buffer,
> -                         s->write_flags);
> -    if (ret < 0) {
> -        trace_block_copy_write_fail(s, offset, ret);
> -        *error_is_read = false;
> -        goto out;
> -    }
> +        bounce_buffer = qemu_blockalign(s->source->bs, nbytes);
>   
> -out:
> -    qemu_vfree(bounce_buffer);
> +        ret = bdrv_co_pread(s->source, offset, nbytes, bounce_buffer, 0);
> +        if (ret < 0) {
> +            trace_block_copy_read_fail(s, offset, ret);
> +            *error_is_read = true;
> +            goto out;
> +        }
>   
> -    return ret;
> -}
> +        ret = bdrv_co_pwrite(s->target, offset, nbytes, bounce_buffer,
> +                            s->write_flags);

alignment broken

> +        if (ret < 0) {
> +            trace_block_copy_write_fail(s, offset, ret);
> +            *error_is_read = false;
> +            goto out;
> +        }
>   
> -static void block_copy_handle_copy_range_result(BlockCopyState *s,
> -                                                bool is_success)
> -{
> -    if (!s->use_copy_range) {
> -        /* already disabled */
> -        return;
> +out:
> +        qemu_vfree(bounce_buffer);

A label inside a switch statement? Rather unusual. Please, let's avoid it and just keep out: after the switch statement.

>       }
>   
> -    if (is_success) {
> -        /*
> -         * Successful copy-range. Now increase copy_size.  copy_range
> -         * does not respect max_transfer (it's a TODO), so we factor
> -         * that in here.
> -         */
> -        s->copy_size =
> -                MIN(MAX(s->cluster_size, BLOCK_COPY_MAX_COPY_RANGE),
> -                    QEMU_ALIGN_DOWN(block_copy_max_transfer(s->source,
> -                                                            s->target),
> -                                    s->cluster_size));
> -    } else {
> -        /* Copy-range failed, disable it. */
> -        s->use_copy_range = false;
> -        s->copy_size = MAX(s->cluster_size, BLOCK_COPY_MAX_BUFFER);
> -    }
> +    return ret;
>   }
>   
>   static coroutine_fn int block_copy_task_entry(AioTask *task)
>   {
>       BlockCopyTask *t = container_of(task, BlockCopyTask, task);
> +    BlockCopyState *s = t->s;
>       bool error_is_read = false;
> -    bool copy_range = t->copy_range;
> +    BlockCopyMethod method = t->method;
>       int ret;
>   
> -    ret = block_copy_do_copy(t->s, t->offset, t->bytes, t->zeroes,
> -                             &copy_range, &error_is_read);
> -    if (t->copy_range) {
> -        block_copy_handle_copy_range_result(t->s, copy_range);
> +    ret = block_copy_do_copy(s, t->offset, t->bytes, &method, &error_is_read);
> +    if (s->method == t->method) {
> +        s->method = method;

you leave other t->s occurrences in the function untouched, which is somewhat inconsistent. Could we just use t->s in this patch, and refactor with a follow-up patch (or a preparatory patch)?

>       }
>       if (ret < 0) {
>           if (!t->call_state->ret) {
> @@ -642,8 +642,7 @@ block_copy_dirty_clusters(BlockCopyCallState *call_state)
>               continue;
>           }
>           if (ret & BDRV_BLOCK_ZERO) {
> -            task->zeroes = true;
> -            task->copy_range = false;
> +            task->method = COPY_WRITE_ZEROES;
>           }
>   
>           if (!call_state->ignore_ratelimit) {
> 



-- 
Best regards,
Vladimir



* Re: [PATCH v3 2/5] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions
  2021-06-08  7:33 ` [PATCH v3 2/5] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions Emanuele Giuseppe Esposito
@ 2021-06-09  9:12   ` Vladimir Sementsov-Ogievskiy
  2021-06-10 10:14     ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 19+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-06-09  9:12 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: John Snow, Kevin Wolf, Max Reitz, Paolo Bonzini, Stefan Hajnoczi,
	qemu-devel

08.06.2021 10:33, Emanuele Giuseppe Esposito wrote:
> As done in BlockCopyCallState, categorize the BlockCopyTask
> and BlockCopyState fields into IN, State and OUT.
> This makes it clear which fields have to be protected with a lock.
> 
> .sleep_state is handled in the series "coroutine: new sleep/wake API"
> and is thus left as a TODO here.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>   block/block-copy.c | 47 ++++++++++++++++++++++++++++++----------------
>   1 file changed, 31 insertions(+), 16 deletions(-)
> 
> diff --git a/block/block-copy.c b/block/block-copy.c
> index d58051288b..b3533a3003 100644
> --- a/block/block-copy.c
> +++ b/block/block-copy.c
> @@ -56,25 +56,33 @@ typedef struct BlockCopyCallState {
>       QLIST_ENTRY(BlockCopyCallState) list;
>   
>       /* State */

Why is the preceding @list field not in the State section? It's certainly not an IN parameter and should be protected somehow.

> -    int ret;
>       bool finished;
> -    QemuCoSleep sleep;
> -    bool cancelled;
> +    QemuCoSleep sleep; /* TODO: protect API with a lock */
>   
>       /* OUT parameters */
> +    bool cancelled;
>       bool error_is_read;
> +    int ret;
>   } BlockCopyCallState;
>   
>   typedef struct BlockCopyTask {
>       AioTask task;
>   
> +    /*
> +     * IN parameters. Initialized in block_copy_task_create()
> +     * and never changed.
> +     */
>       BlockCopyState *s;
>       BlockCopyCallState *call_state;
>       int64_t offset;
> -    int64_t bytes;
> -    BlockCopyMethod method;
> -    QLIST_ENTRY(BlockCopyTask) list;
> +    int64_t bytes; /* only re-set in task_shrink, before running the task */
> +    BlockCopyMethod method; /* initialized in block_copy_dirty_clusters() */

Hmm, to be precise, method is initialized in block_copy_task_create().

And after block_copy_task_create() finishes, the task is in the list and can be read by a parallel block_copy_dirty_clusters(). So @bytes is part of State, and we must protect it.

method is only read by block_copy_task_entry(), so it can be modified without any protection before the task runs.

> +
> +    /* State */
>       CoQueue wait_queue; /* coroutines blocked on this task */
> +
> +    /* To reference all call states from BlockCopyState */

That's a misleading comment: it's not all call states but all tasks. I don't think we need this new comment; just keep @list in the State section.

> +    QLIST_ENTRY(BlockCopyTask) list;
>   } BlockCopyTask;
>   
>   static int64_t task_end(BlockCopyTask *task)
> @@ -90,15 +98,25 @@ typedef struct BlockCopyState {
>        */
>       BdrvChild *source;
>       BdrvChild *target;
> -    BdrvDirtyBitmap *copy_bitmap;
> +
> +    /* State */
>       int64_t in_flight_bytes;
> -    int64_t cluster_size;
>       BlockCopyMethod method;
> -    int64_t max_transfer;
> -    uint64_t len;
>       QLIST_HEAD(, BlockCopyTask) tasks; /* All tasks from all block-copy calls */
>       QLIST_HEAD(, BlockCopyCallState) calls;
> +    /* State fields that use a thread-safe API */
> +    BdrvDirtyBitmap *copy_bitmap;
> +    ProgressMeter *progress;
> +    SharedResource *mem;
> +    RateLimit rate_limit;
>   
> +    /*
> +     * IN parameters. Initialized in block_copy_state_new()
> +     * and never changed.
> +     */
> +    int64_t cluster_size;
> +    int64_t max_transfer;
> +    uint64_t len;
>       BdrvRequestFlags write_flags;
>   
>       /*
> @@ -114,14 +132,11 @@ typedef struct BlockCopyState {
>        * In this case, block_copy() will query the source’s allocation status,
>        * skip unallocated regions, clear them in the copy_bitmap, and invoke
>        * block_copy_reset_unallocated() every time it does.
> +     *
> +     * This field is set in backup_run() before coroutines are run,
> +     * therefore is an IN.
>        */
>       bool skip_unallocated;
> -
> -    ProgressMeter *progress;
> -
> -    SharedResource *mem;
> -
> -    RateLimit rate_limit;
>   } BlockCopyState;
>   
>   static BlockCopyTask *find_conflicting_task(BlockCopyState *s,
> 


-- 
Best regards,
Vladimir



* Re: [PATCH v3 1/5] block-copy: streamline choice of copy_range vs. read/write
  2021-06-09  8:51   ` Vladimir Sementsov-Ogievskiy
@ 2021-06-09  9:33     ` Paolo Bonzini
  2021-06-09 10:09       ` Vladimir Sementsov-Ogievskiy
  2021-06-09 10:54       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 19+ messages in thread
From: Paolo Bonzini @ 2021-06-09  9:33 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Emanuele Giuseppe Esposito, qemu-block
  Cc: Kevin Wolf, John Snow, qemu-devel, Stefan Hajnoczi, Max Reitz

On 09/06/21 10:51, Vladimir Sementsov-Ogievskiy wrote:
>>
>> +    default:
>> [...]
>> +        bounce_buffer = qemu_blockalign(s->source->bs, nbytes);
>> +        ret = bdrv_co_pread(s->source, offset, nbytes, bounce_buffer, 0);
>> +        if (ret < 0) {
>> +            trace_block_copy_read_fail(s, offset, ret);
>> +            *error_is_read = true;
>> +            goto out;
>> +        }
>> +        ret = bdrv_co_pwrite(s->target, offset, nbytes, bounce_buffer,
>> +                            s->write_flags);
>> +        if (ret < 0) {
>> +            trace_block_copy_write_fail(s, offset, ret);
>> +            *error_is_read = false;
>> +            goto out;
>> +        }
>> +out:
>> +        qemu_vfree(bounce_buffer);
> 
>> A label inside a switch statement? Rather unusual. Please, let's avoid
>> it and just keep out: after the switch statement.

Agreed with all comments except this one: the bounce_buffer doesn't
exist in the other cases.

>> +    ret = block_copy_do_copy(s, t->offset, t->bytes, &method, &error_is_read);
>> +    if (s->method == t->method) {
>> +        s->method = method;
> 
> you leave other t->s occurrences in the function untouched, which is somewhat inconsistent. Could we just use t->s in this patch, and refactor with a follow-up patch (or a preparatory patch)?

Maybe as a first patch, yes.

Paolo



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/5] block-copy: streamline choice of copy_range vs. read/write
  2021-06-09  9:33     ` Paolo Bonzini
@ 2021-06-09 10:09       ` Vladimir Sementsov-Ogievskiy
  2021-06-09 10:54       ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 19+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-06-09 10:09 UTC (permalink / raw)
  To: Paolo Bonzini, Emanuele Giuseppe Esposito, qemu-block
  Cc: John Snow, Kevin Wolf, Max Reitz, Stefan Hajnoczi, qemu-devel

09.06.2021 12:33, Paolo Bonzini wrote:
> On 09/06/21 10:51, Vladimir Sementsov-Ogievskiy wrote:
>>>
>>> +    default:
>>> [...]
>>> +        bounce_buffer = qemu_blockalign(s->source->bs, nbytes);
>>> +        ret = bdrv_co_pread(s->source, offset, nbytes, bounce_buffer, 0);
>>> +        if (ret < 0) {
>>> +            trace_block_copy_read_fail(s, offset, ret);
>>> +            *error_is_read = true;
>>> +            goto out;
>>> +        }
>>> +        ret = bdrv_co_pwrite(s->target, offset, nbytes, bounce_buffer,
>>> +                            s->write_flags);
>>> +        if (ret < 0) {
>>> +            trace_block_copy_write_fail(s, offset, ret);
>>> +            *error_is_read = false;
>>> +            goto out;
>>> +        }
>>> +out:
>>> +        qemu_vfree(bounce_buffer);
>>
>> label inside switch operator? Rather unusual. Please, let's avoid it and just keep out: after switch operator.
> 
> Agreed with all comments except this one, the bounce_buffer doesn't exist in the other cases.

Hmm, "return ret" is likewise only used for this "default:" case; the other paths return earlier :) Also, bounce_buffer is defined in the outer scope anyway. So I don't think the overall picture gets better, from an isolation point of view, with this change. Maybe a good refactoring would be to move the default branch to a separate helper function, together with the bounce_buffer local variable.
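
Roughly like this, I mean (a sketch; name and signature are made up):

static int coroutine_fn block_copy_read_write(BlockCopyState *s,
                                              int64_t offset,
                                              int64_t nbytes,
                                              bool *error_is_read)
{
    int ret;
    void *bounce_buffer = qemu_blockalign(s->source->bs, nbytes);

    ret = bdrv_co_pread(s->source, offset, nbytes, bounce_buffer, 0);
    if (ret < 0) {
        trace_block_copy_read_fail(s, offset, ret);
        *error_is_read = true;
        goto out;
    }

    ret = bdrv_co_pwrite(s->target, offset, nbytes, bounce_buffer,
                         s->write_flags);
    if (ret < 0) {
        trace_block_copy_write_fail(s, offset, ret);
        *error_is_read = false;
    }

out:
    qemu_vfree(bounce_buffer);
    return ret;
}

Then the default: branch would shrink to a single call, and both the
label and the local variable disappear from block_copy_do_copy().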

Still, I don't care too much; keep it as is if you want, that works for me.

The thing that comes to my mind (not for the first time): how could we make something similar to g_autofree for qemu_blockalign()?

I can imagine implementing a macro like ERRP_GUARD, which would work like

void *bounce_buffer = qemu_blockalign(...);
QEMU_AUTO_VFREE(bounce_buffer);

...
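
For instance, piggybacking on the same cleanup attribute that g_autofree
uses (a sketch; in practice the macro would probably have to annotate
the declaration itself rather than be a separate statement):

static inline void qemu_cleanup_generic_vfree(void *p)
{
    /* p points to the pointer variable being cleaned up */
    qemu_vfree(*(void **)p);
}

#define QEMU_AUTO_VFREE __attribute__((cleanup(qemu_cleanup_generic_vfree)))

/* usage: freed automatically on every path out of the scope */
QEMU_AUTO_VFREE void *bounce_buffer = qemu_blockalign(s->source->bs, nbytes);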

> 
>>> +    ret = block_copy_do_copy(s, t->offset, t->bytes, &method, &error_is_read);
>>> +    if (s->method == t->method) {
>>> +        s->method = method;
>>
>> you leave another t->s occurrences in the function untouched. It's somehow inconsistent. Could we just use t->s in this patch, and refactor with a follow-up patch (or as preparing patch)? 
> 
> Maybe as a first patch, yes.
> 
> Paolo
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/5] block-copy: streamline choice of copy_range vs. read/write
  2021-06-09  9:33     ` Paolo Bonzini
  2021-06-09 10:09       ` Vladimir Sementsov-Ogievskiy
@ 2021-06-09 10:54       ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 19+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-06-09 10:54 UTC (permalink / raw)
  To: Paolo Bonzini, Emanuele Giuseppe Esposito, qemu-block
  Cc: John Snow, Kevin Wolf, Max Reitz, Stefan Hajnoczi, qemu-devel

09.06.2021 12:33, Paolo Bonzini wrote:
> On 09/06/21 10:51, Vladimir Sementsov-Ogievskiy wrote:
>>>
>>> +    default:
>>> [...]
>>> +        bounce_buffer = qemu_blockalign(s->source->bs, nbytes);
>>> +        ret = bdrv_co_pread(s->source, offset, nbytes, bounce_buffer, 0);
>>> +        if (ret < 0) {
>>> +            trace_block_copy_read_fail(s, offset, ret);
>>> +            *error_is_read = true;
>>> +            goto out;
>>> +        }
>>> +        ret = bdrv_co_pwrite(s->target, offset, nbytes, bounce_buffer,
>>> +                            s->write_flags);
>>> +        if (ret < 0) {
>>> +            trace_block_copy_write_fail(s, offset, ret);
>>> +            *error_is_read = false;
>>> +            goto out;
>>> +        }
>>> +out:
>>> +        qemu_vfree(bounce_buffer);
>>
>> label inside switch operator? Rather unusual. Please, let's avoid it and just keep out: after switch operator.
> 
> Agreed with all comments except this one, the bounce_buffer doesn't exist in the other cases.

If we keep it as is, is it worth making the indentation of out: the same as for default:?

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/5] block-copy: add a CoMutex
  2021-06-08  7:33 ` [PATCH v3 4/5] block-copy: add a CoMutex Emanuele Giuseppe Esposito
@ 2021-06-09 12:25   ` Vladimir Sementsov-Ogievskiy
  2021-06-10 14:49     ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 19+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-06-09 12:25 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: John Snow, Kevin Wolf, Max Reitz, Paolo Bonzini, Stefan Hajnoczi,
	qemu-devel

08.06.2021 10:33, Emanuele Giuseppe Esposito wrote:
> Add a CoMutex to protect concurrent access of block-copy
> data structures.
> 
> This mutex also protects .copy_bitmap, because its thread-safe
> API does not prevent it from assigning two tasks to the same
> bitmap region.
> 
> .finished, .cancelled and reads to .ret and .error_is_read will be
> protected in the following patch, because are used also outside
> coroutines.
> 
> Also set block_copy_task_create as coroutine_fn because:
> 1) it is static and only invoked by coroutine functions
> 2) this patch introduces and uses a CoMutex lock there
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>

I may have missed it: did you add (and where?) a comment like "all APIs are thread-safe", or a note on what exactly is thread-safe?
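
Something along these lines at the top of the file, for example (wording
is mine, just to show what I mean):

/*
 * block-copy API is thread-safe: unless a function's own comment says
 * otherwise, all functions may be called concurrently from multiple
 * coroutines on the same BlockCopyState.
 */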

> ---
>   block/block-copy.c | 82 ++++++++++++++++++++++++++++++----------------
>   1 file changed, 54 insertions(+), 28 deletions(-)
> 
> diff --git a/block/block-copy.c b/block/block-copy.c
> index e2adb5b2ea..56f62913e4 100644
> --- a/block/block-copy.c
> +++ b/block/block-copy.c
> @@ -61,6 +61,7 @@ typedef struct BlockCopyCallState {
>   
>       /* OUT parameters */
>       bool cancelled;
> +    /* Fields protected by lock in BlockCopyState */
>       bool error_is_read;
>       int ret;
>   } BlockCopyCallState;
> @@ -78,7 +79,7 @@ typedef struct BlockCopyTask {
>       int64_t bytes; /* only re-set in task_shrink, before running the task */
>       BlockCopyMethod method; /* initialized in block_copy_dirty_clusters() */
>   
> -    /* State */
> +    /* State. Protected by lock in BlockCopyState */
>       CoQueue wait_queue; /* coroutines blocked on this task */
>   
>       /* To reference all call states from BlockCopyState */
> @@ -99,7 +100,8 @@ typedef struct BlockCopyState {
>       BdrvChild *source;
>       BdrvChild *target;
>   
> -    /* State */
> +    /* State. Protected by lock */
> +    CoMutex lock;
>       int64_t in_flight_bytes;
>       BlockCopyMethod method;
>       QLIST_HEAD(, BlockCopyTask) tasks; /* All tasks from all block-copy calls */
> @@ -139,8 +141,10 @@ typedef struct BlockCopyState {
>       bool skip_unallocated;
>   } BlockCopyState;
>   

Maybe nitpicking, but if we want block_copy_set_progress_meter to be thread-safe, it should set s->progress under the mutex. Or we should document that it's not thread-safe and is called only once.
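
I.e. either take s->lock around the assignment, or something like this
(a sketch of the documentation option, wording is mine):

/*
 * Not thread-safe: must be called by the block job once, before any
 * block-copy coroutines are running.
 */
void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm)
{
    s->progress = pm;
}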


> -static BlockCopyTask *find_conflicting_task(BlockCopyState *s,
> -                                            int64_t offset, int64_t bytes)
> +/* Called with lock held */
> +static BlockCopyTask *find_conflicting_task_locked(BlockCopyState *s,
> +                                                   int64_t offset,
> +                                                   int64_t bytes)
>   {
>       BlockCopyTask *t;
>   
> @@ -160,18 +164,22 @@ static BlockCopyTask *find_conflicting_task(BlockCopyState *s,
>   static bool coroutine_fn block_copy_wait_one(BlockCopyState *s, int64_t offset,
>                                                int64_t bytes)
>   {
> -    BlockCopyTask *task = find_conflicting_task(s, offset, bytes);
> +    BlockCopyTask *task;
> +
> +    QEMU_LOCK_GUARD(&s->lock);
> +    task = find_conflicting_task_locked(s, offset, bytes);
>   
>       if (!task) {
>           return false;
>       }
>   
> -    qemu_co_queue_wait(&task->wait_queue, NULL);
> +    qemu_co_queue_wait(&task->wait_queue, &s->lock);
>   
>       return true;
>   }
>   
> -static int64_t block_copy_chunk_size(BlockCopyState *s)
> +/* Called with lock held */
> +static int64_t block_copy_chunk_size_locked(BlockCopyState *s)
>   {
>       switch (s->method) {
>       case COPY_READ_WRITE_CLUSTER:
> @@ -193,14 +201,16 @@ static int64_t block_copy_chunk_size(BlockCopyState *s)
>    * Search for the first dirty area in offset/bytes range and create task at
>    * the beginning of it.
>    */
> -static BlockCopyTask *block_copy_task_create(BlockCopyState *s,
> -                                             BlockCopyCallState *call_state,
> -                                             int64_t offset, int64_t bytes)
> +static coroutine_fn BlockCopyTask *block_copy_task_create(BlockCopyState *s,
> +                                                BlockCopyCallState *call_state,
> +                                                int64_t offset, int64_t bytes)
>   {
>       BlockCopyTask *task;
> -    int64_t max_chunk = block_copy_chunk_size(s);
> +    int64_t max_chunk;
>   
> -    max_chunk = MIN_NON_ZERO(max_chunk, call_state->max_chunk);
> +    QEMU_LOCK_GUARD(&s->lock);
> +    max_chunk = MIN_NON_ZERO(block_copy_chunk_size_locked(s),
> +                             call_state->max_chunk);
>       if (!bdrv_dirty_bitmap_next_dirty_area(s->copy_bitmap,
>                                              offset, offset + bytes,
>                                              max_chunk, &offset, &bytes))
> @@ -212,7 +222,7 @@ static BlockCopyTask *block_copy_task_create(BlockCopyState *s,
>       bytes = QEMU_ALIGN_UP(bytes, s->cluster_size);
>   
>       /* region is dirty, so no existent tasks possible in it */
> -    assert(!find_conflicting_task(s, offset, bytes));
> +    assert(!find_conflicting_task_locked(s, offset, bytes));
>   
>       bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
>       s->in_flight_bytes += bytes;
> @@ -248,16 +258,19 @@ static void coroutine_fn block_copy_task_shrink(BlockCopyTask *task,
>   

The function reads task->bytes outside the mutex. It's safe, as only this function modifies the field, and it's called once. Still, let's make the critical section a little wider, just for simplicity. I mean, a simple QEMU_LOCK_GUARD() at the start of the function.
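
I.e. something like this (sketch):

static void coroutine_fn block_copy_task_shrink(BlockCopyTask *task,
                                                int64_t new_bytes)
{
    QEMU_LOCK_GUARD(&task->s->lock);

    /* now the read of task->bytes below is covered by the lock too */
    if (new_bytes == task->bytes) {
        return;
    }
    assert(new_bytes > 0 && new_bytes < task->bytes);
    ...
}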

>       assert(new_bytes > 0 && new_bytes < task->bytes);
>   
> -    task->s->in_flight_bytes -= task->bytes - new_bytes;
> -    bdrv_set_dirty_bitmap(task->s->copy_bitmap,
> -                          task->offset + new_bytes, task->bytes - new_bytes);
> -
> -    task->bytes = new_bytes;
> -    qemu_co_queue_restart_all(&task->wait_queue);
> +    WITH_QEMU_LOCK_GUARD(&task->s->lock) {
> +        task->s->in_flight_bytes -= task->bytes - new_bytes;
> +        bdrv_set_dirty_bitmap(task->s->copy_bitmap,
> +                              task->offset + new_bytes,
> +                              task->bytes - new_bytes);
> +        task->bytes = new_bytes;
> +        qemu_co_queue_restart_all(&task->wait_queue);
> +    }
>   }
>   
>   static void coroutine_fn block_copy_task_end(BlockCopyTask *task, int ret)
>   {
> +    QEMU_LOCK_GUARD(&task->s->lock);
>       task->s->in_flight_bytes -= task->bytes;
>       if (ret < 0) {
>           bdrv_set_dirty_bitmap(task->s->copy_bitmap, task->offset, task->bytes);
> @@ -335,6 +348,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
>       }
>   
>       ratelimit_init(&s->rate_limit);
> +    qemu_co_mutex_init(&s->lock);
>       QLIST_INIT(&s->tasks);
>       QLIST_INIT(&s->calls);
>   
> @@ -390,6 +404,8 @@ static coroutine_fn int block_copy_task_run(AioTaskPool *pool,

Oops, it seems block_copy_task_run misses a block_copy_task_end() call before freeing the task. A preexisting bug.

>    * a full-size buffer or disabled if the copy_range attempt fails.  The output
>    * value of @method should be used for subsequent tasks.
>    * Returns 0 on success.
> + *
> + * Called with lock held.
>    */
>   static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
>                                              int64_t offset, int64_t bytes,
> @@ -476,16 +492,20 @@ static coroutine_fn int block_copy_task_entry(AioTask *task)
>       int ret;
>   
>       ret = block_copy_do_copy(s, t->offset, t->bytes, &method, &error_is_read);
> -    if (s->method == t->method) {
> -        s->method = method;
> -    }
> -    if (ret < 0) {
> -        if (!t->call_state->ret) {
> -            t->call_state->ret = ret;
> -            t->call_state->error_is_read = error_is_read;
> +
> +    WITH_QEMU_LOCK_GUARD(&t->s->lock) {
> +        if (s->method == t->method) {
> +            s->method = method;
> +        }
> +
> +        if (ret < 0) {
> +            if (!t->call_state->ret) {
> +                t->call_state->ret = ret;
> +                t->call_state->error_is_read = error_is_read;
> +            }
> +        } else {
> +            progress_work_done(t->s->progress, t->bytes);
>           }
> -    } else {
> -        progress_work_done(t->s->progress, t->bytes);
>       }
>       co_put_to_shres(t->s->mem, t->bytes);
>       block_copy_task_end(t, ret);
> @@ -587,10 +607,12 @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
>       bytes = clusters * s->cluster_size;
>   
>       if (!ret) {
> +        qemu_co_mutex_lock(&s->lock);
>           bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
>           progress_set_remaining(s->progress,
>                                  bdrv_get_dirty_count(s->copy_bitmap) +
>                                  s->in_flight_bytes);
> +        qemu_co_mutex_unlock(&s->lock);
>       }
>   
>       *count = bytes;
> @@ -729,7 +751,9 @@ static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
>   {
>       int ret;
>   
> +    qemu_co_mutex_lock(&call_state->s->lock);
>       QLIST_INSERT_HEAD(&call_state->s->calls, call_state, list);
> +    qemu_co_mutex_unlock(&call_state->s->lock);
>   
>       do {
>           ret = block_copy_dirty_clusters(call_state);
> @@ -756,7 +780,9 @@ static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
>           call_state->cb(call_state->cb_opaque);
>       }
>   
> +    qemu_co_mutex_lock(&call_state->s->lock);
>       QLIST_REMOVE(call_state, list);
> +    qemu_co_mutex_unlock(&call_state->s->lock);
>   
>       return ret;
>   }
> 

I looked through the whole file on top of the series, and it seems overall OK to me. I still don't really like the additional atomics, but they should probably be refactored together with refactoring all the status-getters into one block_copy_call_status(). So that's work for some future day; I will not do it in parallel :)
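
For the record, what I have in mind is roughly this (names are
hypothetical):

typedef struct BlockCopyCallStatus {
    bool finished;
    bool cancelled;
    bool error_is_read;
    int ret;
} BlockCopyCallStatus;

/* Fill *status from call_state in one critical section, instead of one
 * atomic read per field. */
void block_copy_call_status(BlockCopyCallState *call_state,
                            BlockCopyCallStatus *status);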

I don't insist, but for me patches 2, 4 and 5 only make sense as a whole, so I'd merge them into one patch called "make block-copy APIs thread-safe". Otherwise, thread-safety comes only in the last patch, and patches 2 and 4 are a kind of preparation that is hard to review separately.

Anyway, reviewing such a change means walking through the whole file trying to understand how thread-safe it is now.

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/5] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions
  2021-06-09  9:12   ` Vladimir Sementsov-Ogievskiy
@ 2021-06-10 10:14     ` Emanuele Giuseppe Esposito
  2021-06-10 10:27       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 19+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-06-10 10:14 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, qemu-devel, Max Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow



On 09/06/2021 11:12, Vladimir Sementsov-Ogievskiy wrote:
> 08.06.2021 10:33, Emanuele Giuseppe Esposito wrote:
>> As done in BlockCopyCallState, categorize BlockCopyTask
>> and BlockCopyState in IN, State and OUT fields.
>> This is just to understand which field has to be protected with a lock.
>>
>> .sleep_state is handled in the series "coroutine: new sleep/wake API"
>> and thus here left as TODO.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> ---
>>   block/block-copy.c | 47 ++++++++++++++++++++++++++++++----------------
>>   1 file changed, 31 insertions(+), 16 deletions(-)
>>
>> diff --git a/block/block-copy.c b/block/block-copy.c
>> index d58051288b..b3533a3003 100644
>> --- a/block/block-copy.c
>> +++ b/block/block-copy.c
>> @@ -56,25 +56,33 @@ typedef struct BlockCopyCallState {
>>       QLIST_ENTRY(BlockCopyCallState) list;
>>       /* State */
> 
> Why previous @list field is not in the state? For sure it's not an IN 
> parameter and should be protected somehow.
> 
>> -    int ret;
>>       bool finished;
>> -    QemuCoSleep sleep;
>> -    bool cancelled;
>> +    QemuCoSleep sleep; /* TODO: protect API with a lock */
>>       /* OUT parameters */
>> +    bool cancelled;
>>       bool error_is_read;
>> +    int ret;
>>   } BlockCopyCallState;
>>   typedef struct BlockCopyTask {
>>       AioTask task;
>> +    /*
>> +     * IN parameters. Initialized in block_copy_task_create()
>> +     * and never changed.
>> +     */
>>       BlockCopyState *s;
>>       BlockCopyCallState *call_state;
>>       int64_t offset;
>> -    int64_t bytes;
>> -    BlockCopyMethod method;
>> -    QLIST_ENTRY(BlockCopyTask) list;
>> +    int64_t bytes; /* only re-set in task_shrink, before running the 
>> task */
>> +    BlockCopyMethod method; /* initialized in 
>> block_copy_dirty_clusters() */
> 
> hmm. to be precise method is initialized in block_copy_task_create.
> 
> And after block_copy_task_create finished, task is in the list and can 
> be read by parallel block_copy_dirty_clusters(). So, @bytes is part of 
> State, we must protect it..

So if I understand correctly, you refer to the fact that a parallel 
block_copy_dirty_clusters() can create another task and search with 
find_conflicting_task_locked(), or in general also block_copy_wait_one() 
can do the same in parallel, correct?

There is also another problem here: if we add the task to the list and
then shrink it in two different critical sections, we are going to have
problems, because in the meantime find_conflicting_tasks can be issued
in parallel.

So, is there a reason why we don't want
QLIST_INSERT_HEAD(&s->tasks, task, list);
in block_copy_dirty_clusters()?

By doing that, I think we also spare @bytes from the critical section, 
since it is only read from that point onwards.

I am also trying to see if I can group some critical sections.

Btw, I think we already talked about @bytes, and it's not the first time
we have switched it from IN to STATE and vice-versa...
I mean, I agree with you, but it's starting to get confusing.


This also goes against your comment later in patch 4,
>> @@ -212,7 +222,7 @@ static BlockCopyTask *block_copy_task_create(BlockCopyState *s,
>>       bytes = QEMU_ALIGN_UP(bytes, s->cluster_size);
>>         /* region is dirty, so no existent tasks possible in it */
>> -    assert(!find_conflicting_task(s, offset, bytes));
>> +    assert(!find_conflicting_task_locked(s, offset, bytes));
>>         bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
>>       s->in_flight_bytes += bytes;
>> @@ -248,16 +258,19 @@ static void coroutine_fn block_copy_task_shrink(BlockCopyTask *task,
>>   
> 
> The function reads task->bytes not under mutex.. It's safe, as only that function is modifying the field, and it's called once. Still, let's make critical section a little bit wider, just for simplicity. I mean, simple QEMU_LOCK_GUARD() at start of function. 

Where, if I understand correctly, it is not safe, because
find_conflicting_tasks might see the not-yet-updated task.

Thank you,
Emanuele



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/5] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions
  2021-06-10 10:14     ` Emanuele Giuseppe Esposito
@ 2021-06-10 10:27       ` Vladimir Sementsov-Ogievskiy
  2021-06-10 10:46         ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 19+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-06-10 10:27 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: John Snow, Kevin Wolf, Max Reitz, Paolo Bonzini, Stefan Hajnoczi,
	qemu-devel

10.06.2021 13:14, Emanuele Giuseppe Esposito wrote:
> 
> 
> On 09/06/2021 11:12, Vladimir Sementsov-Ogievskiy wrote:
>> 08.06.2021 10:33, Emanuele Giuseppe Esposito wrote:
>>> As done in BlockCopyCallState, categorize BlockCopyTask
>>> and BlockCopyState in IN, State and OUT fields.
>>> This is just to understand which field has to be protected with a lock.
>>>
>>> .sleep_state is handled in the series "coroutine: new sleep/wake API"
>>> and thus here left as TODO.
>>>
>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>> ---
>>>   block/block-copy.c | 47 ++++++++++++++++++++++++++++++----------------
>>>   1 file changed, 31 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/block/block-copy.c b/block/block-copy.c
>>> index d58051288b..b3533a3003 100644
>>> --- a/block/block-copy.c
>>> +++ b/block/block-copy.c
>>> @@ -56,25 +56,33 @@ typedef struct BlockCopyCallState {
>>>       QLIST_ENTRY(BlockCopyCallState) list;
>>>       /* State */
>>
>> Why previous @list field is not in the state? For sure it's not an IN parameter and should be protected somehow.
>>
>>> -    int ret;
>>>       bool finished;
>>> -    QemuCoSleep sleep;
>>> -    bool cancelled;
>>> +    QemuCoSleep sleep; /* TODO: protect API with a lock */
>>>       /* OUT parameters */
>>> +    bool cancelled;
>>>       bool error_is_read;
>>> +    int ret;
>>>   } BlockCopyCallState;
>>>   typedef struct BlockCopyTask {
>>>       AioTask task;
>>> +    /*
>>> +     * IN parameters. Initialized in block_copy_task_create()
>>> +     * and never changed.
>>> +     */
>>>       BlockCopyState *s;
>>>       BlockCopyCallState *call_state;
>>>       int64_t offset;
>>> -    int64_t bytes;
>>> -    BlockCopyMethod method;
>>> -    QLIST_ENTRY(BlockCopyTask) list;
>>> +    int64_t bytes; /* only re-set in task_shrink, before running the task */
>>> +    BlockCopyMethod method; /* initialized in block_copy_dirty_clusters() */
>>
>> hmm. to be precise method is initialized in block_copy_task_create.
>>
>> And after block_copy_task_create finished, task is in the list and can be read by parallel block_copy_dirty_clusters(). So, @bytes is part of State, we must protect it..
> 
> So if I understand correctly, you refer to the fact that a parallel block_copy_dirty_clusters() can create another task and search with find_conflicting_task_locked(), or in general also block_copy_wait_one() can do the same in parallel, correct?

yes

> 
> Here there is also another problem: if we add the task to the list and then shrink it in two different critical sections, we are going to have problems because in the meanwhile find_conflicting_tasks can be issued in parallel.

But we shrink the task only once, and we do it under the mutex, so we are OK, I think?

> 
> So, is there a reason why we don't want
> QLIST_INSERT_HEAD(&s->tasks, task, list);
> in block_copy_dirty_clusters()?
> 
> By doing that, I think we also spare @bytes from the critical section, since it is only read from that point onwards.

This way find_conflicting_tasks would just skip our newly created task, and we'd get a conflict when we try to add it. No, we should add the task to the list in the same critical section where we clear the dirty bits from the bitmap.

Then we shrink the task in another critical section; that should be OK too.
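
I.e. roughly what block_copy_task_create() already does in this series
(sketch):

    WITH_QEMU_LOCK_GUARD(&s->lock) {
        ...
        bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
        s->in_flight_bytes += bytes;
        QLIST_INSERT_HEAD(&s->tasks, task, list);
        /* from here the task is visible to find_conflicting_task_locked() */
    }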

> 
> I am also trying to see if I can group some critical sections.
> 
> Btw I think we already talked about @bytes and it's not the first time we switch it from IN to STATE and vice-versa...
> I mean, I agree with you but it starts to be confusing.

On the last review, it seemed to me that you actually protect bytes with a critical section where it is needed. So here I'm only talking about the comment.

> 
> 
> This also goes against your comment later in patch 4,
>>> @@ -212,7 +222,7 @@ static BlockCopyTask *block_copy_task_create(BlockCopyState *s,
>>>       bytes = QEMU_ALIGN_UP(bytes, s->cluster_size);
>>>         /* region is dirty, so no existent tasks possible in it */
>>> -    assert(!find_conflicting_task(s, offset, bytes));
>>> +    assert(!find_conflicting_task_locked(s, offset, bytes));
>>>         bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
>>>       s->in_flight_bytes += bytes;
>>> @@ -248,16 +258,19 @@ static void coroutine_fn block_copy_task_shrink(BlockCopyTask *task,
>>
>> The function reads task->bytes not under mutex.. It's safe, as only that function is modifying the field, and it's called once. Still, let's make critical section a little bit wider, just for simplicity. I mean, simple QEMU_LOCK_GUARD() at start of function. 
> 
> Where if I understand correctly, it is not safe, because find_conflicting_tasks might search the non-updated task.
> 

find_conflicting_tasks only reads bytes, so it can't do any damage. Anyway, making the critical sections a bit wider won't hurt.


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/5] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions
  2021-06-10 10:27       ` Vladimir Sementsov-Ogievskiy
@ 2021-06-10 10:46         ` Emanuele Giuseppe Esposito
  2021-06-10 11:12           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 19+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-06-10 10:46 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, qemu-devel, Max Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow



On 10/06/2021 12:27, Vladimir Sementsov-Ogievskiy wrote:
> 10.06.2021 13:14, Emanuele Giuseppe Esposito wrote:
>>
>>
>> On 09/06/2021 11:12, Vladimir Sementsov-Ogievskiy wrote:
>>> 08.06.2021 10:33, Emanuele Giuseppe Esposito wrote:
>>>> As done in BlockCopyCallState, categorize BlockCopyTask
>>>> and BlockCopyState in IN, State and OUT fields.
>>>> This is just to understand which field has to be protected with a lock.
>>>>
>>>> .sleep_state is handled in the series "coroutine: new sleep/wake API"
>>>> and thus here left as TODO.
>>>>
>>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>>> ---
>>>>   block/block-copy.c | 47 
>>>> ++++++++++++++++++++++++++++++----------------
>>>>   1 file changed, 31 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/block/block-copy.c b/block/block-copy.c
>>>> index d58051288b..b3533a3003 100644
>>>> --- a/block/block-copy.c
>>>> +++ b/block/block-copy.c
>>>> @@ -56,25 +56,33 @@ typedef struct BlockCopyCallState {
>>>>       QLIST_ENTRY(BlockCopyCallState) list;
>>>>       /* State */
>>>
>>> Why previous @list field is not in the state? For sure it's not an IN 
>>> parameter and should be protected somehow.
>>>
>>>> -    int ret;
>>>>       bool finished;
>>>> -    QemuCoSleep sleep;
>>>> -    bool cancelled;
>>>> +    QemuCoSleep sleep; /* TODO: protect API with a lock */
>>>>       /* OUT parameters */
>>>> +    bool cancelled;
>>>>       bool error_is_read;
>>>> +    int ret;
>>>>   } BlockCopyCallState;
>>>>   typedef struct BlockCopyTask {
>>>>       AioTask task;
>>>> +    /*
>>>> +     * IN parameters. Initialized in block_copy_task_create()
>>>> +     * and never changed.
>>>> +     */
>>>>       BlockCopyState *s;
>>>>       BlockCopyCallState *call_state;
>>>>       int64_t offset;
>>>> -    int64_t bytes;
>>>> -    BlockCopyMethod method;
>>>> -    QLIST_ENTRY(BlockCopyTask) list;
>>>> +    int64_t bytes; /* only re-set in task_shrink, before running 
>>>> the task */
>>>> +    BlockCopyMethod method; /* initialized in 
>>>> block_copy_dirty_clusters() */
>>>
>>> hmm. to be precise method is initialized in block_copy_task_create.
>>>
>>> And after block_copy_task_create finished, task is in the list and 
>>> can be read by parallel block_copy_dirty_clusters(). So, @bytes is 
>>> part of State, we must protect it..
>>
>> So if I understand correctly, you refer to the fact that a parallel 
>> block_copy_dirty_clusters() can create another task and search with 
>> find_conflicting_task_locked(), or in general also 
>> block_copy_wait_one() can do the same in parallel, correct?
> 
> yes
> 
>>
>> Here there is also another problem: if we add the task to the list and 
>> then shrink it in two different critical sections, we are going to 
>> have problems because in the meanwhile find_conflicting_tasks can be 
>> issued in parallel.
> 
> But we shrink task only once, and we do it under mutex, so we are OK I 
> think?

I think you understood, but just in case, I am thinking of the case where
we have:
T1: block_copy_task_create()
T2: find_conflicting_tasks() <-- sees the initial task
T1: task_shrink() <-- bytes are updated; T2 saw the wrong amount of
bytes. This might or might not have consequences, I am not sure.

But maybe I am overcomplicating.


> 
>>
>> So, is there a reason why we don't want
>> QLIST_INSERT_HEAD(&s->tasks, task, list);
>> in block_copy_dirty_clusters()?
>>
>> By doing that, I think we also spare @bytes from the critical section, 
>> since it is only read from that point onwards.
> 
> This way find_conflicting_tasks will just skip our new creating task.. 
> And we'll get conflict when try to add our new task. No, we should add 
> task to the list at same critical section where we clear dirty bits from 
> the bitmap.


I agree, with the above.
So to me the most correct solution would be to call create and shrink in 
the same lock, but this creates a much wider critical section.

Alternatively, I can leave it as it is and just update the comment.

> 
> Then we shrink task in another critical section, it should be OK too.
> 
>>
>> I am also trying to see if I can group some critical sections.
>>
>> Btw I think we already talked about @bytes and it's not the first time 
>> we switch it from IN to STATE and vice-versa...
>> I mean, I agree with you but it starts to be confusing.
> 
> On last review it seemed to me that you actually protect bytes by 
> critical section where it is needed. So here I'm saying only about the 
> comment..
> 
>>
>>
>> This also goes against your comment later in patch 4,
>>>> @@ -212,7 +222,7 @@ static BlockCopyTask 
>>>> *block_copy_task_create(BlockCopyState *s,
>>>>       bytes = QEMU_ALIGN_UP(bytes, s->cluster_size);
>>>>         /* region is dirty, so no existent tasks possible in it */
>>>> -    assert(!find_conflicting_task(s, offset, bytes));
>>>> +    assert(!find_conflicting_task_locked(s, offset, bytes));
>>>>         bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
>>>>       s->in_flight_bytes += bytes;
>>>> @@ -248,16 +258,19 @@ static void coroutine_fn 
>>>> block_copy_task_shrink(BlockCopyTask *task,
>>>
>>> The function reads task->bytes not under mutex.. It's safe, as only 
>>> that function is modifying the field, and it's called once. Still, 
>>> let's make critical section a little bit wider, just for simplicity. 
>>> I mean, simple QEMU_LOCK_GUARD() at start of function. 
>>
>> Where if I understand correctly, it is not safe, because 
>> find_conflicting_tasks might search the non-updated task.
>>
> 
> find_conflicting_tasks only reads bytes, so it can't make damage.. 
> Anyway making critical sections a bit wider won't hurt.
> 
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/5] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions
  2021-06-10 10:46         ` Emanuele Giuseppe Esposito
@ 2021-06-10 11:12           ` Vladimir Sementsov-Ogievskiy
  2021-06-10 14:21             ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 19+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-06-10 11:12 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: John Snow, Kevin Wolf, Max Reitz, Paolo Bonzini, Stefan Hajnoczi,
	qemu-devel

10.06.2021 13:46, Emanuele Giuseppe Esposito wrote:
> 
> 
> On 10/06/2021 12:27, Vladimir Sementsov-Ogievskiy wrote:
>> 10.06.2021 13:14, Emanuele Giuseppe Esposito wrote:
>>>
>>>
>>> On 09/06/2021 11:12, Vladimir Sementsov-Ogievskiy wrote:
>>>> 08.06.2021 10:33, Emanuele Giuseppe Esposito wrote:
>>>>> As done in BlockCopyCallState, categorize BlockCopyTask
>>>>> and BlockCopyState in IN, State and OUT fields.
>>>>> This is just to understand which field has to be protected with a lock.
>>>>>
>>>>> .sleep_state is handled in the series "coroutine: new sleep/wake API"
>>>>> and thus here left as TODO.
>>>>>
>>>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>>>> ---
>>>>>   block/block-copy.c | 47 ++++++++++++++++++++++++++++++----------------
>>>>>   1 file changed, 31 insertions(+), 16 deletions(-)
>>>>>
>>>>> diff --git a/block/block-copy.c b/block/block-copy.c
>>>>> index d58051288b..b3533a3003 100644
>>>>> --- a/block/block-copy.c
>>>>> +++ b/block/block-copy.c
>>>>> @@ -56,25 +56,33 @@ typedef struct BlockCopyCallState {
>>>>>       QLIST_ENTRY(BlockCopyCallState) list;
>>>>>       /* State */
>>>>
>>>> Why previous @list field is not in the state? For sure it's not an IN parameter and should be protected somehow.
>>>>
>>>>> -    int ret;
>>>>>       bool finished;
>>>>> -    QemuCoSleep sleep;
>>>>> -    bool cancelled;
>>>>> +    QemuCoSleep sleep; /* TODO: protect API with a lock */
>>>>>       /* OUT parameters */
>>>>> +    bool cancelled;
>>>>>       bool error_is_read;
>>>>> +    int ret;
>>>>>   } BlockCopyCallState;
>>>>>   typedef struct BlockCopyTask {
>>>>>       AioTask task;
>>>>> +    /*
>>>>> +     * IN parameters. Initialized in block_copy_task_create()
>>>>> +     * and never changed.
>>>>> +     */
>>>>>       BlockCopyState *s;
>>>>>       BlockCopyCallState *call_state;
>>>>>       int64_t offset;
>>>>> -    int64_t bytes;
>>>>> -    BlockCopyMethod method;
>>>>> -    QLIST_ENTRY(BlockCopyTask) list;
>>>>> +    int64_t bytes; /* only re-set in task_shrink, before running the task */
>>>>> +    BlockCopyMethod method; /* initialized in block_copy_dirty_clusters() */
>>>>
>>>> hmm. to be precise method is initialized in block_copy_task_create.
>>>>
>>>> And after block_copy_task_create finished, task is in the list and can be read by parallel block_copy_dirty_clusters(). So, @bytes is part of State, we must protect it..
>>>
>>> So if I understand correctly, you refer to the fact that a parallel block_copy_dirty_clusters() can create another task and search with find_conflicting_task_locked(), or in general also block_copy_wait_one() can do the same in parallel, correct?
>>
>> yes
>>
>>>
>>> Here there is also another problem: if we add the task to the list and then shrink it in two different critical sections, we are going to have problems because in the meanwhile find_conflicting_tasks can be issued in parallel.
>>
>> But we shrink task only once, and we do it under mutex, so we are OK I think?
> 
> I think you understood, but just in case: I am thinking the case where we have:
> T1: block_copy_task_create()
> T2: find_conflicting_tasks() <-- sees the initial task
> T1: task_shrink() <-- bytes are updated, T2 saw the wrong amount of bytes. This might or might not have consequences, I am not sure.
> 
> But maybe I am overcomplicating.
> 

Both shrink and find_ are done under the mutex, so they can't intersect. But yes, we should keep in mind that if we do find_ under the mutex and then release the mutex, the information we got from find_ may become stale.

Check callers of find_conflicting_task_locked():

block_copy_wait_one has one critical section.

If there are no conflicting tasks, we are OK... Are we? OK, look at the only caller of block_copy_wait_one(): block_copy_common().

Assume block_copy_dirty_clusters() returns 0, so there are no dirty bits at some moment...

Then, in a parallel thread, some task may finish with a failure, leaving some new dirty bits. Then we check that there are no conflicting tasks. And then we go out of the loop, when actually we must retry for these new dirty bits.

So I'm afraid you are right: we are not thread-safe yet in block_copy_common(), as we should check for conflicting tasks and dirty bits in the same critical section to be consistent.
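
To spell out the race (a sketch of block_copy_common()'s loop, not the
literal code):

    do {
        ret = block_copy_dirty_clusters(call_state); /* returns 0: clean */
        /*
         * <-- a parallel task may fail right here: it re-dirties the
         * bitmap and drops out of s->tasks before the check below...
         */
        if (ret == 0) {
            ret = block_copy_wait_one(s, offset, bytes); /* finds nothing */
        }
    } while (ret > 0); /* exits, missing the new dirty bits */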

> 
>>
>>>
>>> So, is there a reason why we don't want
>>> QLIST_INSERT_HEAD(&s->tasks, task, list);
>>> in block_copy_dirty_clusters()?
>>>
>>> By doing that, I think we also spare @bytes from the critical section, since it is only read from that point onwards.
>>
>> This way find_conflicting_tasks will just skip our new creating task.. And we'll get conflict when try to add our new task. No, we should add task to the list at same critical section where we clear dirty bits from the bitmap.
> 
> 
> I agree, with the above.
> So to me the most correct solution would be to call create and shrink in the same lock, but this creates a much wider critical section.
> 
> Alternatively, I can leave it as it is and just update the comment.
> 
>>
>> Then we shrink task in another critical section, it should be OK too.
>>
>>>
>>> I am also trying to see if I can group some critical sections.
>>>
>>> Btw I think we already talked about @bytes and it's not the first time we switch it from IN to STATE and vice-versa...
>>> I mean, I agree with you but it starts to be confusing.
>>
>> On last review it seemed to me that you actually protect bytes by critical section where it is needed. So here I'm saying only about the comment..
>>
>>>
>>>
>>> This also goes against your comment later in patch 4,
>>>>> @@ -212,7 +222,7 @@ static BlockCopyTask *block_copy_task_create(BlockCopyState *s,
>>>>>       bytes = QEMU_ALIGN_UP(bytes, s->cluster_size);
>>>>>         /* region is dirty, so no existent tasks possible in it */
>>>>> -    assert(!find_conflicting_task(s, offset, bytes));
>>>>> +    assert(!find_conflicting_task_locked(s, offset, bytes));
>>>>>         bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
>>>>>       s->in_flight_bytes += bytes;
>>>>> @@ -248,16 +258,19 @@ static void coroutine_fn block_copy_task_shrink(BlockCopyTask *task,
>>>>
>>>> The function reads task->bytes not under mutex.. It's safe, as only that function is modifying the field, and it's called once. Still, let's make critical section a little bit wider, just for simplicity. I mean, simple QEMU_LOCK_GUARD() at start of function. 
>>>
>>> Where if I understand correctly, it is not safe, because find_conflicting_tasks might search the non-updated task.
>>>
>>
>> find_conflicting_tasks only reads bytes, so it can't make damage.. Anyway making critical sections a bit wider won't hurt.
>>
>>
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/5] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions
  2021-06-10 11:12           ` Vladimir Sementsov-Ogievskiy
@ 2021-06-10 14:21             ` Emanuele Giuseppe Esposito
  2021-06-10 15:05               ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 19+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-06-10 14:21 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, qemu-devel, Max Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow



On 10/06/2021 13:12, Vladimir Sementsov-Ogievskiy wrote:
> 10.06.2021 13:46, Emanuele Giuseppe Esposito wrote:
>>
>>
>> On 10/06/2021 12:27, Vladimir Sementsov-Ogievskiy wrote:
>>> 10.06.2021 13:14, Emanuele Giuseppe Esposito wrote:
>>>>
>>>>
>>>> On 09/06/2021 11:12, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 08.06.2021 10:33, Emanuele Giuseppe Esposito wrote:
>>>>>> As done in BlockCopyCallState, categorize BlockCopyTask
>>>>>> and BlockCopyState in IN, State and OUT fields.
>>>>>> This is just to understand which field has to be protected with a 
>>>>>> lock.
>>>>>>
>>>>>> .sleep_state is handled in the series "coroutine: new sleep/wake API"
>>>>>> and thus here left as TODO.
>>>>>>
>>>>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>>>>> ---
>>>>>>   block/block-copy.c | 47 
>>>>>> ++++++++++++++++++++++++++++++----------------
>>>>>>   1 file changed, 31 insertions(+), 16 deletions(-)
>>>>>>
>>>>>> diff --git a/block/block-copy.c b/block/block-copy.c
>>>>>> index d58051288b..b3533a3003 100644
>>>>>> --- a/block/block-copy.c
>>>>>> +++ b/block/block-copy.c
>>>>>> @@ -56,25 +56,33 @@ typedef struct BlockCopyCallState {
>>>>>>       QLIST_ENTRY(BlockCopyCallState) list;
>>>>>>       /* State */
>>>>>
>>>>> Why previous @list field is not in the state? For sure it's not an 
>>>>> IN parameter and should be protected somehow.
>>>>>
>>>>>> -    int ret;
>>>>>>       bool finished;
>>>>>> -    QemuCoSleep sleep;
>>>>>> -    bool cancelled;
>>>>>> +    QemuCoSleep sleep; /* TODO: protect API with a lock */
>>>>>>       /* OUT parameters */
>>>>>> +    bool cancelled;
>>>>>>       bool error_is_read;
>>>>>> +    int ret;
>>>>>>   } BlockCopyCallState;
>>>>>>   typedef struct BlockCopyTask {
>>>>>>       AioTask task;
>>>>>> +    /*
>>>>>> +     * IN parameters. Initialized in block_copy_task_create()
>>>>>> +     * and never changed.
>>>>>> +     */
>>>>>>       BlockCopyState *s;
>>>>>>       BlockCopyCallState *call_state;
>>>>>>       int64_t offset;
>>>>>> -    int64_t bytes;
>>>>>> -    BlockCopyMethod method;
>>>>>> -    QLIST_ENTRY(BlockCopyTask) list;
>>>>>> +    int64_t bytes; /* only re-set in task_shrink, before running 
>>>>>> the task */
>>>>>> +    BlockCopyMethod method; /* initialized in 
>>>>>> block_copy_dirty_clusters() */
>>>>>
>>>>> hmm. to be precise method is initialized in block_copy_task_create.
>>>>>
>>>>> And after block_copy_task_create finished, task is in the list and 
>>>>> can be read by parallel block_copy_dirty_clusters(). So, @bytes is 
>>>>> part of State, we must protect it..
>>>>
>>>> So if I understand correctly, you refer to the fact that a parallel 
>>>> block_copy_dirty_clusters() can create another task and search with 
>>>> find_conflicting_task_locked(), or in general also 
>>>> block_copy_wait_one() can do the same in parallel, correct?
>>>
>>> yes
>>>
>>>>
>>>> Here there is also another problem: if we add the task to the list 
>>>> and then shrink it in two different critical sections, we are going 
>>>> to have problems because in the meanwhile find_conflicting_tasks can 
>>>> be issued in parallel.
>>>
>>> But we shrink task only once, and we do it under mutex, so we are OK 
>>> I think?
>>
>> I think you understood, but just in case: I am thinking the case where 
>> we have:

>>
>> But maybe I am overcomplicating.
>>
> 
> Both shrink and find_ are done under mutex, so they can't intersect. But 
> yes, we should keep in mind that if we do find_ under mutex, and then 
> release mutex, the information get from find_ may become incorrect.
> 
> Check callers of find_conflicting_task_locked():
> 
> block_copy_wait_one has one critical section.
> 
> if no conflicting tasks we are OK.. Are we? Ok, look at the only caller 
> of block_copy_wait_one() - block_copy_common().
> 
> assume block_copy_dirty_clusters() returns 0, so there no dirty bits at 
> some moment...
> 
> than in parallel thread some task may finish with failure, leaving some 
> new dirty bits.. Then we check that there no conflicting tasks.. And 
> then we go out of the loop, when actually we must retry for these new 
> dirty bits.
> 
> So I'm afraid you are right, we are not threadsafe yet in 
> block_copy_common(), as we should check conflicting tasks and dirty bits 
> in same critical section to be consistent.

Wait, we are talking about two different problems:

- What I wanted to point out has to do with @bytes, not (as far as I
understand) with the dirty bits. From the example I gave below, I assume
there are 3 separate, non-overlapping critical sections:

>>> T1: block_copy_task_create()
>>> T2: find_conflicting_tasks() <-- sees the initial task
>>> T1: task_shrink() <-- bytes are updated, T2 saw the wrong amount of 
>>> bytes. This might or might not have consequences, I am not sure.

T1 creates the task, T2 iterates to search for conflicting tasks (called
from a parallel block_copy_wait_one), T1 shrinks the current task. I
think that T2 in this case misses the updated task, even though the
worst that can happen is that the task is smaller, so a false positive
(a task that is not conflicting might be marked as conflicting).
The outcome is that T2 waits for a task it shouldn't, but there is no
error there.

- Your point is about a task failing between block_copy_dirty_clusters
and block_copy_wait_one. The failing task calls block_copy_task_end()
and sets the dirty bitmap, but at that point block_copy_wait_one won't
check it anymore and the bitmap is left dirty. I think the right
behavior here is that block_copy_dirty_clusters() gets called again and
a new task is created. This, as you pointed out, is a proper error.

In this case, we need to stop iterating only when 1) the whole bitmap is 
clear, and 2) no conflicting task is present.

Therefore, a possible solution could be the one below:

BlockCopyTask *task;
int keep_looping = 0;

...

do {

    /* create all the tasks: this clears the bitmap but
     * adds the tasks to the task list */
    block_copy_dirty_clusters();

    /* here a task can fail, but then the dirty bitmap will be set */

    lock();
    /* bit 0: some conflicting task is still running for this operation */
    task = find_conflicting_task();
    keep_looping = (task != NULL);
    /* bit 1: the dirty bitmap is not clear yet */
    keep_looping |= bdrv_dirty_bitmap_next_dirty_area() << 1;
    unlock();

    /* if keep_looping == 0, no task is running, so no task can fail */

    /* if a task fails here, the if below won't see it, but
     * block_copy_dirty_clusters() will pick it up in the next iteration */

    if (keep_looping & 1) {
        /* there is some conflicting task, wait for it */
        qemu_co_queue_wait(&task->wait_queue);
    }

} while (keep_looping != 0);

...

What do you think?

Emanuele

> 
>>
>>>
>>>>
>>>> So, is there a reason why we don't want
>>>> QLIST_INSERT_HEAD(&s->tasks, task, list);
>>>> in block_copy_dirty_clusters()?
>>>>
>>>> By doing that, I think we also spare @bytes from the critical 
>>>> section, since it is only read from that point onwards.
>>>
>>> This way find_conflicting_tasks will just skip our new creating 
>>> task.. And we'll get conflict when try to add our new task. No, we 
>>> should add task to the list at same critical section where we clear 
>>> dirty bits from the bitmap.
>>
>>
>> I agree, with the above.
>> So to me the most correct solution would be to call create and shrink 
>> in the same lock, but this creates a much wider critical section.
>>
>> Alternatively, I can leave it as it is and just update the comment.
>>
>>>
>>> Then we shrink task in another critical section, it should be OK too.
>>>
>>>>
>>>> I am also trying to see if I can group some critical sections.
>>>>
>>>> Btw I think we already talked about @bytes and it's not the first 
>>>> time we switch it from IN to STATE and vice-versa...
>>>> I mean, I agree with you but it starts to be confusing.
>>>
>>> On last review it seemed to me that you actually protect bytes by 
>>> critical section where it is needed. So here I'm saying only about 
>>> the comment..
>>>
>>>>
>>>>
>>>> This also goes against your comment later in patch 4,
>>>>>> @@ -212,7 +222,7 @@ static BlockCopyTask 
>>>>>> *block_copy_task_create(BlockCopyState *s,
>>>>>>       bytes = QEMU_ALIGN_UP(bytes, s->cluster_size);
>>>>>>         /* region is dirty, so no existent tasks possible in it */
>>>>>> -    assert(!find_conflicting_task(s, offset, bytes));
>>>>>> +    assert(!find_conflicting_task_locked(s, offset, bytes));
>>>>>>         bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
>>>>>>       s->in_flight_bytes += bytes;
>>>>>> @@ -248,16 +258,19 @@ static void coroutine_fn 
>>>>>> block_copy_task_shrink(BlockCopyTask *task,
>>>>>
>>>>> The function reads task->bytes not under mutex.. It's safe, as only 
>>>>> that function is modifying the field, and it's called once. Still, 
>>>>> let's make critical section a little bit wider, just for 
>>>>> simplicity. I mean, simple QEMU_LOCK_GUARD() at start of function. 
>>>>
>>>> Where if I understand correctly, it is not safe, because 
>>>> find_conflicting_tasks might search the non-updated task.
>>>>
>>>
>>> find_conflicting_tasks only reads bytes, so it can't make damage.. 
>>> Anyway making critical sections a bit wider won't hurt.
>>>
>>>
>>
> 
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/5] block-copy: add a CoMutex
  2021-06-09 12:25   ` Vladimir Sementsov-Ogievskiy
@ 2021-06-10 14:49     ` Emanuele Giuseppe Esposito
  0 siblings, 0 replies; 19+ messages in thread
From: Emanuele Giuseppe Esposito @ 2021-06-10 14:49 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, qemu-devel, Max Reitz, Stefan Hajnoczi,
	Paolo Bonzini, John Snow



On 09/06/2021 14:25, Vladimir Sementsov-Ogievskiy wrote:
> 08.06.2021 10:33, Emanuele Giuseppe Esposito wrote:
>> Add a CoMutex to protect concurrent access of block-copy
>> data structures.
>>
>> This mutex also protects .copy_bitmap, because its thread-safe
>> API does not prevent it from assigning two tasks to the same
>> bitmap region.
>>
>> .finished, .cancelled and reads to .ret and .error_is_read will be
>> protected in the following patch, because are used also outside
>> coroutines.
>>
>> Also set block_copy_task_create as coroutine_fn because:
>> 1) it is static and only invoked by coroutine functions
>> 2) this patch introduces and uses a CoMutex lock there
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> 
> I missed, did you (where?) add a comment like "all APIs are 
> thread-safe", or what is thread-safe?

You're right, I can't find that comment either. I will add it once more.
> 
>> ---
>>   block/block-copy.c | 82 ++++++++++++++++++++++++++++++----------------
>>   1 file changed, 54 insertions(+), 28 deletions(-)
>>
>> diff --git a/block/block-copy.c b/block/block-copy.c
>> index e2adb5b2ea..56f62913e4 100644
>> --- a/block/block-copy.c
>> +++ b/block/block-copy.c
>> @@ -61,6 +61,7 @@ typedef struct BlockCopyCallState {
>>       /* OUT parameters */
>>       bool cancelled;
>> +    /* Fields protected by lock in BlockCopyState */
>>       bool error_is_read;
>>       int ret;
>>   } BlockCopyCallState;
>> @@ -78,7 +79,7 @@ typedef struct BlockCopyTask {
>>       int64_t bytes; /* only re-set in task_shrink, before running the 
>> task */
>>       BlockCopyMethod method; /* initialized in 
>> block_copy_dirty_clusters() */
>> -    /* State */
>> +    /* State. Protected by lock in BlockCopyState */
>>       CoQueue wait_queue; /* coroutines blocked on this task */
>>       /* To reference all call states from BlockCopyState */
>> @@ -99,7 +100,8 @@ typedef struct BlockCopyState {
>>       BdrvChild *source;
>>       BdrvChild *target;
>> -    /* State */
>> +    /* State. Protected by lock */
>> +    CoMutex lock;
>>       int64_t in_flight_bytes;
>>       BlockCopyMethod method;
>>       QLIST_HEAD(, BlockCopyTask) tasks; /* All tasks from all 
>> block-copy calls */
>> @@ -139,8 +141,10 @@ typedef struct BlockCopyState {
>>       bool skip_unallocated;
>>   } BlockCopyState;
> 
> May be nitpicking, but if we want block_copy_set_progress_meter to be 
> threadsafe it should set s->progress under mutex. Or we should document 
> that it's not threadsafe and called once.

Definitely document it. It is only called by the job before block-copy
starts, so it is safe as it is.

> 
> 
>> -static BlockCopyTask *find_conflicting_task(BlockCopyState *s,
>> -                                            int64_t offset, int64_t 
>> bytes)
>> +/* Called with lock held */
>> +static BlockCopyTask *find_conflicting_task_locked(BlockCopyState *s,
>> +                                                   int64_t offset,
>> +                                                   int64_t bytes)
>>   {
>>       BlockCopyTask *t;
>> @@ -160,18 +164,22 @@ static BlockCopyTask 
>> *find_conflicting_task(BlockCopyState *s,
>>   static bool coroutine_fn block_copy_wait_one(BlockCopyState *s, 
>> int64_t offset,
>>                                                int64_t bytes)
>>   {
>> -    BlockCopyTask *task = find_conflicting_task(s, offset, bytes);
>> +    BlockCopyTask *task;
>> +
>> +    QEMU_LOCK_GUARD(&s->lock);
>> +    task = find_conflicting_task_locked(s, offset, bytes);
>>       if (!task) {
>>           return false;
>>       }
>> -    qemu_co_queue_wait(&task->wait_queue, NULL);
>> +    qemu_co_queue_wait(&task->wait_queue, &s->lock);
>>       return true;
>>   }
>> -static int64_t block_copy_chunk_size(BlockCopyState *s)
>> +/* Called with lock held */
>> +static int64_t block_copy_chunk_size_locked(BlockCopyState *s)
>>   {
>>       switch (s->method) {
>>       case COPY_READ_WRITE_CLUSTER:
>> @@ -193,14 +201,16 @@ static int64_t 
>> block_copy_chunk_size(BlockCopyState *s)
>>    * Search for the first dirty area in offset/bytes range and create 
>> task at
>>    * the beginning of it.
>>    */
>> -static BlockCopyTask *block_copy_task_create(BlockCopyState *s,
>> -                                             BlockCopyCallState 
>> *call_state,
>> -                                             int64_t offset, int64_t 
>> bytes)
>> +static coroutine_fn BlockCopyTask 
>> *block_copy_task_create(BlockCopyState *s,
>> +                                                BlockCopyCallState 
>> *call_state,
>> +                                                int64_t offset, 
>> int64_t bytes)
>>   {
>>       BlockCopyTask *task;
>> -    int64_t max_chunk = block_copy_chunk_size(s);
>> +    int64_t max_chunk;
>> -    max_chunk = MIN_NON_ZERO(max_chunk, call_state->max_chunk);
>> +    QEMU_LOCK_GUARD(&s->lock);
>> +    max_chunk = MIN_NON_ZERO(block_copy_chunk_size_locked(s),
>> +                             call_state->max_chunk);
>>       if (!bdrv_dirty_bitmap_next_dirty_area(s->copy_bitmap,
>>                                              offset, offset + bytes,
>>                                              max_chunk, &offset, &bytes))
>> @@ -212,7 +222,7 @@ static BlockCopyTask 
>> *block_copy_task_create(BlockCopyState *s,
>>       bytes = QEMU_ALIGN_UP(bytes, s->cluster_size);
>>       /* region is dirty, so no existent tasks possible in it */
>> -    assert(!find_conflicting_task(s, offset, bytes));
>> +    assert(!find_conflicting_task_locked(s, offset, bytes));
>>       bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
>>       s->in_flight_bytes += bytes;
>> @@ -248,16 +258,19 @@ static void coroutine_fn 
>> block_copy_task_shrink(BlockCopyTask *task,
> 
> The function reads task->bytes not under mutex.. It's safe, as only that 
> function is modifying the field, and it's called once. Still, let's make 
> critical section a little bit wider, just for simplicity. I mean, simple 
> QEMU_LOCK_GUARD() at start of function.

Done.
> 
>>       assert(new_bytes > 0 && new_bytes < task->bytes);
>> -    task->s->in_flight_bytes -= task->bytes - new_bytes;
>> -    bdrv_set_dirty_bitmap(task->s->copy_bitmap,
>> -                          task->offset + new_bytes, task->bytes - 
>> new_bytes);
>> -
>> -    task->bytes = new_bytes;
>> -    qemu_co_queue_restart_all(&task->wait_queue);
>> +    WITH_QEMU_LOCK_GUARD(&task->s->lock) {
>> +        task->s->in_flight_bytes -= task->bytes - new_bytes;
>> +        bdrv_set_dirty_bitmap(task->s->copy_bitmap,
>> +                              task->offset + new_bytes,
>> +                              task->bytes - new_bytes);
>> +        task->bytes = new_bytes;
>> +        qemu_co_queue_restart_all(&task->wait_queue);
>> +    }
>>   }
>>   static void coroutine_fn block_copy_task_end(BlockCopyTask *task, 
>> int ret)
>>   {
>> +    QEMU_LOCK_GUARD(&task->s->lock);
>>       task->s->in_flight_bytes -= task->bytes;
>>       if (ret < 0) {
>>           bdrv_set_dirty_bitmap(task->s->copy_bitmap, task->offset, 
>> task->bytes);
>> @@ -335,6 +348,7 @@ BlockCopyState *block_copy_state_new(BdrvChild 
>> *source, BdrvChild *target,
>>       }
>>       ratelimit_init(&s->rate_limit);
>> +    qemu_co_mutex_init(&s->lock);
>>       QLIST_INIT(&s->tasks);
>>       QLIST_INIT(&s->calls);
>> @@ -390,6 +404,8 @@ static coroutine_fn int 
>> block_copy_task_run(AioTaskPool *pool,
> 
> Oops, it seems block_copy_task_run misses a block_copy_task_end() call
> before freeing the task. A preexisting bug.

Nope, it is not a bug: .func() internally calls block_copy_task_end().
All good here.
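
For reference, the runner looks like this (abridged from the current code,
quoting from memory): the non-pool path invokes the callback directly and
only frees the task, while block_copy_task_entry(), i.e. the .func, ends it:

static coroutine_fn int block_copy_task_run(AioTaskPool *pool,
                                            BlockCopyTask *task)
{
    if (!pool) {
        int ret = task->task.func(&task->task); /* block_copy_task_entry() */

        g_free(task);
        return ret;
    }

    aio_task_pool_wait_slot(pool);
    if (aio_task_pool_status(pool) < 0) {
        /* the pool is failing; end the task ourselves */
        co_put_to_shres(task->s->mem, task->bytes);
        block_copy_task_end(task, -ECANCELED);
        g_free(task);
        return -ECANCELED;
    }

    aio_task_pool_start_task(pool, &task->task);
    return 0;
}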

> 
>>    * a full-size buffer or disabled if the copy_range attempt fails.  The output
>>    * value of @method should be used for subsequent tasks.
>>    * Returns 0 on success.
>> + *
>> + * Called with lock held.
>>    */
>>   static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
>>                                              int64_t offset, int64_t bytes,
>> @@ -476,16 +492,20 @@ static coroutine_fn int block_copy_task_entry(AioTask *task)
>>       int ret;
>>       ret = block_copy_do_copy(s, t->offset, t->bytes, &method, &error_is_read);
>> -    if (s->method == t->method) {
>> -        s->method = method;
>> -    }
>> -    if (ret < 0) {
>> -        if (!t->call_state->ret) {
>> -            t->call_state->ret = ret;
>> -            t->call_state->error_is_read = error_is_read;
>> +
>> +    WITH_QEMU_LOCK_GUARD(&t->s->lock) {
>> +        if (s->method == t->method) {
>> +            s->method = method;
>> +        }
>> +
>> +        if (ret < 0) {
>> +            if (!t->call_state->ret) {
>> +                t->call_state->ret = ret;
>> +                t->call_state->error_is_read = error_is_read;
>> +            }
>> +        } else {
>> +            progress_work_done(t->s->progress, t->bytes);
>>           }
>> -    } else {
>> -        progress_work_done(t->s->progress, t->bytes);
>>       }
>>       co_put_to_shres(t->s->mem, t->bytes);
>>       block_copy_task_end(t, ret);
>> @@ -587,10 +607,12 @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
>>       bytes = clusters * s->cluster_size;
>>       if (!ret) {
>> +        qemu_co_mutex_lock(&s->lock);
>>           bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
>>           progress_set_remaining(s->progress,
>>                                  bdrv_get_dirty_count(s->copy_bitmap) +
>>                                  s->in_flight_bytes);
>> +        qemu_co_mutex_unlock(&s->lock);
>>       }
>>       *count = bytes;
>> @@ -729,7 +751,9 @@ static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
>>   {
>>       int ret;
>> +    qemu_co_mutex_lock(&call_state->s->lock);
>>       QLIST_INSERT_HEAD(&call_state->s->calls, call_state, list);
>> +    qemu_co_mutex_unlock(&call_state->s->lock);
>>       do {
>>           ret = block_copy_dirty_clusters(call_state);
>> @@ -756,7 +780,9 @@ static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
>>           call_state->cb(call_state->cb_opaque);
>>       }
>> +    qemu_co_mutex_lock(&call_state->s->lock);
>>       QLIST_REMOVE(call_state, list);
>> +    qemu_co_mutex_unlock(&call_state->s->lock);
>>       return ret;
>>   }
>>
> 
> I looked through the whole file on top of the series, and it seems
> overall OK to me. I still don't really like the additional atomics, but they
> should probably be refactored together with consolidating all the
> status-getters into one block_copy_call_status().. So that's work for
> some future day, I will not do it in parallel :)
> 
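Just to be sure we mean the same thing, a consolidated getter could look
something like this (block_copy_call_status() does not exist yet; the struct
and all the names below are made up for illustration):

typedef struct BlockCopyCallStatus {
    bool finished;
    bool cancelled;
    bool error_is_read;
    int ret;
} BlockCopyCallStatus;

/* Read all OUT fields of a call in one place, instead of one getter each. */
static void block_copy_call_status(BlockCopyCallState *call_state,
                                   BlockCopyCallStatus *status)
{
    /*
     * The load-acquire of 'finished' pairs with the store-release done
     * when the call completes, so the plain reads below see final values.
     */
    status->finished = qatomic_load_acquire(&call_state->finished);
    status->cancelled = qatomic_read(&call_state->cancelled);
    status->error_is_read = call_state->error_is_read;
    status->ret = call_state->ret;
}

That would also confine the atomics to a single place.
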
> I don't insist, but for me patches 2, 4 and 5 only make sense as a whole, so
> I'd merge them into one patch called "make block-copy APIs thread-safe".
> Otherwise, thread-safety arrives only in the last patch, and patches 2 and 4
> are a kind of preparation that is hard to review separately.

I will try to see how they look. I think that, kept separate, they do not
look too bad; putting everything together might make it very difficult to
understand. And this stuff is already complicated on its own...

Thank you,
Emanuele
> 
> Anyway, reviewing such a change means walking through the whole file,
> trying to understand how thread-safe it is now.
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/5] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions
  2021-06-10 14:21             ` Emanuele Giuseppe Esposito
@ 2021-06-10 15:05               ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 19+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-06-10 15:05 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, qemu-block
  Cc: John Snow, Kevin Wolf, Max Reitz, Paolo Bonzini, Stefan Hajnoczi,
	qemu-devel

10.06.2021 17:21, Emanuele Giuseppe Esposito wrote:
> 
> 
> On 10/06/2021 13:12, Vladimir Sementsov-Ogievskiy wrote:
>> 10.06.2021 13:46, Emanuele Giuseppe Esposito wrote:
>>>
>>>
>>> On 10/06/2021 12:27, Vladimir Sementsov-Ogievskiy wrote:
>>>> 10.06.2021 13:14, Emanuele Giuseppe Esposito wrote:
>>>>>
>>>>>
>>>>> On 09/06/2021 11:12, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> 08.06.2021 10:33, Emanuele Giuseppe Esposito wrote:
>>>>>>> As done in BlockCopyCallState, categorize BlockCopyTask
>>>>>>> and BlockCopyState in IN, State and OUT fields.
>>>>>>> This is just to understand which field has to be protected with a lock.
>>>>>>>
>>>>>>> .sleep_state is handled in the series "coroutine: new sleep/wake API"
>>>>>>> and thus here left as TODO.
>>>>>>>
>>>>>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>>>>>> ---
>>>>>>>   block/block-copy.c | 47 ++++++++++++++++++++++++++++++----------------
>>>>>>>   1 file changed, 31 insertions(+), 16 deletions(-)
>>>>>>>
>>>>>>> diff --git a/block/block-copy.c b/block/block-copy.c
>>>>>>> index d58051288b..b3533a3003 100644
>>>>>>> --- a/block/block-copy.c
>>>>>>> +++ b/block/block-copy.c
>>>>>>> @@ -56,25 +56,33 @@ typedef struct BlockCopyCallState {
>>>>>>>       QLIST_ENTRY(BlockCopyCallState) list;
>>>>>>>       /* State */
>>>>>>
>>>>>> Why is the previous @list field not in the State section? It is surely not an IN parameter, and it should be protected somehow.
>>>>>>
>>>>>>> -    int ret;
>>>>>>>       bool finished;
>>>>>>> -    QemuCoSleep sleep;
>>>>>>> -    bool cancelled;
>>>>>>> +    QemuCoSleep sleep; /* TODO: protect API with a lock */
>>>>>>>       /* OUT parameters */
>>>>>>> +    bool cancelled;
>>>>>>>       bool error_is_read;
>>>>>>> +    int ret;
>>>>>>>   } BlockCopyCallState;
>>>>>>>   typedef struct BlockCopyTask {
>>>>>>>       AioTask task;
>>>>>>> +    /*
>>>>>>> +     * IN parameters. Initialized in block_copy_task_create()
>>>>>>> +     * and never changed.
>>>>>>> +     */
>>>>>>>       BlockCopyState *s;
>>>>>>>       BlockCopyCallState *call_state;
>>>>>>>       int64_t offset;
>>>>>>> -    int64_t bytes;
>>>>>>> -    BlockCopyMethod method;
>>>>>>> -    QLIST_ENTRY(BlockCopyTask) list;
>>>>>>> +    int64_t bytes; /* only re-set in task_shrink, before running the task */
>>>>>>> +    BlockCopyMethod method; /* initialized in block_copy_dirty_clusters() */
>>>>>>
>>>>>> Hmm, to be precise, method is initialized in block_copy_task_create.
>>>>>>
>>>>>> And after block_copy_task_create() finishes, the task is in the list and can be read by a parallel block_copy_dirty_clusters(). So @bytes is part of the State, and we must protect it..
>>>>>
>>>>> So if I understand correctly, you refer to the fact that a parallel block_copy_dirty_clusters() can create another task and search with find_conflicting_task_locked(), or, in general, that block_copy_wait_one() can also do the same in parallel, correct?
>>>>
>>>> yes
>>>>
>>>>>
>>>>> Here there is also another problem: if we add the task to the list and then shrink it in two different critical sections, we are going to have problems, because in the meantime find_conflicting_tasks can be issued in parallel.
>>>>
>>>> But we shrink the task only once, and we do it under the mutex, so we are OK, I think?
>>>
>>> I think you understood, but just in case: I am thinking of the case where we have:
> 
>>>
>>> But maybe I am overcomplicating.
>>>
>>
>> Both shrink and find_ are done under the mutex, so they can't intersect. But yes, we should keep in mind that if we do find_ under the mutex and then release the mutex, the information obtained from find_ may become incorrect.
>>
>> Check callers of find_conflicting_task_locked():
>>
>> block_copy_wait_one has one critical section.
>>
>> If there are no conflicting tasks, we are OK.. Are we? OK, look at the only caller of block_copy_wait_one(): block_copy_common().
>>
>> Assume block_copy_dirty_clusters() returns 0, so there are no dirty bits at some moment...
>>
>> Then, in a parallel thread, some task may finish with a failure, leaving some new dirty bits.. Then we check that there are no conflicting tasks.. And then we go out of the loop, when actually we must retry for these new dirty bits.
>>
>> So I'm afraid you are right: we are not thread-safe yet in block_copy_common(), as we should check conflicting tasks and dirty bits in the same critical section to be consistent.
> 
> Wait, we are talking about two different problems:
> 
> - What I wanted to point out has to do with @bytes, not (as far as I understand) with the dirty bits. From the example below, I assume there are 3 separate, non-overlapping critical sections:
> 
>>>> T1: block_copy_task_create()
>>>> T2: find_conflicting_tasks() <-- sees the initial task
>>>> T1: task_shrink() <-- bytes are updated, T2 saw the wrong amount of bytes. This might or might not have consequences, I am not sure.
> 
> T1 creates the task, T2 iterates to search for conflicting tasks (called from a parallel block_copy_wait_one), T1 shrinks the current task. I think that in this case T2 misses the updated task, even though the worst that can happen is that the task is smaller, so a false positive (a task that is not conflicting might be marked as conflicting).
> The outcome is that T2 is waiting for a task it shouldn't, but there is no error there.

That is handled in shrink: we wake up all waiting tasks in shrink().

> 
> - Your point is about a task failing between block_copy_dirty_clusters and block_copy_wait_one. The failing task calls block_copy_task_end and sets the dirty bitmap, but at that point block_copy_wait_one won't check it anymore and the bitmap is left dirty. I think the default behavior here should be that block_copy_dirty_clusters() is called again and a new task is created. This, as you pointed out, is a proper error.
> 
> In this case, we need to stop iterating only when 1) the whole bitmap is clear, and 2) no conflicting task is present.
> 
> Therefore a possible solution can be the one below:
> 
> int stop_looping = 0;
> 
> ...
> 
> do {
> 
>      // create all the tasks: this clears the bitmap but
>      // adds the tasks to the task list
>      block_copy_dirty_clusters();
> 
>      /* here a task can fail, but then the dirty map will be set */
> 
>      lock();
>      // make sure no task is running for this operation
>      stop_looping = (find_conflicting_task() == NULL);
>      // make sure that the dirty bitmap is clear
>      stop_looping |= (!bdrv_dirty_bitmap_next_dirty_area()) << 1;

bdrv_dirty_bitmap_next_dirty would be more efficient for checking whether there are any dirty bits in the range.

>      unlock();
> 
>      /* if stop_looping is == 0, no task can fail */
> 
>      /* if a task fails here, the if below won't see it, but it will be
>       * caught by block_copy_dirty_clusters in the next iteration */
> 
>      if (!(stop_looping & 1)) {
>      // there is some conflicting task, wait for it
>          qemu_co_queue_wait(&task->wait_queue);

We should do it under the lock, like it is done in block_copy_wait_one() (and qemu_co_queue_wait() releases the lock by itself).

>      }
> 
> } while (stop_looping != 3); /* loop until both conditions hold */
> 
> ...
> 
> What do you think?

Yes, something like this. But please, let's use boolean variables instead of bit arithmetic )

The reason why it is currently written another way is that I wanted to save an extra call to bdrv_dirty_bitmap_next_dirty_area(). But now we should search for conflicting tasks and for dirty bits in the same critical section, otherwise it doesn't make sense.
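
Something like this is what I mean (not compiled; assuming the names from
this series, and that offset and bytes describe the call's range):

BlockCopyTask *task;
bool dirty;

do {
    block_copy_dirty_clusters(call_state);

    qemu_co_mutex_lock(&s->lock);
    /* both checks happen in one critical section */
    task = find_conflicting_task_locked(s, offset, bytes);
    dirty = bdrv_dirty_bitmap_next_dirty(s->copy_bitmap, offset, bytes) >= 0;
    if (task) {
        /* qemu_co_queue_wait() drops the lock and takes it back on wakeup */
        qemu_co_queue_wait(&task->wait_queue, &s->lock);
    }
    qemu_co_mutex_unlock(&s->lock);
} while (task || dirty);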


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-06-10 15:06 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-08  7:33 [PATCH v3 0/5] block-copy: protect block-copy internal structures Emanuele Giuseppe Esposito
2021-06-08  7:33 ` [PATCH v3 1/5] block-copy: streamline choice of copy_range vs. read/write Emanuele Giuseppe Esposito
2021-06-09  8:51   ` Vladimir Sementsov-Ogievskiy
2021-06-09  9:33     ` Paolo Bonzini
2021-06-09 10:09       ` Vladimir Sementsov-Ogievskiy
2021-06-09 10:54       ` Vladimir Sementsov-Ogievskiy
2021-06-08  7:33 ` [PATCH v3 2/5] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions Emanuele Giuseppe Esposito
2021-06-09  9:12   ` Vladimir Sementsov-Ogievskiy
2021-06-10 10:14     ` Emanuele Giuseppe Esposito
2021-06-10 10:27       ` Vladimir Sementsov-Ogievskiy
2021-06-10 10:46         ` Emanuele Giuseppe Esposito
2021-06-10 11:12           ` Vladimir Sementsov-Ogievskiy
2021-06-10 14:21             ` Emanuele Giuseppe Esposito
2021-06-10 15:05               ` Vladimir Sementsov-Ogievskiy
2021-06-08  7:33 ` [PATCH v3 3/5] block-copy: move progress_set_remaining in block_copy_task_end Emanuele Giuseppe Esposito
2021-06-08  7:33 ` [PATCH v3 4/5] block-copy: add a CoMutex Emanuele Giuseppe Esposito
2021-06-09 12:25   ` Vladimir Sementsov-Ogievskiy
2021-06-10 14:49     ` Emanuele Giuseppe Esposito
2021-06-08  7:33 ` [PATCH v3 5/5] block-copy: atomic .cancelled and .finished fields in BlockCopyCallState Emanuele Giuseppe Esposito
