On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
> Drop write notifiers and use filter node instead. Changes:
>
> 1. copy-before-writes now handled by filter node, so, drop all
>    is_write_notifier arguments.
>
> 2. we don't have intersecting requests, so their handling is dropped.
>    Instead, synchronization works as follows:
>    when backup or backup-top starts copying of some area it firstly
>    clears copy-bitmap bits, and nobody touches areas, not marked with
>    dirty bits in copy-bitmap, so there is no intersection. Also, backup
>    job copy operations are surrounded by bdrv region lock, which is
>    actually serializing request, to not interfer with guest writes and
>    not read changed data from source (before reading we clear
>    corresponding bit in copy-bitmap, so, this area is not more handled
>    by backup-top).
>
> 3. To sync with in-flight requests we now just drain hook node, we
>    don't need rw-lock.
>
> 4. After the whole backup loop (top, full, incremental modes), we need
>    to check for not copied clusters, which were under backup-top
>    operation and we skipped them, but backup-top operation failed,
>    error returned to the guest and dirty bits set back.
>
> 5. Don't create additional blk, use backup-top children for copy
>    operations.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy
> ---
>  block/backup.c | 285 ++++++++++++++++++++++++++-----------------------
>  1 file changed, 149 insertions(+), 136 deletions(-)
>
> diff --git a/block/backup.c b/block/backup.c
> index 88c0242b4e..e332909fb7 100644
> --- a/block/backup.c
> +++ b/block/backup.c

[...]

> @@ -300,21 +231,23 @@ static void backup_abort(Job *job)
>  static void backup_clean(Job *job)
>  {
>      BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
> -    assert(s->target);
> -    blk_unref(s->target);
> +
> +    /* We must clean it to not crash in backup_drain. */
>      s->target = NULL;

Why not set s->source to NULL along with it? It makes sense if you're
going to drop the backup-top node because both of these are its
children.

>
>      if (s->copy_bitmap) {
>          hbitmap_free(s->copy_bitmap);
>          s->copy_bitmap = NULL;
>      }
> +
> +    bdrv_backup_top_drop(s->backup_top);
>  }

[...]

> @@ -386,21 +319,45 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
>      bool error_is_read;
>      int64_t offset;
>      HBitmapIter hbi;
> +    void *lock = NULL;
>
>      hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
> -    while ((offset = hbitmap_iter_next(&hbi)) != -1) {
> +    while (hbitmap_count(job->copy_bitmap)) {
> +        offset = hbitmap_iter_next(&hbi);
> +        if (offset == -1) {
> +            /*
> +             * we may have skipped some clusters, which were handled by
> +             * backup-top, but failed and finished by returning error to
> +             * the guest and set dirty bit back.
> +             */
> +            hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
> +            offset = hbitmap_iter_next(&hbi);
> +            assert(offset);

I think you want to assert "offset >= 0".

> +        }
> +
> +        lock = bdrv_co_try_lock(job->source, offset, job->cluster_size);
> +        /*
> +         * Dirty bit is set, which means that there are no in-flight
> +         * write requests on this area. We must succeed.
> +         */
> +        assert(lock);

I'm not sure that is true right now, but more on that below in
backup_run().
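(Expanding on the two assertions above, since my comments were terse:
hbitmap_iter_next() returns -1 once the iterator is exhausted, so
assert(offset) would even pass in exactly the case it is meant to catch
(offset == -1 is non-zero), and would fire for the perfectly valid
offset 0. What I mean is a sketch like:

    offset = hbitmap_iter_next(&hbi);
    /*
     * The while condition guarantees the bitmap is non-empty, so the
     * freshly initialized iterator must yield a valid offset.
     */
    assert(offset >= 0);

And assert(lock) encodes the assumption that a set copy-bitmap bit
implies no in-flight writes to that area, which only holds if every
writer to the source goes through backup-top; more on that below.)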
> +
>          do {
>              if (yield_and_check(job)) {
> +                bdrv_co_unlock(lock);
>                  return 0;
>              }
> -            ret = backup_do_cow(job, offset,
> -                                job->cluster_size, &error_is_read, false);
> +            ret = backup_do_cow(job, offset, job->cluster_size, &error_is_read);
>              if (ret < 0 && backup_error_action(job, error_is_read, -ret) ==
>                             BLOCK_ERROR_ACTION_REPORT)
>              {
> +                bdrv_co_unlock(lock);
>                  return ret;
>              }
>          } while (ret < 0);
> +
> +        bdrv_co_unlock(lock);
> +        lock = NULL;

This statement seems unnecessary here.

>      }
>
>      return 0;

[...]

> @@ -447,26 +402,39 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>          hbitmap_set(s->copy_bitmap, 0, s->len);
>      }
>
> -    s->before_write.notify = backup_before_write_notify;
> -    bdrv_add_before_write_notifier(bs, &s->before_write);
> -
>      if (s->sync_mode == MIRROR_SYNC_MODE_NONE) {
>          /* All bits are set in copy_bitmap to allow any cluster to be copied.
>           * This does not actually require them to be copied. */
>          while (!job_is_cancelled(job)) {
> -            /* Yield until the job is cancelled. We just let our before_write
> -             * notify callback service CoW requests. */
> +            /*
> +             * Yield until the job is cancelled. We just let our backup-top
> +             * fileter driver service CbW requests.

*filter

> +             */
>              job_yield(job);
>          }
>      } else if (s->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
>          ret = backup_run_incremental(s);
>      } else {

[...]

> @@ -505,8 +474,20 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>              if (alloced < 0) {
>                  ret = alloced;
>              } else {
> +                if (!hbitmap_get(s->copy_bitmap, offset)) {
> +                    trace_backup_do_cow_skip(job, offset);
> +                    continue; /* already copied */
> +                }
> +                if (!lock) {
> +                    lock = bdrv_co_try_lock(s->source, offset, s->cluster_size);
> +                    /*
> +                     * Dirty bit is set, which means that there are no in-flight
> +                     * write requests on this area. We must succeed.
> +                     */
> +                    assert(lock);

What if I have a different parent node for the source that issues
concurrent writes? This used to work fine because the before_write
notifier would still work. After this patch, that would be broken
because those writes would not cause a CbW.

That's not so bad because we just have to make sure that all writes go
through the backup-top node. That would make this assertion valid
again, too. But that means we cannot share PERM_WRITE; see [1].

> +                }
>                  ret = backup_do_cow(s, offset, s->cluster_size,
> -                                    &error_is_read, false);
> +                                    &error_is_read);
>              }
>              if (ret < 0) {
>                  /* Depending on error action, fail now or retry cluster */
> @@ -516,17 +497,34 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>                  break;
>              } else {
>                  offset -= s->cluster_size;
> +                retry = true;
>                  continue;
>              }
>          }
>      }
> +    if (lock) {
> +        bdrv_co_unlock(lock);
> +        lock = NULL;
> +    }
> +    if (ret == 0 && !job_is_cancelled(job) &&
> +        hbitmap_count(s->copy_bitmap))
> +    {
> +        /*
> +         * we may have skipped some clusters, which were handled by
> +         * backup-top, but failed and finished by returning error to
> +         * the guest and set dirty bit back.

So it's a matter of a race?

> +         */
> +        goto iteration;
> +    }

Why not wrap everything in a do {} while (ret == 0 && !job_is...)
instead? Because it would mean reindenting everything?
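To illustrate, the structure I have in mind is roughly the following
(untested sketch, ignoring the reindentation of the loop body it would
require):

    do {
        for (offset = 0; offset < s->len; offset += s->cluster_size) {
            /* ... existing copy loop body, unchanged ... */
        }
    } while (ret == 0 && !job_is_cancelled(job) &&
             hbitmap_count(s->copy_bitmap));

That expresses "repeat until no cluster is dirty" without a backwards
goto.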
>      }
>
> -    notifier_with_return_remove(&s->before_write);
> +    /* wait pending CBW operations in backup-top */
> +    bdrv_drain(s->backup_top);
>
> -    /* wait until pending backup_do_cow() calls have completed */
> -    qemu_co_rwlock_wrlock(&s->flush_rwlock);
> -    qemu_co_rwlock_unlock(&s->flush_rwlock);
> +    backup_top_progress = bdrv_backup_top_progress(s->backup_top);
> +    job_progress_update(job, ret + backup_top_progress -

Why the "ret"?

> +                        s->backup_top_progress);
> +    s->backup_top_progress = backup_top_progress;

So the backup-top progress is ignored during basically all of the block
job until it suddenly jumps to 100 % completion? That doesn't sound
ideal. Or did you mean to put this into the for () loop of
MODE_TOP/MODE_FULL? (And the while() loop of MODE_NONE)

>
>      return ret;
>  }
> @@ -563,6 +561,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>      int ret;
>      int64_t cluster_size;
>      HBitmap *copy_bitmap = NULL;
> +    BlockDriverState *backup_top = NULL;
> +    uint64_t all_except_resize = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
> +                                 BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD;

BLK_PERM_ALL & ~BLK_PERM_RESIZE?

[1] But we probably do not want to share either PERM_WRITE or
PERM_WRITE_UNCHANGED because during the duration of the backup,
everything should go through the backup-top filter (not sure about
PERM_WRITE_UNCHANGED right now). Or is that something that the filter
node should enforce in backup_top_child_perm()?

>
>      assert(bs);
>      assert(target);
> @@ -655,25 +656,31 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>
>      copy_bitmap = hbitmap_alloc(len, ctz32(cluster_size));
>
> -    /* job->len is fixed, so we can't allow resize */
> -    job = block_job_create(job_id, &backup_job_driver, txn, bs,
> -                           BLK_PERM_CONSISTENT_READ,
> -                           BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
> -                           BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD,
> -                           speed, creation_flags, cb, opaque, errp);
> -    if (!job) {
> +    /*
> +     * bdrv_get_device_name will not help to find device name starting from
> +     * @bs after backup-top append, so let's calculate job_id before. Do
> +     * it in the same way like block_job_create
> +     */

Why not? Since backup-top is appended, shouldn't all parents of @bs be
parents of @backup_top then? (Making bdrv_get_parent_name() return the
same result)

> +    if (job_id == NULL && !(creation_flags & JOB_INTERNAL)) {
> +        job_id = bdrv_get_device_name(bs);
> +    }
> +
> +    backup_top = bdrv_backup_top_append(bs, target, copy_bitmap, errp);
> +    if (!backup_top) {
>          goto error;
>      }
>
> -    /* The target must match the source in size, so no resize here either */
> -    job->target = blk_new(BLK_PERM_WRITE,
> -                          BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
> -                          BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD);
> -    ret = blk_insert_bs(job->target, target, errp);
> -    if (ret < 0) {
> +    /* job->len is fixed, so we can't allow resize */
> +    job = block_job_create(job_id, &backup_job_driver, txn, bs, 0,

Is there a reason you dropped PERM_CONSISTENT_READ here?

> +                           all_except_resize, speed, creation_flags,
> +                           cb, opaque, errp);
> +    if (!job) {
>          goto error;
>      }
>
> +    job->source = backup_top->backing;
> +    job->target = ((BDRVBackupTopState *)backup_top->opaque)->target;

This looks really ugly. I think as long as the block job performs
writes itself, it should use its own BlockBackend.

Alternatively, I think it would make sense for the backup-top filter to
offer functions that this job can use to issue writes to the target.
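Something like the following is what I have in mind (the name and
signature are entirely made up, just to illustrate the idea):

    /*
     * Hypothetical backup-top helper: write to the target child on
     * behalf of the backup job, so the job never has to touch the
     * BdrvChild directly.
     */
    int coroutine_fn backup_top_co_write_target(BlockDriverState *backup_top,
                                                uint64_t offset,
                                                uint64_t bytes,
                                                void *buf);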
Then the backup job would no longer need direct access to the target as
a BdrvChild.

> +
>      job->on_source_error = on_source_error;
>      job->on_target_error = on_target_error;
>      job->sync_mode = sync_mode;

[...]

> @@ -712,6 +722,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>          backup_clean(&job->common.job);
>          job_early_fail(&job->common.job);
>      }
> +    if (backup_top) {
> +        bdrv_backup_top_drop(backup_top);
> +    }

While it shouldn't be a problem in practice, backup_top has a reference
to copy_bitmap, so that bitmap should be freed after backup_top is
dropped.

Max

>
>      return NULL;
>  }
>
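P.S.: To be explicit about the ordering I mean in the error path (just
a sketch; this assumes the hbitmap_free(copy_bitmap) call sits in the
same error label):

    if (backup_top) {
        bdrv_backup_top_drop(backup_top);
    }
    /* Free the bitmap only once backup_top no longer references it */
    if (copy_bitmap) {
        hbitmap_free(copy_bitmap);
    }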