All of lore.kernel.org
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-block@nongnu.org, qemu-devel@nongnu.org
Cc: fam@euphon.net, stefanha@redhat.com, jcody@redhat.com,
	kwolf@redhat.com, den@openvz.org, eblake@redhat.com,
	jsnow@redhat.com
Subject: Re: [Qemu-devel] [PATCH v5 11/11] block/backup: use backup-top instead of write notifiers
Date: Fri, 18 Jan 2019 15:56:45 +0100	[thread overview]
Message-ID: <1ea09d99-04ff-711e-9137-cdae33ca8681@redhat.com> (raw)
In-Reply-To: <20181229122027.42245-12-vsementsov@virtuozzo.com>

[-- Attachment #1: Type: text/plain, Size: 12784 bytes --]

On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
> Drop write notifiers and use filter node instead. Changes:
> 
> 1. copy-before-writes now handled by filter node, so, drop all
>    is_write_notifier arguments.
> 
> 2. we don't have intersecting requests, so their handling is dropped.
> Instead, synchronization works as follows:
> when backup or backup-top starts copying of some area it firstly
> clears copy-bitmap bits, and nobody touches areas, not marked with
> dirty bits in copy-bitmap, so there is no intersection. Also, backup
> job copy operations are surrounded by bdrv region lock, which is
> actually serializing request, to not interfer with guest writes and
> not read changed data from source (before reading we clear
> corresponding bit in copy-bitmap, so, this area is not more handled by
> backup-top).
> 
> 3. To sync with in-flight requests we now just drain hook node, we
> don't need rw-lock.
> 
> 4. After the whole backup loop (top, full, incremental modes), we need
> to check for not copied clusters, which were under backup-top operation
> and we skipped them, but backup-top operation failed, error returned to
> the guest and dirty bits set back.
> 
> 5. Don't create additional blk, use backup-top children for copy
> operations.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/backup.c | 285 ++++++++++++++++++++++++++-----------------------
>  1 file changed, 149 insertions(+), 136 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index 88c0242b4e..e332909fb7 100644
> --- a/block/backup.c
> +++ b/block/backup.c

[...]

> @@ -300,21 +231,23 @@ static void backup_abort(Job *job)
>  static void backup_clean(Job *job)
>  {
>      BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
> -    assert(s->target);
> -    blk_unref(s->target);
> +
> +    /* We must clean it to not crash in backup_drain. */
>      s->target = NULL;

Why not set s->source to NULL along with it?  It makes sense if you're
going to drop the backup-top node because both of these are its children.

>  
>      if (s->copy_bitmap) {
>          hbitmap_free(s->copy_bitmap);
>          s->copy_bitmap = NULL;
>      }
> +
> +    bdrv_backup_top_drop(s->backup_top);
>  }

[...]

> @@ -386,21 +319,45 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
>      bool error_is_read;
>      int64_t offset;
>      HBitmapIter hbi;
> +    void *lock = NULL;
>  
>      hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
> -    while ((offset = hbitmap_iter_next(&hbi)) != -1) {
> +    while (hbitmap_count(job->copy_bitmap)) {
> +        offset = hbitmap_iter_next(&hbi);
> +        if (offset == -1) {
> +            /*
> +             * we may have skipped some clusters, which were handled by
> +             * backup-top, but failed and finished by returning error to
> +             * the guest and set dirty bit back.
> +             */
> +            hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
> +            offset = hbitmap_iter_next(&hbi);
> +            assert(offset);

I think you want to assert "offset >= 0".

> +        }
> +
> +        lock = bdrv_co_try_lock(job->source, offset, job->cluster_size);
> +        /*
> +         * Dirty bit is set, which means that there are no in-flight
> +         * write requests on this area. We must succeed.
> +         */
> +        assert(lock);

I'm not sure that is true right now, but more on that below in backup_run().

> +
>          do {
>              if (yield_and_check(job)) {
> +                bdrv_co_unlock(lock);
>                  return 0;
>              }
> -            ret = backup_do_cow(job, offset,
> -                                job->cluster_size, &error_is_read, false);
> +            ret = backup_do_cow(job, offset, job->cluster_size, &error_is_read);
>              if (ret < 0 && backup_error_action(job, error_is_read, -ret) ==
>                             BLOCK_ERROR_ACTION_REPORT)
>              {
> +                bdrv_co_unlock(lock);
>                  return ret;
>              }
>          } while (ret < 0);
> +
> +        bdrv_co_unlock(lock);
> +        lock = NULL;

This statement seems unnecessary here.

>      }
>  
>      return 0;

[...]

> @@ -447,26 +402,39 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>          hbitmap_set(s->copy_bitmap, 0, s->len);
>      }
>  
> -    s->before_write.notify = backup_before_write_notify;
> -    bdrv_add_before_write_notifier(bs, &s->before_write);
> -
>      if (s->sync_mode == MIRROR_SYNC_MODE_NONE) {
>          /* All bits are set in copy_bitmap to allow any cluster to be copied.
>           * This does not actually require them to be copied. */
>          while (!job_is_cancelled(job)) {
> -            /* Yield until the job is cancelled.  We just let our before_write
> -             * notify callback service CoW requests. */
> +            /*
> +             * Yield until the job is cancelled.  We just let our backup-top
> +             * fileter driver service CbW requests.

*filter

> +             */
>              job_yield(job);
>          }
>      } else if (s->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
>          ret = backup_run_incremental(s);
>      } else {

[...]

> @@ -505,8 +474,20 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>              if (alloced < 0) {
>                  ret = alloced;
>              } else {
> +                if (!hbitmap_get(s->copy_bitmap, offset)) {
> +                    trace_backup_do_cow_skip(job, offset);
> +                    continue; /* already copied */
> +                }
> +                if (!lock) {
> +                    lock = bdrv_co_try_lock(s->source, offset, s->cluster_size);
> +                    /*
> +                     * Dirty bit is set, which means that there are no in-flight
> +                     * write requests on this area. We must succeed.
> +                     */
> +                    assert(lock);

What if I have a different parent node for the source that issues
concurrent writes?  This used to work fine because the before_write
notifier would still work.  After this patch, that would be broken
because those writes would not cause a CbW.

That's not so bad because we just have to make sure that all writes go
through the backup-top node.  That would make this assertion valid
again, too.  But that means we cannot share PERM_WRITE; see [1].

> +                }
>                  ret = backup_do_cow(s, offset, s->cluster_size,
> -                                    &error_is_read, false);
> +                                    &error_is_read);
>              }
>              if (ret < 0) {
>                  /* Depending on error action, fail now or retry cluster */
> @@ -516,17 +497,34 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>                      break;
>                  } else {
>                      offset -= s->cluster_size;
> +                    retry = true;
>                      continue;
>                  }
>              }
>          }
> +        if (lock) {
> +            bdrv_co_unlock(lock);
> +            lock = NULL;
> +        }
> +        if (ret == 0 && !job_is_cancelled(job) &&
> +            hbitmap_count(s->copy_bitmap))
> +        {
> +            /*
> +             * we may have skipped some clusters, which were handled by
> +             * backup-top, but failed and finished by returning error to
> +             * the guest and set dirty bit back.

So it's a matter of a race?

> +             */
> +            goto iteration;
> +        }

Why not wrap everything in a do {} while (ret == 0 && !job_is...)
instead?  Because it would mean reindenting everything?

>      }
>  
> -    notifier_with_return_remove(&s->before_write);
> +    /* wait pending CBW operations in backup-top */
> +    bdrv_drain(s->backup_top);
>  
> -    /* wait until pending backup_do_cow() calls have completed */
> -    qemu_co_rwlock_wrlock(&s->flush_rwlock);
> -    qemu_co_rwlock_unlock(&s->flush_rwlock);
> +    backup_top_progress = bdrv_backup_top_progress(s->backup_top);
> +    job_progress_update(job, ret + backup_top_progress -

Why the "ret"?

> +                        s->backup_top_progress);
> +    s->backup_top_progress = backup_top_progress;

So the backup-top progress is ignored during basically all of the block
job until it suddenly jumps to 100 % completion?  That doesn't sound ideal.

Or did you mean to put this into the for () loop of MODE_TOP/MODE_FULL?
 (And the while() loop of MODE_NONE)

>  
>      return ret;
>  }
> @@ -563,6 +561,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>      int ret;
>      int64_t cluster_size;
>      HBitmap *copy_bitmap = NULL;
> +    BlockDriverState *backup_top = NULL;
> +    uint64_t all_except_resize = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
> +                                 BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD;

BLK_PERM_ALL & ~BLK_PERM_RESIZE?

[1] But we probably do not want to share either PERM_WRITE or
PERM_WRITE_UNCHANGED because during the duration of the backup,
everything should go through the backup-top filter (not sure about
PERM_WRITE_UNCHANGED right now).  Or is that something that the filter
node should enforce in backup_top_child_perm()?

>  
>      assert(bs);
>      assert(target);
> @@ -655,25 +656,31 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>  
>      copy_bitmap = hbitmap_alloc(len, ctz32(cluster_size));
>  
> -    /* job->len is fixed, so we can't allow resize */
> -    job = block_job_create(job_id, &backup_job_driver, txn, bs,
> -                           BLK_PERM_CONSISTENT_READ,
> -                           BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
> -                           BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD,
> -                           speed, creation_flags, cb, opaque, errp);
> -    if (!job) {
> +    /*
> +     * bdrv_get_device_name will not help to find device name starting from
> +     * @bs after backup-top append,

Why not?  Since backup-top is appended, shouldn't all parents of @bs be
parents of @backup_top then?  (Making bdrv_get_parent_name() return the
same result)

>                                      so let's calculate job_id before. Do
> +     * it in the same way like block_job_create
> +     */
> +    if (job_id == NULL && !(creation_flags & JOB_INTERNAL)) {
> +        job_id = bdrv_get_device_name(bs);
> +    }
> +
> +    backup_top = bdrv_backup_top_append(bs, target, copy_bitmap, errp);
> +    if (!backup_top) {
>          goto error;
>      }
>  
> -    /* The target must match the source in size, so no resize here either */
> -    job->target = blk_new(BLK_PERM_WRITE,
> -                          BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
> -                          BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD);
> -    ret = blk_insert_bs(job->target, target, errp);
> -    if (ret < 0) {
> +    /* job->len is fixed, so we can't allow resize */
> +    job = block_job_create(job_id, &backup_job_driver, txn, bs, 0,

Is there a reason you dropped PERM_CONSISTENT_READ here?

> +                           all_except_resize, speed, creation_flags,
> +                           cb, opaque, errp);
> +    if (!job) {
>          goto error;
>      }
>  
> +    job->source = backup_top->backing;
> +    job->target = ((BDRVBackupTopState *)backup_top->opaque)->target;

This looks really ugly.  I think as long as the block job performs
writes itself, it should use its own BlockBackend.

Alternatively, I think it would make sense for the backup-top filter to
offer functions that this job can use to issue writes to the target.
Then the backup job would no longer need direct access to the target as
a BdrvChild.

> +
>      job->on_source_error = on_source_error;
>      job->on_target_error = on_target_error;
>      job->sync_mode = sync_mode;

[...]

> @@ -712,6 +722,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>          backup_clean(&job->common.job);
>          job_early_fail(&job->common.job);
>      }
> +    if (backup_top) {
> +        bdrv_backup_top_drop(backup_top);
> +    }

While it shouldn't be a problem in practice, backup_top has a reference
to copy_bitmap, so that bitmap should be freed after backup_top is dropped.

Max

>  
>      return NULL;
>  }
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2019-01-18 14:57 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-29 12:20 [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup Vladimir Sementsov-Ogievskiy
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 01/11] block/backup: simplify backup_incremental_init_copy_bitmap Vladimir Sementsov-Ogievskiy
2019-01-14 13:10   ` Max Reitz
2019-01-14 13:13     ` Max Reitz
2019-01-14 14:01     ` Vladimir Sementsov-Ogievskiy
2019-01-14 14:13       ` Max Reitz
2019-01-14 14:48         ` Vladimir Sementsov-Ogievskiy
2019-01-16 13:05           ` Max Reitz
2019-01-23  8:20             ` Vladimir Sementsov-Ogievskiy
2019-01-23 13:19               ` Max Reitz
2019-01-23 14:36               ` Eric Blake
2019-01-24 14:20                 ` Vladimir Sementsov-Ogievskiy
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 02/11] block/backup: move to copy_bitmap with granularity Vladimir Sementsov-Ogievskiy
2019-01-14 14:10   ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 03/11] block: improve should_update_child Vladimir Sementsov-Ogievskiy
2019-01-14 14:32   ` Max Reitz
2019-01-14 16:13     ` Vladimir Sementsov-Ogievskiy
2019-01-16 13:17       ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 04/11] iotests: handle -f argument correctly for qemu_io_silent Vladimir Sementsov-Ogievskiy
2019-01-14 14:36   ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 05/11] iotests: allow resume_drive by node name Vladimir Sementsov-Ogievskiy
2019-01-14 14:46   ` Max Reitz
2019-01-14 16:06     ` Vladimir Sementsov-Ogievskiy
2019-01-16 13:11       ` Max Reitz
2019-01-23 13:22         ` Vladimir Sementsov-Ogievskiy
2019-01-23 13:31           ` Vladimir Sementsov-Ogievskiy
2019-01-23 13:33             ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 06/11] iotests: prepare 055 to graph changes during backup job Vladimir Sementsov-Ogievskiy
2019-01-16 13:48   ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 07/11] block: introduce backup-top filter driver Vladimir Sementsov-Ogievskiy
2019-01-16 16:02   ` Max Reitz
2019-01-17 12:13     ` Vladimir Sementsov-Ogievskiy
2019-01-18 12:05       ` Max Reitz
2019-01-23 13:47         ` Vladimir Sementsov-Ogievskiy
2019-04-13 16:08     ` Vladimir Sementsov-Ogievskiy
2019-04-13 16:08       ` Vladimir Sementsov-Ogievskiy
2019-04-13 17:03       ` Vladimir Sementsov-Ogievskiy
2019-04-13 17:03         ` Vladimir Sementsov-Ogievskiy
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 08/11] block/io: refactor wait_serialising_requests Vladimir Sementsov-Ogievskiy
2019-01-16 16:18   ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 09/11] block: add lock/unlock range functions Vladimir Sementsov-Ogievskiy
2019-01-16 16:36   ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 10/11] block/backup: tiny refactor backup_job_create Vladimir Sementsov-Ogievskiy
2019-01-18 13:00   ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 11/11] block/backup: use backup-top instead of write notifiers Vladimir Sementsov-Ogievskiy
2019-01-18 14:56   ` Max Reitz [this message]
2019-01-28 11:29     ` Vladimir Sementsov-Ogievskiy
2019-01-28 15:59       ` Max Reitz
2019-01-28 16:44         ` Vladimir Sementsov-Ogievskiy
2019-01-28 16:53           ` Max Reitz
2019-01-28 17:14             ` Vladimir Sementsov-Ogievskiy
2019-01-28 17:40           ` Kevin Wolf
2019-01-28 19:00             ` Vladimir Sementsov-Ogievskiy
2019-01-23 15:26 ` [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup no-reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1ea09d99-04ff-711e-9137-cdae33ca8681@redhat.com \
    --to=mreitz@redhat.com \
    --cc=den@openvz.org \
    --cc=eblake@redhat.com \
    --cc=fam@euphon.net \
    --cc=jcody@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.