On 09.07.20 16:52, Andrey Shinkevich wrote:
> On 25.06.2020 18:21, Max Reitz wrote:
>> Because of the (not so recent anymore) changes that make the stream job
>> independent of the base node and instead track the node above it, we
>> have to split that "bottom" node into two cases: The bottom COW node,
>> and the node directly above the base node (which may be an R/W filter
>> or the bottom COW node).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   qapi/block-core.json |  4 +++
>>   block/stream.c       | 63 ++++++++++++++++++++++++++++++++------------
>>   blockdev.c           |  4 ++-
>>   3 files changed, 53 insertions(+), 18 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index b20332e592..df87855429 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -2486,6 +2486,10 @@
>>   # On successful completion the image file is updated to drop the
>> backing file
>>   # and the BLOCK_JOB_COMPLETED event is emitted.
>>   #
>> +# In case @device is a filter node, block-stream modifies the first
>> non-filter
>> +# overlay node below it to point to base's backing node (or NULL if
>> @base was
>> +# not specified) instead of modifying @device itself.
>> +#
>>   # @job-id: identifier for the newly-created block job. If
>>   #          omitted, the device name will be used. (Since 2.7)
>>   #
>> diff --git a/block/stream.c b/block/stream.c
>> index aa2e7af98e..b9c1141656 100644
>> --- a/block/stream.c
>> +++ b/block/stream.c
>> @@ -31,7 +31,8 @@ enum {
>>     typedef struct StreamBlockJob {
>>       BlockJob common;
>> -    BlockDriverState *bottom;
>> +    BlockDriverState *base_overlay; /* COW overlay (stream from this) */
>> +    BlockDriverState *above_base;   /* Node directly above the base */
> 
> Keeping the base_overlay is enough to complete the stream job.

Depends on the definition.  If we decide it isn’t enough, then it isn’t
enough.

> The above_base may disappear during the job and we can't rely on it.

In this version of this series, it may not, because the chain is frozen.
 So the above_base cannot disappear.

We can discuss whether we should allow it to disappear, but I think not.

The problem is, we need something to set as the backing file after
streaming.  How do we figure out what that should be?  My proposal is we
keep above_base and use its immediate child.

If we don’t keep above_base, then we’re basically left guessing as to
what should be the backing file after the stream job.

>>       BlockdevOnError on_error;
>>       char *backing_file_str;
>>       bool bs_read_only;
>> @@ -53,7 +54,7 @@ static void stream_abort(Job *job)
>>         if (s->chain_frozen) {
>>           BlockJob *bjob = &s->common;
>> -        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->bottom);
>> +        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base);
>>       }
>>   }
>>   @@ -62,14 +63,15 @@ static int stream_prepare(Job *job)
>>       StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>>       BlockJob *bjob = &s->common;
>>       BlockDriverState *bs = blk_bs(bjob->blk);
>> -    BlockDriverState *base = backing_bs(s->bottom);
>> +    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
>> +    BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
> 
> The initial base node may be a top node for a concurrent commit job and
> 
> may disappear.
Then it would just be replaced by another node, though, so above_base
keeps a child.  The @base here is not necessarily the initial @base, and
that’s intentional.

> base = bdrv_filter_or_cow_bs(s->base_overlay) is more reliable.

But also wrong.  The point of keeping above_base around is to get its
child here to use that child as the new backing child of the top node.

>>       Error *local_err = NULL;
>>       int ret = 0;
>>   -    bdrv_unfreeze_backing_chain(bs, s->bottom);
>> +    bdrv_unfreeze_backing_chain(bs, s->above_base);
>>       s->chain_frozen = false;
>>   -    if (bs->backing) {
>> +    if (bdrv_cow_child(unfiltered_bs)) {
>>           const char *base_id = NULL, *base_fmt = NULL;
>>           if (base) {
>>               base_id = s->backing_file_str;
>> @@ -77,8 +79,8 @@ static int stream_prepare(Job *job)
>>                   base_fmt = base->drv->format_name;
>>               }
>>           }
>> -        bdrv_set_backing_hd(bs, base, &local_err);
>> -        ret = bdrv_change_backing_file(bs, base_id, base_fmt);
>> +        bdrv_set_backing_hd(unfiltered_bs, base, &local_err);
>> +        ret = bdrv_change_backing_file(unfiltered_bs, base_id,
>> base_fmt);
>>           if (local_err) {
>>               error_report_err(local_err);
>>               return -EPERM;
>> @@ -109,14 +111,15 @@ static int coroutine_fn stream_run(Job *job,
>> Error **errp)
>>       StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>>       BlockBackend *blk = s->common.blk;
>>       BlockDriverState *bs = blk_bs(blk);
>> -    bool enable_cor = !backing_bs(s->bottom);
>> +    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
>> +    bool enable_cor = !bdrv_cow_child(s->base_overlay);
>>       int64_t len;
>>       int64_t offset = 0;
>>       uint64_t delay_ns = 0;
>>       int error = 0;
>>       int64_t n = 0; /* bytes */
>>   -    if (bs == s->bottom) {
>> +    if (unfiltered_bs == s->base_overlay) {
>>           /* Nothing to stream */
>>           return 0;
>>       }
>> @@ -150,13 +153,14 @@ static int coroutine_fn stream_run(Job *job,
>> Error **errp)
>>             copy = false;
>>   -        ret = bdrv_is_allocated(bs, offset, STREAM_CHUNK, &n);
>> +        ret = bdrv_is_allocated(unfiltered_bs, offset, STREAM_CHUNK,
>> &n);
>>           if (ret == 1) {
>>               /* Allocated in the top, no need to copy.  */
>>           } else if (ret >= 0) {
>>               /* Copy if allocated in the intermediate images.  Limit
>> to the
>>                * known-unallocated area [offset,
>> offset+n*BDRV_SECTOR_SIZE).  */
>> -            ret = bdrv_is_allocated_above(backing_bs(bs), s->bottom,
>> true,
>> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(unfiltered_bs),
>> +                                          s->base_overlay, true,
>>                                             offset, n, &n);
>>               /* Finish early if end of backing file has been reached */
>>               if (ret == 0 && n == 0) {
>> @@ -223,9 +227,29 @@ void stream_start(const char *job_id,
>> BlockDriverState *bs,
>>       BlockDriverState *iter;
>>       bool bs_read_only;
>>       int basic_flags = BLK_PERM_CONSISTENT_READ |
>> BLK_PERM_WRITE_UNCHANGED;
>> -    BlockDriverState *bottom = bdrv_find_overlay(bs, base);
>> +    BlockDriverState *base_overlay = bdrv_find_overlay(bs, base);
>> +    BlockDriverState *above_base;
>>   -    if (bdrv_freeze_backing_chain(bs, bottom, errp) < 0) {
>> +    if (!base_overlay) {
>> +        error_setg(errp, "'%s' is not in the backing chain of '%s'",
>> +                   base->node_name, bs->node_name);
> 
> Sorry, I am not clear with the error message.
> 
> In this case, there is no an intermediate COW node but the base, if not
> NULL, is
> 
> in the backing chain of bs, isn't it?
> 
>> +        return;
>> +    }
>> +
>> +    /*
>> +     * Find the node directly above @base.  @base_overlay is a COW
>> overlay, so
>> +     * it must have a bdrv_cow_child(), but it is the immediate
>> overlay of
>> +     * @base, so between the two there can only be filters.
>> +     */
>> +    above_base = base_overlay;
>> +    if (bdrv_cow_bs(above_base) != base) {
>> +        above_base = bdrv_cow_bs(above_base);
>> +        while (bdrv_filter_bs(above_base) != base) {
>> +            above_base = bdrv_filter_bs(above_base);
>> +        }
>> +    }
>> +
>> +    if (bdrv_freeze_backing_chain(bs, above_base, errp) < 0) {
> 
> When a concurrent stream job tries to freeze or remove the above_base node,
> 
> we will encounter the frozen node error. The above_base node is a part
> of the
> 
> concurrent job frozen chain.

Correct.

>>           return;
>>       }
>>   @@ -255,14 +279,19 @@ void stream_start(const char *job_id,
>> BlockDriverState *bs,
>>        * and resizes. Reassign the base node pointer because the
>> backing BS of the
>>        * bottom node might change after the call to
>> bdrv_reopen_set_read_only()
>>        * due to parallel block jobs running.
>> +     * above_base node might change after the call to
> Yes, if not frozen.
>> +     * bdrv_reopen_set_read_only() due to parallel block jobs running.
>>        */
>> -    base = backing_bs(bottom);
>> -    for (iter = backing_bs(bs); iter && iter != base; iter =
>> backing_bs(iter)) {
>> +    base = bdrv_filter_or_cow_bs(above_base);
>> +    for (iter = bdrv_filter_or_cow_bs(bs); iter != base;
>> +         iter = bdrv_filter_or_cow_bs(iter))
>> +    {
>>           block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
>>                              basic_flags, &error_abort);
>>       }
>>   -    s->bottom = bottom;
>> +    s->base_overlay = base_overlay;
>> +    s->above_base = above_base;
> 
> Generally, being the filter for a concurrent job, the above_base node
> may be deleted any time
> 
> and we will keep the dangling pointer. It may happen even earlier if
> above_base is not frozen.
> 
> If it is, as it here, we may get the frozen link error then.

I’m not sure what you mean here.  Freezing it was absolutely
intentional.  A dangling pointer would be a problem, but that’s why it’s
frozen, so it stays around and can’t be deleted any time.

Max