On 10.07.20 19:41, Andrey Shinkevich wrote:
> On 10.07.2020 18:24, Max Reitz wrote:
>> On 09.07.20 16:52, Andrey Shinkevich wrote:
>>> On 25.06.2020 18:21, Max Reitz wrote:
>>>> Because of the (not so recent anymore) changes that make the stream job
>>>> independent of the base node and instead track the node above it, we
>>>> have to split that "bottom" node into two cases: The bottom COW node,
>>>> and the node directly above the base node (which may be an R/W filter
>>>> or the bottom COW node).
>>>>
>>>> Signed-off-by: Max Reitz
>>>> ---
>>>>    qapi/block-core.json |  4 +++
>>>>    block/stream.c       | 63 ++++++++++++++++++++++++++++++++------------
>>>>    blockdev.c           |  4 ++-
>>>>    3 files changed, 53 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>> index b20332e592..df87855429 100644
>>>> --- a/qapi/block-core.json
>>>> +++ b/qapi/block-core.json
>>>> @@ -2486,6 +2486,10 @@
>>>>    # On successful completion the image file is updated to drop the backing file
>>>>    # and the BLOCK_JOB_COMPLETED event is emitted.
>>>>    #
>>>> +# In case @device is a filter node, block-stream modifies the first non-filter
>>>> +# overlay node below it to point to base's backing node (or NULL if @base was
>>>> +# not specified) instead of modifying @device itself.
>>>> +#
>>>>    # @job-id: identifier for the newly-created block job. If
>>>>    #          omitted, the device name will be used. (Since 2.7)
>>>>    #
>>>> diff --git a/block/stream.c b/block/stream.c
>>>> index aa2e7af98e..b9c1141656 100644
>>>> --- a/block/stream.c
>>>> +++ b/block/stream.c
>>>> @@ -31,7 +31,8 @@ enum {
>>>>
>>>>    typedef struct StreamBlockJob {
>>>>        BlockJob common;
>>>> -    BlockDriverState *bottom;
>>>> +    BlockDriverState *base_overlay; /* COW overlay (stream from this) */
>>>> +    BlockDriverState *above_base;   /* Node directly above the base */
>>> Keeping the base_overlay is enough to complete the stream job.
>> Depends on the definition.  If we decide it isn’t enough, then it isn’t
>> enough.
>>
>>> The above_base may disappear during the job and we can't rely on it.
>> In this version of this series, it may not, because the chain is frozen.
>> So the above_base cannot disappear.
>
> Once we insert a filter above the top bs of the stream job, the parallel
> jobs in the iotests #030 will fail with a 'frozen link' error. That is
> because the independent parallel stream or commit jobs insert/remove
> their filters asynchronously.

I’m not sure whether that’s a problem with this series specifically.

>> We can discuss whether we should allow it to disappear, but I think not.
>>
>> The problem is, we need something to set as the backing file after
>> streaming.  How do we figure out what that should be?  My proposal is we
>> keep above_base and use its immediate child.
>
> We can do the same with the base_overlay.
>
> If the backing node turns out to be a filter, the proper backing child
> will be set after the filter is removed. So, we shouldn't care.

And what if the user manually added some filter above the base (i.e.
below base_overlay) that they want to keep after the job?

>> If we don’t keep above_base, then we’re basically left guessing as to
>> what should be the backing file after the stream job.
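As an aside, here is a minimal, self-contained toy model of that split.
This is not QEMU code: the Node struct and the helpers are invented,
only loosely mirroring bdrv_skip_filters() and friends, just to show
where base_overlay and above_base land when a filter sits between the
COW overlay and @base.

/* Toy model of a backing chain -- not QEMU code.  "Node" stands in for
 * BlockDriverState; the helpers only mimic the bdrv_skip_filters()-style
 * helpers used in the patch. */
#include <stdbool.h>
#include <stdio.h>

typedef struct Node {
    const char *name;
    bool is_filter;       /* R/W filter (e.g. throttle) vs. COW node */
    struct Node *child;   /* backing child (COW) or filtered child */
} Node;

/* First non-filter node at or below @n (cf. bdrv_skip_filters()). */
static Node *skip_filters(Node *n)
{
    while (n && n->is_filter) {
        n = n->child;
    }
    return n;
}

/* COW overlay directly above @base; assumes @base is in @top's chain. */
static Node *find_base_overlay(Node *top, Node *base)
{
    Node *n = skip_filters(top);
    while (n && skip_filters(n->child) != base) {
        n = skip_filters(n->child);
    }
    return n;
}

/* Node whose immediate child is @base: the COW overlay itself, or the
 * lowest filter between it and @base. */
static Node *find_above_base(Node *top, Node *base)
{
    Node *n = find_base_overlay(top, base);
    while (n && n->child != base) {
        n = n->child;
    }
    return n;
}

int main(void)
{
    /* top -> mid -> throttle (a filter) -> base */
    Node base     = { "base",     false, NULL };
    Node throttle = { "throttle", true,  &base };
    Node mid      = { "mid",      false, &throttle };
    Node top      = { "top",      false, &mid };

    printf("base_overlay = %s\n", find_base_overlay(&top, &base)->name);
    printf("above_base   = %s\n", find_above_base(&top, &base)->name);
    return 0;
}

Compiled and run, this prints base_overlay = mid and above_base =
throttle; drop the throttle node from the example and both point at mid,
matching the "may be an R/W filter or the bottom COW node" note in the
commit message.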
>>
>>>>        BlockdevOnError on_error;
>>>>        char *backing_file_str;
>>>>        bool bs_read_only;
>>>> @@ -53,7 +54,7 @@ static void stream_abort(Job *job)
>>>>
>>>>        if (s->chain_frozen) {
>>>>            BlockJob *bjob = &s->common;
>>>> -        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->bottom);
>>>> +        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base);
>>>>        }
>>>>    }
>>>>
>>>> @@ -62,14 +63,15 @@ static int stream_prepare(Job *job)
>>>>        StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>>>>        BlockJob *bjob = &s->common;
>>>>        BlockDriverState *bs = blk_bs(bjob->blk);
>>>> -    BlockDriverState *base = backing_bs(s->bottom);
>>>> +    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
>>>> +    BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
>>> The initial base node may be a top node for a concurrent commit job and
>>> may disappear.
>> Then it would just be replaced by another node, though, so above_base
>> keeps a child.  The @base here is not necessarily the initial @base, and
>> that’s intentional.
>
> Not really. In my example, above_base becomes a dangling pointer because
> after the commit job finishes, its filter that should belong to the
> commit job's frozen chain will be deleted. If we freeze the link to the
> above_base for this job, the iotests #030 will not pass.

So it doesn’t become a dangling pointer, because it’s frozen.

030 passes after this series, so I’m not sure whether I can consider that
problem part of this series.  I think if adding a filter node becomes a
problem, we have to consider relaxing the restrictions when we do that,
not now.

>>> base = bdrv_filter_or_cow_bs(s->base_overlay) is more reliable.
>> But also wrong.  The point of keeping above_base around is to get its
>> child here to use that child as the new backing child of the top node.
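To make that disagreement concrete, here is another self-contained toy
snippet, again not QEMU code: the toy filter_or_cow_bs() collapses the
real bdrv_filter_or_cow_bs() semantics into a single child pointer, just
to show what each of the two candidate expressions in stream_prepare()
would pick as the post-stream backing node when a filter sits between the
COW overlay and @base.

/* Toy follow-up -- not QEMU code. */
#include <stdbool.h>
#include <stdio.h>

typedef struct Node {
    const char *name;
    bool is_filter;
    struct Node *child;   /* backing (COW) or filtered child */
} Node;

/* cf. bdrv_filter_or_cow_bs(): the node's single (filtered or COW) child */
static Node *filter_or_cow_bs(Node *n)
{
    return n ? n->child : NULL;
}

int main(void)
{
    /* mid (base_overlay) -> throttle (above_base, a filter) -> base */
    Node base     = { "base",     false, NULL };
    Node throttle = { "throttle", true,  &base };
    Node mid      = { "mid",      false, &throttle };

    Node *base_overlay = &mid;
    Node *above_base   = &throttle;

    /* The patch: child of above_base -> "base" */
    printf("filter_or_cow_bs(above_base)   = %s\n",
           filter_or_cow_bs(above_base)->name);

    /* The suggested alternative: child of base_overlay -> "throttle",
     * i.e. whatever node currently sits directly below the overlay */
    printf("filter_or_cow_bs(base_overlay) = %s\n",
           filter_or_cow_bs(base_overlay)->name);
    return 0;
}

It prints base for the above_base variant and throttle for the
base_overlay variant, which is exactly the difference being argued about:
whether the node sitting directly below the overlay at completion time,
filter or not, is the right thing to use as the new backing node.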
>>
>>>>        Error *local_err = NULL;
>>>>        int ret = 0;
>>>>
>>>> -    bdrv_unfreeze_backing_chain(bs, s->bottom);
>>>> +    bdrv_unfreeze_backing_chain(bs, s->above_base);
>>>>        s->chain_frozen = false;
>>>>
>>>> -    if (bs->backing) {
>>>> +    if (bdrv_cow_child(unfiltered_bs)) {
>>>>            const char *base_id = NULL, *base_fmt = NULL;
>>>>            if (base) {
>>>>                base_id = s->backing_file_str;
>>>> @@ -77,8 +79,8 @@ static int stream_prepare(Job *job)
>>>>                    base_fmt = base->drv->format_name;
>>>>                }
>>>>            }
>>>> -        bdrv_set_backing_hd(bs, base, &local_err);
>>>> -        ret = bdrv_change_backing_file(bs, base_id, base_fmt);
>>>> +        bdrv_set_backing_hd(unfiltered_bs, base, &local_err);
>>>> +        ret = bdrv_change_backing_file(unfiltered_bs, base_id, base_fmt);
>>>>            if (local_err) {
>>>>                error_report_err(local_err);
>>>>                return -EPERM;
>>>> @@ -109,14 +111,15 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
>>>>        StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>>>>        BlockBackend *blk = s->common.blk;
>>>>        BlockDriverState *bs = blk_bs(blk);
>>>> -    bool enable_cor = !backing_bs(s->bottom);
>>>> +    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
>>>> +    bool enable_cor = !bdrv_cow_child(s->base_overlay);
>>>>        int64_t len;
>>>>        int64_t offset = 0;
>>>>        uint64_t delay_ns = 0;
>>>>        int error = 0;
>>>>        int64_t n = 0; /* bytes */
>>>>
>>>> -    if (bs == s->bottom) {
>>>> +    if (unfiltered_bs == s->base_overlay) {
>>>>            /* Nothing to stream */
>>>>            return 0;
>>>>        }
>>>> @@ -150,13 +153,14 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
>>>>
>>>>            copy = false;
>>>>
>>>> -        ret = bdrv_is_allocated(bs, offset, STREAM_CHUNK, &n);
>>>> +        ret = bdrv_is_allocated(unfiltered_bs, offset, STREAM_CHUNK, &n);
>>>>            if (ret == 1) {
>>>>                /* Allocated in the top, no need to copy.  */
>>>>            } else if (ret >= 0) {
>>>>                /* Copy if allocated in the intermediate images.  Limit to the
>>>>                 * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
>>>> -            ret = bdrv_is_allocated_above(backing_bs(bs), s->bottom, true,
>>>> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(unfiltered_bs),
>>>> +                                          s->base_overlay, true,
>>>>                                              offset, n, &n);
>>>>                /* Finish early if end of backing file has been reached */
>>>>                if (ret == 0 && n == 0) {
>>>> @@ -223,9 +227,29 @@ void stream_start(const char *job_id, BlockDriverState *bs,
>>>>        BlockDriverState *iter;
>>>>        bool bs_read_only;
>>>>        int basic_flags = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED;
>>>> -    BlockDriverState *bottom = bdrv_find_overlay(bs, base);
>>>> +    BlockDriverState *base_overlay = bdrv_find_overlay(bs, base);
>>>> +    BlockDriverState *above_base;
>>>>
>>>> -    if (bdrv_freeze_backing_chain(bs, bottom, errp) < 0) {
>>>> +    if (!base_overlay) {
>>>> +        error_setg(errp, "'%s' is not in the backing chain of '%s'",
>>>> +                   base->node_name, bs->node_name);
>>> Sorry, I am not clear with the error message.
>>>
>>> In this case, there is no intermediate COW node, but the base, if not
>>> NULL, is in the backing chain of bs, isn't it?
>>>
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * Find the node directly above @base.  @base_overlay is a COW overlay, so
>>>> +     * it must have a bdrv_cow_child(), but it is the immediate overlay of
>>>> +     * @base, so between the two there can only be filters.
>>>> +     */
>>>> +    above_base = base_overlay;
>>>> +    if (bdrv_cow_bs(above_base) != base) {
>>>> +        above_base = bdrv_cow_bs(above_base);
>>>> +        while (bdrv_filter_bs(above_base) != base) {
>>>> +            above_base = bdrv_filter_bs(above_base);
>>>> +        }
>>>> +    }
>>>> +
>>>> +    if (bdrv_freeze_backing_chain(bs, above_base, errp) < 0) {
>>> When a concurrent stream job tries to freeze or remove the above_base
>>> node, we will encounter the frozen node error. The above_base node is a
>>> part of the concurrent job's frozen chain.
>> Correct.
>>
>>>>            return;
>>>>        }
>>>>
>>>> @@ -255,14 +279,19 @@ void stream_start(const char *job_id, BlockDriverState *bs,
>>>>         * and resizes. Reassign the base node pointer because the backing BS of the
>>>>         * bottom node might change after the call to bdrv_reopen_set_read_only()
>>>>         * due to parallel block jobs running.
>>>> +     * above_base node might change after the call to
>>> Yes, if not frozen.
>>>> +     * bdrv_reopen_set_read_only() due to parallel block jobs running.
>>>>         */
>>>> -    base = backing_bs(bottom);
>>>> -    for (iter = backing_bs(bs); iter && iter != base; iter = backing_bs(iter)) {
>>>> +    base = bdrv_filter_or_cow_bs(above_base);
>>>> +    for (iter = bdrv_filter_or_cow_bs(bs); iter != base;
>>>> +         iter = bdrv_filter_or_cow_bs(iter))
>>>> +    {
>>>>            block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
>>>>                               basic_flags, &error_abort);
>>>>        }
>>>>
>>>> -    s->bottom = bottom;
>>>> +    s->base_overlay = base_overlay;
>>>> +    s->above_base = above_base;
>>> Generally, being the filter of a concurrent job, the above_base node may
>>> be deleted at any time and we would be left with a dangling pointer. It
>>> may happen even earlier if above_base is not frozen. If it is, as it is
>>> here, we may get the frozen link error then.
>> I’m not sure what you mean here.  Freezing it was absolutely
>> intentional.  A dangling pointer would be a problem, but that’s why it’s
>> frozen, so it stays around and can’t be deleted any time.
>>
>> Max
>
> The nodes we freeze should be in one context of the relevant job:
>
> filter->top_node->intermediate_node(s)
>
> We would not include the base or any filter above it in the frozen chain
> because they are of a different job context.

They aren’t really, because we need to know the backing node of @device
after the job.

> Once 'this' job is completed, we set the current backing child of the
> base_overlay and need not care what it is. If that is another job
> filter, it will be replaced with the proper node afterwards.

But what if there are filters above the base that the user wants to keep
after the job?

Max
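P.S.: In case a concrete illustration of the freezing argument helps,
here is a toy sketch. It is my own simplification, not the QEMU
implementation: the Node type, freeze_chain() and try_remove() are made
up, and real freezing applies to BdrvChild links rather than nodes. The
point it models is only that a chain frozen down to above_base cannot be
broken by a concurrent job, so the above_base pointer cannot dangle while
the stream job runs.

/* Toy sketch -- not the QEMU implementation. */
#include <stdbool.h>
#include <stdio.h>

typedef struct Node {
    const char *name;
    bool frozen;          /* link to this node is frozen by some job */
    struct Node *child;
} Node;

/* Freeze every node from @top down to and including @last
 * (loosely mirroring bdrv_freeze_backing_chain(bs, above_base, ...)). */
static void freeze_chain(Node *top, Node *last)
{
    for (Node *n = top; n; n = n->child) {
        n->frozen = true;
        if (n == last) {
            break;
        }
    }
}

/* A concurrent job trying to drop @n out of the chain must fail while
 * the link to @n is frozen. */
static bool try_remove(Node *parent, Node *n)
{
    if (n->frozen) {
        printf("cannot remove '%s': link is frozen\n", n->name);
        return false;
    }
    parent->child = n->child;
    return true;
}

int main(void)
{
    /* top -> mid -> throttle (above_base) -> base */
    Node base     = { "base",     false, NULL };
    Node throttle = { "throttle", false, &base };
    Node mid      = { "mid",      false, &throttle };
    Node top      = { "top",      false, &mid };

    freeze_chain(&top, &throttle);   /* bs .. above_base, as in the patch */

    if (!try_remove(&mid, &throttle)) {
        printf("above_base stays valid, child = %s\n", throttle.child->name);
    }
    return 0;
}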