On 10.07.20 19:41, Andrey Shinkevich wrote:
> On 10.07.2020 18:24, Max Reitz wrote:
>> On 09.07.20 16:52, Andrey Shinkevich wrote:
>>> On 25.06.2020 18:21, Max Reitz wrote:
>>>> Because of the (not so recent anymore) changes that make the stream job
>>>> independent of the base node and instead track the node above it, we
>>>> have to split that "bottom" node into two cases: The bottom COW node,
>>>> and the node directly above the base node (which may be an R/W filter
>>>> or the bottom COW node).
>>>>
>>>> Signed-off-by: Max Reitz
>>>> ---
>>>>    qapi/block-core.json |  4 +++
>>>>    block/stream.c       | 63 ++++++++++++++++++++++++++++++++------------
>>>>    blockdev.c           |  4 ++-
>>>>    3 files changed, 53 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>> index b20332e592..df87855429 100644
>>>> --- a/qapi/block-core.json
>>>> +++ b/qapi/block-core.json
>>>> @@ -2486,6 +2486,10 @@
>>>>    # On successful completion the image file is updated to drop the backing file
>>>>    # and the BLOCK_JOB_COMPLETED event is emitted.
>>>>    #
>>>> +# In case @device is a filter node, block-stream modifies the first non-filter
>>>> +# overlay node below it to point to base's backing node (or NULL if @base was
>>>> +# not specified) instead of modifying @device itself.
>>>> +#
>>>>    # @job-id: identifier for the newly-created block job. If
>>>>    #          omitted, the device name will be used. (Since 2.7)
>>>>    #
>>>> diff --git a/block/stream.c b/block/stream.c
>>>> index aa2e7af98e..b9c1141656 100644
>>>> --- a/block/stream.c
>>>> +++ b/block/stream.c
>>>> @@ -31,7 +31,8 @@ enum {
>>>>
>>>>    typedef struct StreamBlockJob {
>>>>        BlockJob common;
>>>> -    BlockDriverState *bottom;
>>>> +    BlockDriverState *base_overlay; /* COW overlay (stream from this) */
>>>> +    BlockDriverState *above_base;   /* Node directly above the base */
>>> Keeping the base_overlay is enough to complete the stream job.
>> Depends on the definition.  If we decide it isn’t enough, then it isn’t
>> enough.
>>
>>> The above_base may disappear during the job and we can't rely on it.
>> In this version of this series, it may not, because the chain is frozen.
>> So the above_base cannot disappear.
>
> Once we insert a filter above the top bs of the stream job, the parallel
> jobs in the iotests #030 will fail with a 'frozen link' error. That is
> because the independent parallel stream or commit jobs insert/remove
> their filters asynchronously.

I’m not sure whether that’s a problem with this series specifically.

>> We can discuss whether we should allow it to disappear, but I think not.
>>
>> The problem is, we need something to set as the backing file after
>> streaming.  How do we figure out what that should be?  My proposal is we
>> keep above_base and use its immediate child.
>
> We can do the same with the base_overlay.
>
> If the backing node turns out to be a filter, the proper backing child
> will be set after the filter is removed. So, we shouldn't care.

And what if the user manually added some filter above the base (i.e.
below base_overlay) that they want to keep after the job?

>> If we don’t keep above_base, then we’re basically left guessing as to
>> what should be the backing file after the stream job.
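As an aside, here is a minimal, self-contained toy model of that split.
This is not QEMU code: the Node struct and the helpers are invented,
only loosely mirroring bdrv_skip_filters() and friends, just to show
where base_overlay and above_base land when a filter sits between the
COW overlay and @base.

/* Toy model of a backing chain -- not QEMU code.  "Node" stands in for
 * BlockDriverState; the helpers only mimic the bdrv_skip_filters()-style
 * helpers used in the patch. */
#include <stdbool.h>
#include <stdio.h>

typedef struct Node {
    const char *name;
    bool is_filter;       /* R/W filter (e.g. throttle) vs. COW node */
    struct Node *child;   /* backing child (COW) or filtered child */
} Node;

/* First non-filter node at or below @n (cf. bdrv_skip_filters()). */
static Node *skip_filters(Node *n)
{
    while (n && n->is_filter) {
        n = n->child;
    }
    return n;
}

/* COW overlay directly above @base; assumes @base is in @top's chain. */
static Node *find_base_overlay(Node *top, Node *base)
{
    Node *n = skip_filters(top);
    while (n && skip_filters(n->child) != base) {
        n = skip_filters(n->child);
    }
    return n;
}

/* Node whose immediate child is @base: the COW overlay itself, or the
 * lowest filter between it and @base. */
static Node *find_above_base(Node *top, Node *base)
{
    Node *n = find_base_overlay(top, base);
    while (n && n->child != base) {
        n = n->child;
    }
    return n;
}

int main(void)
{
    /* top -> mid -> throttle (a filter) -> base */
    Node base     = { "base",     false, NULL };
    Node throttle = { "throttle", true,  &base };
    Node mid      = { "mid",      false, &throttle };
    Node top      = { "top",      false, &mid };

    printf("base_overlay = %s\n", find_base_overlay(&top, &base)->name);
    printf("above_base   = %s\n", find_above_base(&top, &base)->name);
    return 0;
}

Compiled and run, this prints base_overlay = mid and above_base =
throttle; drop the throttle node from the example and both point at mid,
matching the "may be an R/W filter or the bottom COW node" note in the
commit message.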
>>
>>>>        BlockdevOnError on_error;
>>>>        char *backing_file_str;
>>>>        bool bs_read_only;
>>>> @@ -53,7 +54,7 @@ static void stream_abort(Job *job)
>>>>
>>>>        if (s->chain_frozen) {
>>>>            BlockJob *bjob = &s->common;
>>>> -        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->bottom);
>>>> +        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base);
>>>>        }
>>>>    }
>>>>
>>>> @@ -62,14 +63,15 @@ static int stream_prepare(Job *job)
>>>>        StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>>>>        BlockJob *bjob = &s->common;
>>>>        BlockDriverState *bs = blk_bs(bjob->blk);
>>>> -    BlockDriverState *base = backing_bs(s->bottom);
>>>> +    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
>>>> +    BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
>>> The initial base node may be a top node for a concurrent commit job and
>>> may disappear.
>> Then it would just be replaced by another node, though, so above_base
>> keeps a child.  The @base here is not necessarily the initial @base, and
>> that’s intentional.
>
> Not really. In my example, above_base becomes a dangling pointer because
> after the commit job finishes, its filter that should belong to the
> commit job's frozen chain will be deleted. If we freeze the link to the
> above_base for this job, the iotests #030 will not pass.

So it doesn’t become a dangling pointer, because it’s frozen.

030 passes after this series, so I’m not sure whether I can consider that
problem part of this series.  I think if adding a filter node becomes a
problem, we have to consider relaxing the restrictions when we do that,
not now.

>>> base = bdrv_filter_or_cow_bs(s->base_overlay) is more reliable.
>> But also wrong.  The point of keeping above_base around is to get its
>> child here to use that child as the new backing child of the top node.
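To make that disagreement concrete, here is another self-contained toy
snippet, again not QEMU code: the toy filter_or_cow_bs() collapses the
real bdrv_filter_or_cow_bs() semantics into a single child pointer, just
to show what each of the two candidate expressions in stream_prepare()
would pick as the post-stream backing node when a filter sits between the
COW overlay and @base.

/* Toy follow-up -- not QEMU code. */
#include <stdbool.h>
#include <stdio.h>

typedef struct Node {
    const char *name;
    bool is_filter;
    struct Node *child;   /* backing (COW) or filtered child */
} Node;

/* cf. bdrv_filter_or_cow_bs(): the node's single (filtered or COW) child */
static Node *filter_or_cow_bs(Node *n)
{
    return n ? n->child : NULL;
}

int main(void)
{
    /* mid (base_overlay) -> throttle (above_base, a filter) -> base */
    Node base     = { "base",     false, NULL };
    Node throttle = { "throttle", true,  &base };
    Node mid      = { "mid",      false, &throttle };

    Node *base_overlay = &mid;
    Node *above_base   = &throttle;

    /* The patch: child of above_base -> "base" */
    printf("filter_or_cow_bs(above_base)   = %s\n",
           filter_or_cow_bs(above_base)->name);

    /* The suggested alternative: child of base_overlay -> "throttle",
     * i.e. whatever node currently sits directly below the overlay */
    printf("filter_or_cow_bs(base_overlay) = %s\n",
           filter_or_cow_bs(base_overlay)->name);
    return 0;
}

It prints base for the above_base variant and throttle for the
base_overlay variant, which is exactly the difference being argued about:
whether the node sitting directly below the overlay at completion time,
filter or not, is the right thing to use as the new backing node.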
>>
>>>>        Error *local_err = NULL;
>>>>        int ret = 0;
>>>>
>>>> -    bdrv_unfreeze_backing_chain(bs, s->bottom);
>>>> +    bdrv_unfreeze_backing_chain(bs, s->above_base);
>>>>        s->chain_frozen = false;
>>>>
>>>> -    if (bs->backing) {
>>>> +    if (bdrv_cow_child(unfiltered_bs)) {
>>>>            const char *base_id = NULL, *base_fmt = NULL;
>>>>            if (base) {
>>>>                base_id = s->backing_file_str;
>>>> @@ -77,8 +79,8 @@ static int stream_prepare(Job *job)
>>>>                    base_fmt = base->drv->format_name;
>>>>                }
>>>>            }
>>>> -        bdrv_set_backing_hd(bs, base, &local_err);
>>>> -        ret = bdrv_change_backing_file(bs, base_id, base_fmt);
>>>> +        bdrv_set_backing_hd(unfiltered_bs, base, &local_err);
>>>> +        ret = bdrv_change_backing_file(unfiltered_bs, base_id, base_fmt);
>>>>            if (local_err) {
>>>>                error_report_err(local_err);
>>>>                return -EPERM;
>>>> @@ -109,14 +111,15 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
>>>>        StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>>>>        BlockBackend *blk = s->common.blk;
>>>>        BlockDriverState *bs = blk_bs(blk);
>>>> -    bool enable_cor = !backing_bs(s->bottom);
>>>> +    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
>>>> +    bool enable_cor = !bdrv_cow_child(s->base_overlay);
>>>>        int64_t len;
>>>>        int64_t offset = 0;
>>>>        uint64_t delay_ns = 0;
>>>>        int error = 0;
>>>>        int64_t n = 0; /* bytes */
>>>>
>>>> -    if (bs == s->bottom) {
>>>> +    if (unfiltered_bs == s->base_overlay) {
>>>>            /* Nothing to stream */
>>>>            return 0;
>>>>        }
>>>> @@ -150,13 +153,14 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
>>>>
>>>>            copy = false;
>>>>
>>>> -        ret = bdrv_is_allocated(bs, offset, STREAM_CHUNK, &n);
>>>> +        ret = bdrv_is_allocated(unfiltered_bs, offset, STREAM_CHUNK, &n);
>>>>            if (ret == 1) {
>>>>                /* Allocated in the top, no need to copy.  */
>>>>            } else if (ret >= 0) {
>>>>                /* Copy if allocated in the intermediate images.  Limit to the
>>>>                 * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
>>>> -            ret = bdrv_is_allocated_above(backing_bs(bs), s->bottom, true,
>>>> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(unfiltered_bs),
>>>> +                                          s->base_overlay, true,
>>>>                                              offset, n, &n);
>>>>                /* Finish early if end of backing file has been reached */
>>>>                if (ret == 0 && n == 0) {
>>>> @@ -223,9 +227,29 @@ void stream_start(const char *job_id, BlockDriverState *bs,
>>>>        BlockDriverState *iter;
>>>>        bool bs_read_only;
>>>>        int basic_flags = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED;
>>>> -    BlockDriverState *bottom = bdrv_find_overlay(bs, base);
>>>> +    BlockDriverState *base_overlay = bdrv_find_overlay(bs, base);
>>>> +    BlockDriverState *above_base;
>>>>
>>>> -    if (bdrv_freeze_backing_chain(bs, bottom, errp) < 0) {
>>>> +    if (!base_overlay) {
>>>> +        error_setg(errp, "'%s' is not in the backing chain of '%s'",
>>>> +                   base->node_name, bs->node_name);
>>> Sorry, I am not clear with the error message.
>>>
>>> In this case, there is no intermediate COW node, but the base, if not
>>> NULL, is in the backing chain of bs, isn't it?
>>>
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * Find the node directly above @base.  @base_overlay is a COW overlay, so
>>>> +     * it must have a bdrv_cow_child(), but it is the immediate overlay of
>>>> +     * @base, so between the two there can only be filters.
>>>> +     */
>>>> +    above_base = base_overlay;
>>>> +    if (bdrv_cow_bs(above_base) != base) {
>>>> +        above_base = bdrv_cow_bs(above_base);
>>>> +        while (bdrv_filter_bs(above_base) != base) {
>>>> +            above_base = bdrv_filter_bs(above_base);
>>>> +        }
>>>> +    }
>>>> +
>>>> +    if (bdrv_freeze_backing_chain(bs, above_base, errp) < 0) {
>>> When a concurrent stream job tries to freeze or remove the above_base
>>> node, we will encounter the frozen node error. The above_base node is a
>>> part of the concurrent job's frozen chain.
>> Correct.
>>
>>>>            return;
>>>>        }
>>>>
>>>> @@ -255,14 +279,19 @@ void stream_start(const char *job_id, BlockDriverState *bs,
>>>>         * and resizes. Reassign the base node pointer because the backing BS of the
>>>>         * bottom node might change after the call to bdrv_reopen_set_read_only()
>>>>         * due to parallel block jobs running.
>>>> +     * above_base node might change after the call to
>>> Yes, if not frozen.
>>>> +     * bdrv_reopen_set_read_only() due to parallel block jobs running.
>>>>         */
>>>> -    base = backing_bs(bottom);
>>>> -    for (iter = backing_bs(bs); iter && iter != base; iter = backing_bs(iter)) {
>>>> +    base = bdrv_filter_or_cow_bs(above_base);
>>>> +    for (iter = bdrv_filter_or_cow_bs(bs); iter != base;
>>>> +         iter = bdrv_filter_or_cow_bs(iter))
>>>> +    {
>>>>            block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
>>>>                               basic_flags, &error_abort);
>>>>        }
>>>>
>>>> -    s->bottom = bottom;
>>>> +    s->base_overlay = base_overlay;
>>>> +    s->above_base = above_base;
>>> Generally, being the filter of a concurrent job, the above_base node may
>>> be deleted at any time and we would be left with a dangling pointer. It
>>> may happen even earlier if above_base is not frozen. If it is, as it is
>>> here, we may get the frozen link error then.
>> I’m not sure what you mean here.  Freezing it was absolutely
>> intentional.  A dangling pointer would be a problem, but that’s why it’s
>> frozen, so it stays around and can’t be deleted any time.
>>
>> Max
>
> The nodes we freeze should be in one context of the relevant job:
>
> filter->top_node->intermediate_node(s)
>
> We would not include the base or any filter above it in the frozen chain
> because they are of a different job context.

They aren’t really, because we need to know the backing node of @device
after the job.

> Once 'this' job is completed, we set the current backing child of the
> base_overlay and need not care what it is. If that is another job
> filter, it will be replaced with the proper node afterwards.

But what if there are filters above the base that the user wants to keep
after the job?

Max
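P.S.: In case a concrete illustration of the freezing argument helps,
here is a toy sketch. It is my own simplification, not the QEMU
implementation: the Node type, freeze_chain() and try_remove() are made
up, and real freezing applies to BdrvChild links rather than nodes. The
point it models is only that a chain frozen down to above_base cannot be
broken by a concurrent job, so the above_base pointer cannot dangle while
the stream job runs.

/* Toy sketch -- not the QEMU implementation. */
#include <stdbool.h>
#include <stdio.h>

typedef struct Node {
    const char *name;
    bool frozen;          /* link to this node is frozen by some job */
    struct Node *child;
} Node;

/* Freeze every node from @top down to and including @last
 * (loosely mirroring bdrv_freeze_backing_chain(bs, above_base, ...)). */
static void freeze_chain(Node *top, Node *last)
{
    for (Node *n = top; n; n = n->child) {
        n->frozen = true;
        if (n == last) {
            break;
        }
    }
}

/* A concurrent job trying to drop @n out of the chain must fail while
 * the link to @n is frozen. */
static bool try_remove(Node *parent, Node *n)
{
    if (n->frozen) {
        printf("cannot remove '%s': link is frozen\n", n->name);
        return false;
    }
    parent->child = n->child;
    return true;
}

int main(void)
{
    /* top -> mid -> throttle (above_base) -> base */
    Node base     = { "base",     false, NULL };
    Node throttle = { "throttle", false, &base };
    Node mid      = { "mid",      false, &throttle };
    Node top      = { "top",      false, &mid };

    freeze_chain(&top, &throttle);   /* bs .. above_base, as in the patch */

    if (!try_remove(&mid, &throttle)) {
        printf("above_base stays valid, child = %s\n", throttle.child->name);
    }
    return 0;
}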