From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Max Reitz <mreitz@redhat.com>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>
Cc: Kevin Wolf <kwolf@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
Date: Tue, 7 May 2019 13:30:01 +0000
Message-ID: <344eec5c-8908-7b32-5d5f-61911253a621@virtuozzo.com>
In-Reply-To: <20190410202033.28617-3-mreitz@redhat.com>

10.04.2019 23:20, Max Reitz wrote:
> What bs->file and bs->backing mean depends on the node.  For filter
> nodes, both signify a node that will eventually receive all R/W
> accesses.  For format nodes, bs->file contains metadata and data, and
> bs->backing will not receive writes -- instead, writes are COWed to
> bs->file.  Usually.
> 
> In any case, it is not trivial to guess what a child means exactly with
> our currently limited form of expression.  It is better to introduce
> some functions that actually guarantee a meaning:
> 
> - bdrv_filtered_cow_child() will return the child that receives requests
>    filtered through COW.  That is, reads may or may not be forwarded
>    (depending on the overlay's allocation status), but writes never go to
>    this child.
> 
> - bdrv_filtered_rw_child() will return the child that receives requests
>    filtered through some very plain process.  Reads and writes issued to
>    the parent will go to the child as well (although timing, etc. may be
>    modified).
> 
> - All drivers but quorum (but quorum is pretty opaque to the general
>    block layer anyway) always only have one of these children: All read
>    requests must be served from the filtered_rw_child (if it exists), so
>    if there was a filtered_cow_child in addition, it would not receive
>    any requests at all.
>    (The closest here is mirror, where all requests are passed on to the
>    source, but with write-blocking, write requests are "COWed" to the
>    target.  But that just means that the target is a special child that
>    cannot be introspected by the generic block layer functions, and that
>    source is a filtered_rw_child.)
>    Therefore, we can also add bdrv_filtered_child() which returns that
>    one child (or NULL, if there is no filtered child).
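
To make the three definitions above concrete, here is a minimal sketch on a toy
model. It is not QEMU code: the Node struct and the un-prefixed function names
are made up for illustration, and only the selection logic is meant to follow
the description in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a block node; NOT the real BlockDriverState. */
typedef struct Node {
    bool is_filter;         /* models bs->drv->is_filter */
    struct Node *backing;   /* models bs->backing */
    struct Node *file;      /* models bs->file */
} Node;

/* Child that may serve reads of unallocated areas but never receives writes. */
static Node *filtered_cow_child(const Node *n)
{
    if (!n || n->is_filter) {
        return NULL;
    }
    return n->backing;
}

/* Child of a filter node that receives both reads and writes. */
static Node *filtered_rw_child(const Node *n)
{
    if (!n || !n->is_filter) {
        return NULL;
    }
    return n->backing ? n->backing : n->file;
}

/* Whichever of the two exists; at most one of them may exist per node. */
static Node *filtered_child(const Node *n)
{
    Node *cow = filtered_cow_child(n);
    Node *rw = filtered_rw_child(n);

    assert(!(cow && rw));
    return cow ? cow : rw;
}
```

In this model a format node with a backing file only ever has a COW child, and
a filter only ever has an R/W child, so filtered_child() is unambiguous.
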
> 
> Also, many places in the current block layer should be skipping filters
> (all filters or just the ones added implicitly, it depends) when going
> through a block node chain.  They do not do that currently, but this
> patch makes them.
> 
> One example for this is qemu-img map, which should skip filters and only
> look at the COW elements in the graph.  The change to iotest 204's
> reference output shows how using blkdebug on top of a COW node used to
> make qemu-img map disregard the rest of the backing chain, but with this
> patch, the allocation in the base image is reported correctly.
> 
> Furthermore, a note should be made that sometimes we do want to access
> bs->backing directly.  This is whenever the operation in question is not
> about accessing the COW child, but the "backing" child, be it COW or
> not.  This is the case in functions such as bdrv_open_backing_file() or
> whenever we have to deal with the special behavior of @backing as a
> blockdev option, which is that it does not default to null like all
> other child references do.
> 
> Finally, the query functions (query-block and query-named-block-nodes)
> are modified to return any filtered child under "backing", not just
> bs->backing or COW children.  This is so that filters do not interrupt
> the reported backing chain.  This changes the output of iotest 184, as
> the throttled node now appears as a backing child.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   qapi/block-core.json           |   4 +
>   include/block/block.h          |   1 +
>   include/block/block_int.h      |  40 +++++--
>   block.c                        | 210 +++++++++++++++++++++++++++------
>   block/backup.c                 |   8 +-
>   block/block-backend.c          |  16 ++-
>   block/commit.c                 |  33 +++---
>   block/io.c                     |  45 ++++---
>   block/mirror.c                 |  21 ++--
>   block/qapi.c                   |  30 +++--
>   block/stream.c                 |  13 +-
>   blockdev.c                     |  88 +++++++++++---
>   migration/block-dirty-bitmap.c |   4 +-
>   nbd/server.c                   |   6 +-
>   qemu-img.c                     |  29 ++---
>   tests/qemu-iotests/184.out     |   7 +-
>   tests/qemu-iotests/204.out     |   1 +
>   17 files changed, 411 insertions(+), 145 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 7ccbfff9d0..dbd9286e4a 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2502,6 +2502,10 @@
>   # On successful completion the image file is updated to drop the backing file
>   # and the BLOCK_JOB_COMPLETED event is emitted.
>   #
> +# In case @device is a filter node, block-stream modifies the first non-filter
> +# overlay node below it to point to base's backing node (or NULL if @base was
> +# not specified) instead of modifying @device itself.
> +#

Is this necessary? Why can't we keep it as is and modify exactly the @device node? Maybe
the user wants to use a filter in the stream process, for throttling for example.

>   # @job-id: identifier for the newly-created block job. If
>   #          omitted, the device name will be used. (Since 2.7)
>   #
> diff --git a/include/block/block.h b/include/block/block.h
> index c7a26199aa..2005664f14 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -467,6 +467,7 @@ BlockDriverState *bdrv_lookup_bs(const char *device,
>                                    const char *node_name,
>                                    Error **errp);
>   bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base);
> +bool bdrv_legacy_chain_contains(BlockDriverState *top, BlockDriverState *base);
>   BlockDriverState *bdrv_next_node(BlockDriverState *bs);
>   BlockDriverState *bdrv_next_all_states(BlockDriverState *bs);
>   
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 01e855a066..b22b1164f8 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -90,9 +90,11 @@ struct BlockDriver {
>       int instance_size;
>   
>       /* set to true if the BlockDriver is a block filter. Block filters pass
> -     * certain callbacks that refer to data (see block.c) to their bs->file if
> -     * the driver doesn't implement them. Drivers that do not wish to forward
> -     * must implement them and return -ENOTSUP.
> +     * certain callbacks that refer to data (see block.c) to their bs->file
> +     * or bs->backing (whichever one exists) if the driver doesn't implement
> +     * them. Drivers that do not wish to forward must implement them and return
> +     * -ENOTSUP.
> +     * Note that filters are not allowed to modify data.
>        */
>       bool is_filter;
>       /* for snapshots block filter like Quorum can implement the
> @@ -906,11 +908,6 @@ typedef enum BlockMirrorBackingMode {
>       MIRROR_LEAVE_BACKING_CHAIN,
>   } BlockMirrorBackingMode;
>   
> -static inline BlockDriverState *backing_bs(BlockDriverState *bs)
> -{
> -    return bs->backing ? bs->backing->bs : NULL;
> -}
> -
>   
>   /* Essential block drivers which must always be statically linked into qemu, and
>    * which therefore can be accessed without using bdrv_find_format() */
> @@ -1243,4 +1240,31 @@ int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset,
>   
>   int refresh_total_sectors(BlockDriverState *bs, int64_t hint);
>   
> +BdrvChild *bdrv_filtered_cow_child(BlockDriverState *bs);
> +BdrvChild *bdrv_filtered_rw_child(BlockDriverState *bs);
> +BdrvChild *bdrv_filtered_child(BlockDriverState *bs);
> +BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs);
> +BlockDriverState *bdrv_skip_rw_filters(BlockDriverState *bs);
> +BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs);
> +
> +static inline BlockDriverState *child_bs(BdrvChild *child)
> +{
> +    return child ? child->bs : NULL;
> +}
> +
> +static inline BlockDriverState *bdrv_filtered_cow_bs(BlockDriverState *bs)
> +{
> +    return child_bs(bdrv_filtered_cow_child(bs));
> +}
> +
> +static inline BlockDriverState *bdrv_filtered_rw_bs(BlockDriverState *bs)
> +{
> +    return child_bs(bdrv_filtered_rw_child(bs));
> +}
> +
> +static inline BlockDriverState *bdrv_filtered_bs(BlockDriverState *bs)
> +{
> +    return child_bs(bdrv_filtered_child(bs));
> +}
> +
>   #endif /* BLOCK_INT_H */
> diff --git a/block.c b/block.c
> index 16615bc876..e8f6febda0 100644
> --- a/block.c
> +++ b/block.c
> @@ -556,11 +556,12 @@ int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp)
>   int bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
>   {
>       BlockDriver *drv = bs->drv;
> +    BlockDriverState *filtered = bdrv_filtered_rw_bs(bs);
>   
>       if (drv && drv->bdrv_probe_blocksizes) {
>           return drv->bdrv_probe_blocksizes(bs, bsz);
> -    } else if (drv && drv->is_filter && bs->file) {
> -        return bdrv_probe_blocksizes(bs->file->bs, bsz);
> +    } else if (filtered) {
> +        return bdrv_probe_blocksizes(filtered, bsz);
>       }

OK: add support for backing-filters

>   
>       return -ENOTSUP;
> @@ -575,11 +576,12 @@ int bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
>   int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
>   {
>       BlockDriver *drv = bs->drv;
> +    BlockDriverState *filtered = bdrv_filtered_rw_bs(bs);
>   
>       if (drv && drv->bdrv_probe_geometry) {
>           return drv->bdrv_probe_geometry(bs, geo);
> -    } else if (drv && drv->is_filter && bs->file) {
> -        return bdrv_probe_geometry(bs->file->bs, geo);
> +    } else if (filtered) {
> +        return bdrv_probe_geometry(filtered, geo);
>       }


OK: add support for backing-filters (short for backing-child-based filters; likewise,
file-filters = file-child-based filters)

>   
>       return -ENOTSUP;
> @@ -2336,7 +2338,7 @@ static bool bdrv_inherits_from_recursive(BlockDriverState *child,
>   }
>   
>   /*
> - * Sets the backing file link of a BDS. A new reference is created; callers
> + * Sets the bs->backing link of a BDS. A new reference is created; callers
>    * which don't need their own reference any more must call bdrv_unref().
>    */
>   void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
> @@ -2345,7 +2347,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
>       bool update_inherits_from = bdrv_chain_contains(bs, backing_hd) &&
>           bdrv_inherits_from_recursive(backing_hd, bs);
>   
> -    if (bdrv_is_backing_chain_frozen(bs, backing_bs(bs), errp)) {
> +    if (bdrv_is_backing_chain_frozen(bs, child_bs(bs->backing), errp)) {

If we support file-filters in a frozen backing chain, could the chain go through the file child here?
Hmm, only in the case where we are going to set the backing hd for a file-filter. Hmm, could a filter have
both file and backing children? Your new API doesn't restrict it, and chooses backing as the default
in this case in bdrv_filtered_rw_child(), so I assume you consider it possible.

Here we don't want to check the chain; we want to check exactly the backing link, so it should be
something like

if (bs->backing && bs->backing->frozen) {
    error_setg(errp, "backing exists and is frozen!");
    return;
}


Hmm, on the other hand, if we have a frozen backing chain that goes through the file child, we must not add
a backing child to the node with the file child, as that would change the backing chain (which by default goes
through backing).

Anyway, we don't need to check the whole backing chain, as we may find another frozen backing subchain
far away from bs. So, we possibly want to check

if (bdrv_filtered_child(bs) && bdrv_filtered_child(bs)->frozen) {
   ERROR
}


....

Also, we'll need to check for a frozen file child when we want to replace it.


>           return;
>       }
>   
> @@ -3467,14 +3469,17 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
>       /*
>        * Find the "actual" backing file by skipping all links that point
>        * to an implicit node, if any (e.g. a commit filter node).
> +     * We cannot use any of the bdrv_skip_*() functions here because
> +     * those return the first explicit node, while we are looking for
> +     * its overlay here.
>        */
>       overlay_bs = bs;
> -    while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) {
> -        overlay_bs = backing_bs(overlay_bs);
> +    while (overlay_bs->backing && bdrv_filtered_bs(overlay_bs)->implicit) {
> +        overlay_bs = bdrv_filtered_bs(overlay_bs);
>       }

Agreed, we somehow want to support implicit file-filters here too.

>   
>       /* If we want to replace the backing file we need some extra checks */
> -    if (new_backing_bs != backing_bs(overlay_bs)) {
> +    if (new_backing_bs != child_bs(overlay_bs->backing)) {
>           /* Check for implicit nodes between bs and its backing file */
>           if (bs != overlay_bs) {
>               error_setg(errp, "Cannot change backing link if '%s' has "
> @@ -3482,8 +3487,8 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
>               return -EPERM;
>           }
>           /* Check if the backing link that we want to replace is frozen */
> -        if (bdrv_is_backing_chain_frozen(overlay_bs, backing_bs(overlay_bs),
> -                                         errp)) {
> +        if (bdrv_is_backing_chain_frozen(overlay_bs,
> +                                         child_bs(overlay_bs->backing), errp)) {

Again, I think we need bdrv_is_child_frozen() to check such things.
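
A rough sketch of what I mean, on a toy model. Note that bdrv_is_child_frozen()
does not exist; the helper below, the Link/Node types, and the chain walk are
all simplified stand-ins written for illustration, not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node Node;

/* Toy stand-in for BdrvChild: an edge to a child node with a frozen flag. */
typedef struct Link {
    Node *bs;
    bool frozen;
} Link;

/* Toy stand-in for BlockDriverState with the two links discussed here. */
struct Node {
    Link *backing;
    Link *file;
};

/* Hypothetical per-link check: works the same for backing and file links. */
static bool is_child_frozen(const Link *child)
{
    return child && child->frozen;
}

/* Chain walk in the style of bdrv_is_backing_chain_frozen(), minus errp. */
static bool is_chain_frozen(Node *bs, const Node *base)
{
    for (Node *i = bs; i && i != base;
         i = i->backing ? i->backing->bs : NULL) {
        if (is_child_frozen(i->backing)) {
            return true;
        }
    }
    return false;
}
```

With base set to the immediate backing node, the chain walk covers a single
hop and agrees with the per-link helper; the helper just states the intent
directly and can be applied to a file link as well.
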

>               return -EPERM;
>           }
>           reopen_state->replace_backing_bs = true;
> @@ -3634,7 +3639,7 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue,
>        * its metadata. Otherwise the 'backing' option can be omitted.
>        */
>       if (drv->supports_backing && reopen_state->backing_missing &&
> -        (backing_bs(reopen_state->bs) || reopen_state->bs->backing_file[0])) {
> +        (reopen_state->bs->backing || reopen_state->bs->backing_file[0])) {

And if we skip implicit filters in bdrv_backing_chain_next(), shouldn't we skip them
here too?

>           error_setg(errp, "backing is missing for '%s'",
>                      reopen_state->bs->node_name);
>           ret = -EINVAL;
> @@ -3779,7 +3784,7 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
>        * from bdrv_set_backing_hd()) has the new values.
>        */
>       if (reopen_state->replace_backing_bs) {
> -        BlockDriverState *old_backing_bs = backing_bs(bs);
> +        BlockDriverState *old_backing_bs = child_bs(bs->backing);
>           assert(!old_backing_bs || !old_backing_bs->implicit);
>           /* Abort the permission update on the backing bs we're detaching */
>           if (old_backing_bs) {
> @@ -4203,8 +4208,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>   BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
>                                       BlockDriverState *bs)
>   {
> -    while (active && bs != backing_bs(active)) {
> -        active = backing_bs(active);
> +    while (active && bs != bdrv_filtered_bs(active)) {

The comment on this function needs to be adjusted then, as we may now find a file-based overlay, not only a backing-based one.
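
For illustration, a toy model of the new loop (not QEMU code; Node and the
un-prefixed names are invented here, and filtered_bs() only approximates
bdrv_filtered_bs()). It shows the case where the returned "overlay" is a
filter that reaches @bs through its file child:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node; not the real BlockDriverState. */
typedef struct Node {
    bool is_filter;
    struct Node *backing;
    struct Node *file;
} Node;

/* COW child of a non-filter node, R/W child (backing, else file) of a filter. */
static Node *filtered_bs(const Node *n)
{
    if (!n) {
        return NULL;
    }
    if (n->is_filter) {
        return n->backing ? n->backing : n->file;
    }
    return n->backing;
}

/* Walk down from @active until the node directly above @bs is found. */
static Node *find_overlay(Node *active, const Node *bs)
{
    while (active && bs != filtered_bs(active)) {
        active = filtered_bs(active);
    }
    return active;
}
```

For a chain top -> filter -> base where the filter uses only its file child,
find_overlay(top, base) returns the filter even though its backing link is
empty.
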

> +        active = bdrv_filtered_bs(active);
>       }
>   
>       return active;
> @@ -4226,11 +4231,11 @@ bool bdrv_is_backing_chain_frozen(BlockDriverState *bs, BlockDriverState *base,
>   {
>       BlockDriverState *i;
>   
> -    for (i = bs; i != base; i = backing_bs(i)) {
> +    for (i = bs; i != base; i = child_bs(i->backing)) {
>           if (i->backing && i->backing->frozen) {
>               error_setg(errp, "Cannot change '%s' link from '%s' to '%s'",
>                          i->backing->name, i->node_name,
> -                       backing_bs(i)->node_name);
> +                       i->backing->bs->node_name);
>               return true;
>           }
>       }
> @@ -4254,7 +4259,7 @@ int bdrv_freeze_backing_chain(BlockDriverState *bs, BlockDriverState *base,
>           return -EPERM;
>       }
>   
> -    for (i = bs; i != base; i = backing_bs(i)) {
> +    for (i = bs; i != base; i = child_bs(i->backing)) {
>           if (i->backing) {
>               i->backing->frozen = true;
>           }
> @@ -4272,7 +4277,7 @@ void bdrv_unfreeze_backing_chain(BlockDriverState *bs, BlockDriverState *base)
>   {
>       BlockDriverState *i;
>   
> -    for (i = bs; i != base; i = backing_bs(i)) {
> +    for (i = bs; i != base; i = child_bs(i->backing)) {
>           if (i->backing) {
>               assert(i->backing->frozen);
>               i->backing->frozen = false;
> @@ -4342,9 +4347,7 @@ int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
>        * other intermediate nodes have been dropped.
>        * If 'top' is an implicit node (e.g. "commit_top") we should skip
>        * it because no one inherits from it. We use explicit_top for that. */
> -    while (explicit_top && explicit_top->implicit) {
> -        explicit_top = backing_bs(explicit_top);
> -    }
> +    explicit_top = bdrv_skip_implicit_filters(explicit_top);
>       update_inherits_from = bdrv_inherits_from_recursive(base, explicit_top);
>   
>       /* success - we can delete the intermediate states, and link top->base */
> @@ -4494,10 +4497,14 @@ bool bdrv_is_sg(BlockDriverState *bs)
>   
>   bool bdrv_is_encrypted(BlockDriverState *bs)
>   {
> -    if (bs->backing && bs->backing->bs->encrypted) {
> +    BlockDriverState *filtered = bdrv_filtered_bs(bs);
> +    if (bs->encrypted) {
> +        return true;
> +    }
> +    if (filtered && bdrv_is_encrypted(filtered)) {
>           return true;
>       }
> -    return bs->encrypted;
> +    return false;
>   }

one backing child -> recursion through extended backing chain
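
To spell out the difference, a small sketch on a toy model (not QEMU code; the
Node struct and function names are made up, and filtered_bs() only approximates
bdrv_filtered_bs()). The old check looks one backing link down; the new one
recurses through every filtered child:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node; not the real BlockDriverState. */
typedef struct Node {
    bool encrypted;
    bool is_filter;
    struct Node *backing;
    struct Node *file;
} Node;

/* COW child of a non-filter node, R/W child (backing, else file) of a filter. */
static Node *filtered_bs(const Node *n)
{
    if (!n) {
        return NULL;
    }
    if (n->is_filter) {
        return n->backing ? n->backing : n->file;
    }
    return n->backing;
}

/* Old behaviour: the node itself or its direct backing child. */
static bool is_encrypted_old(const Node *n)
{
    return (n->backing && n->backing->encrypted) || n->encrypted;
}

/* New behaviour: the node itself or anything further down the chain. */
static bool is_encrypted_new(const Node *n)
{
    if (!n) {
        return false;
    }
    if (n->encrypted) {
        return true;
    }
    return is_encrypted_new(filtered_bs(n));
}
```

With an unencrypted overlay on top of an encrypted base, and a filter above the
overlay, only the recursive version reports the top node as encrypted.
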

>   
>   const char *bdrv_get_format_name(BlockDriverState *bs)
> @@ -4794,7 +4801,21 @@ BlockDriverState *bdrv_lookup_bs(const char *device,
>   bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base)
>   {
>       while (top && top != base) {
> -        top = backing_bs(top);
> +        top = bdrv_filtered_bs(top);
> +    }
> +
> +    return top != NULL;
> +}

support file-filters

> +
> +/*
> + * Same as bdrv_chain_contains(), but skip implicitly added R/W filter
> + * nodes and do not move past explicitly added R/W filters.
> + */
> +bool bdrv_legacy_chain_contains(BlockDriverState *top, BlockDriverState *base)
> +{
> +    top = bdrv_skip_implicit_filters(top);
> +    while (top && top != base) {
> +        top = bdrv_skip_implicit_filters(bdrv_filtered_cow_bs(top));
>       }

ok
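
Just to check my understanding of the two variants, here is a toy model (not
QEMU code; Node and the un-prefixed names are invented for illustration, and
the helpers only approximate the functions added by this patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node; not the real BlockDriverState. */
typedef struct Node {
    bool is_filter;
    bool implicit;
    struct Node *backing;
    struct Node *file;
} Node;

static Node *cow_bs(const Node *n)
{
    return (n && !n->is_filter) ? n->backing : NULL;
}

static Node *rw_bs(const Node *n)
{
    if (!n || !n->is_filter) {
        return NULL;
    }
    return n->backing ? n->backing : n->file;
}

static Node *filtered_bs(const Node *n)
{
    Node *cow = cow_bs(n);
    return cow ? cow : rw_bs(n);
}

/* Descend through implicit R/W filters; stop at the first explicit node. */
static Node *skip_implicit_filters(Node *n)
{
    while (n && n->implicit) {
        Node *next = rw_bs(n);
        if (!next) {
            break;
        }
        n = next;
    }
    return n;
}

/* Follows any filtered child, so filters are transparent. */
static bool chain_contains(Node *top, const Node *base)
{
    while (top && top != base) {
        top = filtered_bs(top);
    }
    return top != NULL;
}

/* Follows COW links only, skipping implicit filters in between. */
static bool legacy_chain_contains(Node *top, const Node *base)
{
    top = skip_implicit_filters(top);
    while (top && top != base) {
        top = skip_implicit_filters(cow_bs(top));
    }
    return top != NULL;
}
```

So for top -> filter -> base, chain_contains(top, base) always holds, while the
legacy variant holds only if the filter was added implicitly; an explicit
filter ends the legacy walk.
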

>   
>       return top != NULL;
> @@ -4866,20 +4887,24 @@ int bdrv_has_zero_init_1(BlockDriverState *bs)
>   
>   int bdrv_has_zero_init(BlockDriverState *bs)
>   {
> +    BlockDriverState *filtered;
> +
>       if (!bs->drv) {
>           return 0;
>       }
>   
>       /* If BS is a copy on write image, it is initialized to
>          the contents of the base image, which may not be zeroes.  */
> -    if (bs->backing) {
> +    if (bdrv_filtered_cow_child(bs)) {
>           return 0;
>       }
>       if (bs->drv->bdrv_has_zero_init) {
>           return bs->drv->bdrv_has_zero_init(bs);
>       }
> -    if (bs->file && bs->drv->is_filter) {
> -        return bdrv_has_zero_init(bs->file->bs);
> +
> +    filtered = bdrv_filtered_rw_bs(bs);
> +    if (filtered) {
> +        return bdrv_has_zero_init(filtered);
>       }

add recursion for filters

>   
>       /* safe default */
> @@ -4890,7 +4915,7 @@ bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs)
>   {
>       BlockDriverInfo bdi;
>   
> -    if (bs->backing) {
> +    if (bdrv_filtered_cow_child(bs)) {
>           return false;
>       }
>   
> @@ -4924,8 +4949,9 @@ int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
>           return -ENOMEDIUM;
>       }
>       if (!drv->bdrv_get_info) {
> -        if (bs->file && drv->is_filter) {
> -            return bdrv_get_info(bs->file->bs, bdi);
> +        BlockDriverState *filtered = bdrv_filtered_rw_bs(bs);
> +        if (filtered) {
> +            return bdrv_get_info(filtered, bdi);
>           }
>           return -ENOTSUP;
>       }
> @@ -5028,7 +5054,17 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
>   
>       is_protocol = path_has_protocol(backing_file);
>   
> -    for (curr_bs = bs; curr_bs->backing; curr_bs = curr_bs->backing->bs) {
> +    /*
> +     * Being largely a legacy function, skip any filters here
> +     * (because filters do not have normal filenames, so they cannot
> +     * match anyway; and allowing json:{} filenames is a bit out of
> +     * scope).
> +     */
> +    for (curr_bs = bdrv_skip_rw_filters(bs);
> +         bdrv_filtered_cow_child(curr_bs) != NULL;
> +         curr_bs = bdrv_backing_chain_next(curr_bs))
> +    {
> +        BlockDriverState *bs_below = bdrv_backing_chain_next(curr_bs);
>   
>           /* If either of the filename paths is actually a protocol, then
>            * compare unmodified paths; otherwise make paths relative */
> @@ -5036,7 +5072,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
>               char *backing_file_full_ret;
>   
>               if (strcmp(backing_file, curr_bs->backing_file) == 0) {
> -                retval = curr_bs->backing->bs;
> +                retval = bs_below;
>                   break;
>               }
>               /* Also check against the full backing filename for the image */
> @@ -5046,7 +5082,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
>                   bool equal = strcmp(backing_file, backing_file_full_ret) == 0;
>                   g_free(backing_file_full_ret);
>                   if (equal) {
> -                    retval = curr_bs->backing->bs;
> +                    retval = bs_below;
>                       break;
>                   }
>               }
> @@ -5072,7 +5108,7 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
>               g_free(filename_tmp);
>   
>               if (strcmp(backing_file_full, filename_full) == 0) {
> -                retval = curr_bs->backing->bs;
> +                retval = bs_below;
>                   break;
>               }
>           }
> @@ -6237,3 +6273,107 @@ bool bdrv_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
>   
>       return drv->bdrv_can_store_new_dirty_bitmap(bs, name, granularity, errp);
>   }
> +
> +/*
> + * Return the child that @bs acts as an overlay for, and from which data may be
> + * copied in COW or COR operations.  Usually this is the backing file.
> + */
> +BdrvChild *bdrv_filtered_cow_child(BlockDriverState *bs)
> +{
> +    if (!bs || !bs->drv) {
> +        return NULL;
> +    }
> +
> +    if (bs->drv->is_filter) {
> +        return NULL;
> +    }
> +
> +    return bs->backing;
> +}
> +
> +/*
> + * If @bs acts as a pass-through filter for one of its children,
> + * return that child.  "Pass-through" means that write operations to
> + * @bs are forwarded to that child instead of triggering COW.
> + */
> +BdrvChild *bdrv_filtered_rw_child(BlockDriverState *bs)
> +{
> +    if (!bs || !bs->drv) {
> +        return NULL;
> +    }
> +
> +    if (!bs->drv->is_filter) {
> +        return NULL;
> +    }
> +
> +    return bs->backing ?: bs->file;
> +}
> +
> +/*
> + * Return any filtered child, independently of how it reacts to write
> + * accesses and whether data is copied onto this BDS through COR.
> + */
> +BdrvChild *bdrv_filtered_child(BlockDriverState *bs)
> +{
> +    BdrvChild *cow_child = bdrv_filtered_cow_child(bs);
> +    BdrvChild *rw_child = bdrv_filtered_rw_child(bs);
> +
> +    /* There can only be one filtered child at a time */
> +    assert(!(cow_child && rw_child));
> +
> +    return cow_child ?: rw_child;
> +}
> +
> +static BlockDriverState *bdrv_skip_filters(BlockDriverState *bs,
> +                                           bool stop_on_explicit_filter)
> +{
> +    BdrvChild *filtered;
> +
> +    if (!bs) {
> +        return NULL;
> +    }
> +
> +    while (!(stop_on_explicit_filter && !bs->implicit)) {
> +        filtered = bdrv_filtered_rw_child(bs);
> +        if (!filtered) {
> +            break;
> +        }
> +        bs = filtered->bs;
> +    }
> +    /*
> +     * Note that this treats nodes with bs->drv == NULL as not being
> +     * R/W filters (bs->drv == NULL should be replaced by something
> +     * else anyway).
> +     * The advantage of this behavior is that this function will thus
> +     * always return a non-NULL value (given a non-NULL @bs).
> +     */
> +
> +    return bs;
> +}
> +
> +/*
> + * Return the first BDS that has not been added implicitly or that
> + * does not have an RW-filtered child down the chain starting from @bs
> + * (including @bs itself).
> + */
> +BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs)
> +{
> +    return bdrv_skip_filters(bs, true);
> +}
> +
> +/*
> + * Return the first BDS that does not have an RW-filtered child down
> + * the chain starting from @bs (including @bs itself).
> + */
> +BlockDriverState *bdrv_skip_rw_filters(BlockDriverState *bs)
> +{
> +    return bdrv_skip_filters(bs, false);
> +}
> +
> +/*
> + * For a backing chain, return the first non-filter backing image.
> + */
> +BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs)
> +{
> +    return bdrv_skip_rw_filters(bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs)));
> +}
> diff --git a/block/backup.c b/block/backup.c
> index 9988753249..9c08353b23 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -577,6 +577,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>       int64_t len;
>       BlockDriverInfo bdi;
>       BackupBlockJob *job = NULL;
> +    bool target_does_cow;
>       int ret;
>   
>       assert(bs);
> @@ -671,8 +672,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>       /* If there is no backing file on the target, we cannot rely on COW if our
>        * backup cluster size is smaller than the target cluster size. Even for
>        * targets with a backing file, try to avoid COW if possible. */
> +    target_does_cow = bdrv_filtered_cow_child(target);

So, you have excluded the false-positive case where the target is a backing-filter. I think we'd
better skip filters here:

target_does_cow = bdrv_filtered_cow_child(bdrv_skip_rw_filters(target));
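
A toy model of why this matters (not QEMU code; Node, the un-prefixed helpers
and the skip_filters parameter are made up for illustration). If the target is
a filter sitting on top of a COW image, checking the filter itself reports "no
COW", while checking the first non-filter node below it does not:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node; not the real BlockDriverState. */
typedef struct Node {
    bool is_filter;
    struct Node *backing;
    struct Node *file;
} Node;

static Node *cow_child(const Node *n)
{
    return (n && !n->is_filter) ? n->backing : NULL;
}

static Node *rw_child(const Node *n)
{
    if (!n || !n->is_filter) {
        return NULL;
    }
    return n->backing ? n->backing : n->file;
}

/* Descend through all R/W filters to the first non-filter node. */
static Node *skip_rw_filters(Node *n)
{
    Node *next;

    while ((next = rw_child(n)) != NULL) {
        n = next;
    }
    return n;
}

/* skip_filters selects between the check in the patch and the one above. */
static bool target_does_cow(Node *target, bool skip_filters)
{
    if (skip_filters) {
        target = skip_rw_filters(target);
    }
    return cow_child(target) != NULL;
}
```

For a throttle-like filter above an image with a backing file, the first form
returns false and the second returns true.
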

>       ret = bdrv_get_info(target, &bdi);
> -    if (ret == -ENOTSUP && !target->backing) {
> +    if (ret == -ENOTSUP && !target_does_cow) {
>           /* Cluster size is not defined */
>           warn_report("The target block device doesn't provide "
>                       "information about the block size and it doesn't have a "
> @@ -681,14 +683,14 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>                       "this default, the backup may be unusable",
>                       BACKUP_CLUSTER_SIZE_DEFAULT);
>           job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
> -    } else if (ret < 0 && !target->backing) {
> +    } else if (ret < 0 && !target_does_cow) {
>           error_setg_errno(errp, -ret,
>               "Couldn't determine the cluster size of the target image, "
>               "which has no backing file");
>           error_append_hint(errp,
>               "Aborting, since this may create an unusable destination image\n");
>           goto error;
> -    } else if (ret < 0 && target->backing) {
> +    } else if (ret < 0 && target_does_cow) {
>           /* Not fatal; just trudge on ahead. */
>           job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
>       } else {
> diff --git a/block/block-backend.c b/block/block-backend.c
> index f78e82a707..aa9a1d84a6 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -2089,11 +2089,17 @@ int blk_commit_all(void)
>           AioContext *aio_context = blk_get_aio_context(blk);
>   
>           aio_context_acquire(aio_context);
> -        if (blk_is_inserted(blk) && blk->root->bs->backing) {
> -            int ret = bdrv_commit(blk->root->bs);
> -            if (ret < 0) {
> -                aio_context_release(aio_context);
> -                return ret;
> +        if (blk_is_inserted(blk)) {
> +            BlockDriverState *non_filter;
> +
> +            /* Legacy function, so skip implicit filters */
> +            non_filter = bdrv_skip_implicit_filters(blk->root->bs);
> +            if (bdrv_filtered_cow_child(non_filter)) {
> +                int ret = bdrv_commit(non_filter);
> +                if (ret < 0) {
> +                    aio_context_release(aio_context);
> +                    return ret;
> +                }
>               }
>           }
>           aio_context_release(aio_context);
> diff --git a/block/commit.c b/block/commit.c
> index 02eab34925..252007fd57 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -113,7 +113,7 @@ static void commit_abort(Job *job)
>        * something to base, the intermediate images aren't valid any more. */
>       bdrv_child_try_set_perm(s->commit_top_bs->backing, 0, BLK_PERM_ALL,
>                               &error_abort);
> -    bdrv_replace_node(s->commit_top_bs, backing_bs(s->commit_top_bs),
> +    bdrv_replace_node(s->commit_top_bs, s->commit_top_bs->backing->bs,
>                         &error_abort);
>   
>       bdrv_unref(s->commit_top_bs);
> @@ -324,10 +324,16 @@ void commit_start(const char *job_id, BlockDriverState *bs,
>       s->commit_top_bs = commit_top_bs;
>       bdrv_unref(commit_top_bs);
>   
> -    /* Block all nodes between top and base, because they will
> -     * disappear from the chain after this operation. */
> +    /*
> +     * Block all nodes between top and base, because they will
> +     * disappear from the chain after this operation.
> +     * Note that this assumes that the user is fine with removing all
> +     * nodes (including R/W filters) between top and base.  Assuring
> +     * this is the responsibility of the interface (i.e. whoever calls
> +     * commit_start()).
> +     */
>       assert(bdrv_chain_contains(top, base));
> -    for (iter = top; iter != base; iter = backing_bs(iter)) {
> +    for (iter = top; iter != base; iter = bdrv_filtered_bs(iter)) {
>           /* XXX BLK_PERM_WRITE needs to be allowed so we don't block ourselves
>            * at s->base (if writes are blocked for a node, they are also blocked
>            * for its backing file). The other options would be a second filter
> @@ -414,19 +420,22 @@ int bdrv_commit(BlockDriverState *bs)
>       if (!drv)
>           return -ENOMEDIUM;
>   
> -    if (!bs->backing) {
> +    backing_file_bs = bdrv_filtered_cow_bs(bs);
> +
> +    if (!backing_file_bs) {
>           return -ENOTSUP;
>       }
>   
>       if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_COMMIT_SOURCE, NULL) ||
> -        bdrv_op_is_blocked(bs->backing->bs, BLOCK_OP_TYPE_COMMIT_TARGET, NULL)) {
> +        bdrv_op_is_blocked(backing_file_bs, BLOCK_OP_TYPE_COMMIT_TARGET, NULL))
> +    {
>           return -EBUSY;
>       }
>   
> -    ro = bs->backing->bs->read_only;
> +    ro = backing_file_bs->read_only;
>   
>       if (ro) {
> -        if (bdrv_reopen_set_read_only(bs->backing->bs, false, NULL)) {
> +        if (bdrv_reopen_set_read_only(backing_file_bs, false, NULL)) {
>               return -EACCES;
>           }
>       }
> @@ -441,8 +450,6 @@ int bdrv_commit(BlockDriverState *bs)
>       }
>   
>       /* Insert commit_top block node above backing, so we can write to it */
> -    backing_file_bs = backing_bs(bs);
> -
>       commit_top_bs = bdrv_new_open_driver(&bdrv_commit_top, NULL, BDRV_O_RDWR,
>                                            &local_err);
>       if (commit_top_bs == NULL) {
> @@ -528,15 +535,13 @@ ro_cleanup:
>       qemu_vfree(buf);
>   
>       blk_unref(backing);
> -    if (backing_file_bs) {
> -        bdrv_set_backing_hd(bs, backing_file_bs, &error_abort);
> -    }
> +    bdrv_set_backing_hd(bs, backing_file_bs, &error_abort);
>       bdrv_unref(commit_top_bs);
>       blk_unref(src);
>   
>       if (ro) {
>           /* ignoring error return here */
> -        bdrv_reopen_set_read_only(bs->backing->bs, true, NULL);
> +        bdrv_reopen_set_read_only(backing_file_bs, true, NULL);
>       }
>   
>       return ret;
> diff --git a/block/io.c b/block/io.c
> index dfc153b8d8..83c2b6b46a 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -118,8 +118,17 @@ static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
>   void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>   {
>       BlockDriver *drv = bs->drv;
> +    BlockDriverState *storage_bs;
> +    BlockDriverState *cow_bs = bdrv_filtered_cow_bs(bs);
>       Error *local_err = NULL;
>   
> +    /*
> +     * FIXME: There should be a function for this, and in fact there
> +     * will be as of a follow-up patch.
> +     */
> +    storage_bs =
> +        child_bs(bs->file) ?: bdrv_filtered_rw_bs(bs);
> +
>       memset(&bs->bl, 0, sizeof(bs->bl));
>   
>       if (!drv) {
> @@ -131,13 +140,13 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>                                   drv->bdrv_aio_preadv) ? 1 : 512;
>   
>       /* Take some limits from the children as a default */
> -    if (bs->file) {
> -        bdrv_refresh_limits(bs->file->bs, &local_err);
> +    if (storage_bs) {
> +        bdrv_refresh_limits(storage_bs, &local_err);
>           if (local_err) {
>               error_propagate(errp, local_err);
>               return;
>           }
> -        bdrv_merge_limits(&bs->bl, &bs->file->bs->bl);
> +        bdrv_merge_limits(&bs->bl, &storage_bs->bl);
>       } else {
>           bs->bl.min_mem_alignment = 512;
>           bs->bl.opt_mem_alignment = getpagesize();
> @@ -146,13 +155,13 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>           bs->bl.max_iov = IOV_MAX;
>       }
>   
> -    if (bs->backing) {
> -        bdrv_refresh_limits(bs->backing->bs, &local_err);
> +    if (cow_bs) {
> +        bdrv_refresh_limits(cow_bs, &local_err);
>           if (local_err) {
>               error_propagate(errp, local_err);
>               return;
>           }
> -        bdrv_merge_limits(&bs->bl, &bs->backing->bs->bl);
> +        bdrv_merge_limits(&bs->bl, &cow_bs->bl);
>       }
>   
>       /* Then let the driver override it */
> @@ -2139,11 +2148,12 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>       if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
>           ret |= BDRV_BLOCK_ALLOCATED;
>       } else if (want_zero) {
> +        BlockDriverState *cow_bs = bdrv_filtered_cow_bs(bs);
> +
>           if (bdrv_unallocated_blocks_are_zero(bs)) {
>               ret |= BDRV_BLOCK_ZERO;
> -        } else if (bs->backing) {
> -            BlockDriverState *bs2 = bs->backing->bs;
> -            int64_t size2 = bdrv_getlength(bs2);
> +        } else if (cow_bs) {
> +            int64_t size2 = bdrv_getlength(cow_bs);
>   
>               if (size2 >= 0 && offset >= size2) {
>                   ret |= BDRV_BLOCK_ZERO;
> @@ -2208,7 +2218,7 @@ static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
>       bool first = true;
>   
>       assert(bs != base);
> -    for (p = bs; p != base; p = backing_bs(p)) {
> +    for (p = bs; p != base; p = bdrv_filtered_bs(p)) {
>           ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
>                                      file);

Interesting that for filters that use bdrv_co_block_status_from_backing or
bdrv_co_block_status_from_file, we will end up calling .bdrv_co_block_status of
the underlying real node two or more times. It's not wrong, but obviously not optimal.
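
To illustrate the repeated query, here is a minimal toy model in C. It is
not QEMU code: Node, node_block_status(), block_status_above() and
overlay_queries_with_filters() are hypothetical stand-ins, and it simply
assumes that a filter forwards the status query to its child, while the
chain walk then visits that child again on its own.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FILTERS 8

/* Toy stand-in for a block node; NOT the real BlockDriverState. */
typedef struct Node {
    int is_filter;        /* 1: forwards block-status queries to its child */
    struct Node *child;   /* next node down the chain */
    int status_calls;     /* how often this node's own driver was queried */
} Node;

/* Models a driver's block-status callback: filters delegate downwards. */
static void node_block_status(Node *n)
{
    if (n->is_filter && n->child) {
        node_block_status(n->child);
        return;
    }
    n->status_calls++;
}

/* Models a chain walk from top down to (but excluding) base. */
static void block_status_above(Node *top, Node *base)
{
    for (Node *p = top; p != base; p = p->child) {
        node_block_status(p);
    }
}

/*
 * Builds base <- overlay <- filter_1 <- ... <- filter_n, walks the chain
 * once, and returns how often the overlay's driver was queried.
 */
static int overlay_queries_with_filters(int n_filters)
{
    Node base = { 0, NULL, 0 };
    Node overlay = { 0, &base, 0 };
    Node filters[MAX_FILTERS];
    Node *top = &overlay;

    assert(n_filters >= 0 && n_filters <= MAX_FILTERS);
    for (int i = 0; i < n_filters; i++) {
        filters[i] = (Node){ 1, top, 0 };
        top = &filters[i];
    }
    block_status_above(top, &base);
    return overlay.status_calls;
}
```

In this model, the overlay is queried once per filter above it plus once
for itself, so a single filter already doubles the work.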


>           if (ret < 0) {
> @@ -2294,7 +2304,7 @@ int bdrv_block_status_above(BlockDriverState *bs, BlockDriverState *base,
>   int bdrv_block_status(BlockDriverState *bs, int64_t offset, int64_t bytes,
>                         int64_t *pnum, int64_t *map, BlockDriverState **file)
>   {
> -    return bdrv_block_status_above(bs, backing_bs(bs),
> +    return bdrv_block_status_above(bs, bdrv_filtered_bs(bs),
>                                      offset, bytes, pnum, map, file);
>   }
>   
> @@ -2304,9 +2314,9 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
>       int ret;
>       int64_t dummy;
>   
> -    ret = bdrv_common_block_status_above(bs, backing_bs(bs), false, offset,
> -                                         bytes, pnum ? pnum : &dummy, NULL,
> -                                         NULL);
> +    ret = bdrv_common_block_status_above(bs, bdrv_filtered_bs(bs), false,
> +                                         offset, bytes, pnum ? pnum : &dummy,
> +                                         NULL, NULL);
>       if (ret < 0) {
>           return ret;
>       }
> @@ -2360,7 +2370,7 @@ int bdrv_is_allocated_above(BlockDriverState *top,
>               n = pnum_inter;
>           }
>   
> -        intermediate = backing_bs(intermediate);
> +        intermediate = bdrv_filtered_bs(intermediate);
>       }
>   
>       *pnum = n;
> @@ -3135,8 +3145,9 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset,
>       }
>   
>       if (!drv->bdrv_co_truncate) {
> -        if (bs->file && drv->is_filter) {
> -            ret = bdrv_co_truncate(bs->file, offset, prealloc, errp);
> +        BdrvChild *filtered = bdrv_filtered_rw_child(bs);
> +        if (filtered) {
> +            ret = bdrv_co_truncate(filtered, offset, prealloc, errp);
>               goto out;
>           }
>           error_setg(errp, "Image format driver does not support resize");
> diff --git a/block/mirror.c b/block/mirror.c
> index 8b2404051f..80cef587f0 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -660,8 +660,9 @@ static int mirror_exit_common(Job *job)
>                               &error_abort);
>       if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
>           BlockDriverState *backing = s->is_none_mode ? src : s->base;
> -        if (backing_bs(target_bs) != backing) {
> -            bdrv_set_backing_hd(target_bs, backing, &local_err);
> +        if (bdrv_backing_chain_next(target_bs) != backing) {
> +            bdrv_set_backing_hd(bdrv_skip_rw_filters(target_bs), backing,

hmm, here you support filters above target_bs ...

> +                                &local_err);
>               if (local_err) {
>                   error_report_err(local_err);
>                   ret = -EPERM;
> @@ -711,7 +712,7 @@ static int mirror_exit_common(Job *job)
>       block_job_remove_all_bdrv(bjob);
>       bdrv_child_try_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL,
>                               &error_abort);
> -    bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), &error_abort);
> +    bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, &error_abort);
>   
>       /* We just changed the BDS the job BB refers to (with either or both of the
>        * bdrv_replace_node() calls), so switch the BB back so the cleanup does
> @@ -903,7 +904,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>       } else {
>           s->target_cluster_size = BDRV_SECTOR_SIZE;
>       }
> -    if (backing_filename[0] && !target_bs->backing &&
> +    if (backing_filename[0] && !bdrv_filtered_cow_child(target_bs) &&

... and here you do not.
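
A toy sketch in C of why the two checks can disagree. Again, this is not
QEMU code: Node and the helpers below are hypothetical, and it assumes
(as the commit message says) that a node has either an R/W-filtered child
or a COW child, never both, and that skipping R/W filters means following
the R/W child until a non-filter node is reached.

```c
#include <assert.h>
#include <stddef.h>

/* Toy node: a filter has only rw_child, a format node only cow_child. */
typedef struct Node {
    struct Node *rw_child;
    struct Node *cow_child;
} Node;

static Node *filtered_cow_child(Node *n)
{
    return n ? n->cow_child : NULL;
}

/* Follow R/W-filtered children until a non-filter node is reached. */
static Node *skip_rw_filters(Node *n)
{
    while (n && n->rw_child) {
        n = n->rw_child;
    }
    return n;
}

/* Check as done on target_bs directly. */
static int backingless_direct(Node *target)
{
    return filtered_cow_child(target) == NULL;
}

/* Check done after skipping filters on top of the target. */
static int backingless_skipping(Node *target)
{
    return filtered_cow_child(skip_rw_filters(target)) == NULL;
}

/* Returns 1 if the two checks disagree for filter -> format -> backing. */
static int checks_disagree_with_filter_on_top(void)
{
    Node backing = { NULL, NULL };
    Node format = { NULL, &backing };
    Node filter = { &format, NULL };

    assert(skip_rw_filters(&filter) == &format);
    return backingless_direct(&filter) != backingless_skipping(&filter);
}
```

With a filter on top, the direct check reports "no backing" although the
format node below it has one, which is the inconsistency noted above.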

[stopped here for now]



-- 
Best regards,
Vladimir

